Skip to content

Latest commit

 

History

History
142 lines (117 loc) · 6.62 KB

README.md

File metadata and controls

142 lines (117 loc) · 6.62 KB

Kahani : Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures

Abstract

Large Language Models (LLMs) and Text-To-Image (T2I) models have demonstrated the ability to generate compelling text and visual stories. However, their outputs are predominantly aligned with the sensibilities of the Global North, often resulting in an outsider's gaze on other cultures. As a result, non-Western communities have to put extra effort into generating culturally specific stories. To address this challenge, we developed a visual storytelling pipeline called Kahani that generates culturally grounded visual stories for non-Western cultures. Our pipeline leverages off-the-shelf models GPT-4 Turbo and Stable Diffusion XL (SDXL). By using Chain of Thought (CoT) and T2I prompting techniques, we capture the cultural context from user's prompt and generate vivid descriptions of the characters and scene compositions. To evaluate the effectiveness of our pipeline, we conducted a comparative user study with ChatGPT-4 (with DALL-E3) in which participants from different regions of India compared the cultural relevance of stories generated by the two tools. Results from the qualitative and quantitative analysis performed on the user study showed that our pipeline was able to capture and incorporate more Culturally Specific Items (CSIs) compared to ChatGPT-4. In terms of both its cultural competence and visual story generation quality, our pipeline outperformed ChatGPT-4 in 27 out of the 36 comparisons.

kahani gpt comparison sample

Developer Notes:

Follow the given commands to setup and run this project and install necessary packages.

# Build the docker image
$ docker build . -t kahani-streaming

# Set up the environment variables
$ touch .env
$ vi .env
# Paste the below two env variables in the .env file and replace it with your API key and endpoint
# SDAPI_HOST=http://172.17.0.1:7860
# OPENAI_API_KEY=<OPENAI_API_KEY>

# To run the docker container from the built docker image
$ docker run -it -d -p 8080:8080 --env-file .env kahani-streaming

Gradio Tool Snapshot

A snapshot of the Kahani Gradio Tool :

gradio tool

Kahani Pipeline

kahani pipeline

Our proposed visual storytelling pipeline consists of five primary steps starting from extracting details from the user story prompt and expanding on these details based on the story's cultural context, to generating cultural visuals for each scene. Prompts for each of the steps are provided in the src/prompts folder.

Directory Structure

.
├── README.md
└── src
    ├── Dockerfile
    ├── api.py
    ├── app.py
    ├── avatar.png
    ├── kahani.py
    ├── llm.py
    ├── models.py
    ├── pipeline.yml
    ├── poetry.lock
    ├── prompts
    │   ├── __init__.py
    │   ├── base.py
    │   ├── bounding_box
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── break_story_into_scenes
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── classify_change
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── create_story
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── extract_characters
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── extract_culture
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── generate_character
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── generate_pose
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── generate_scenes
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── summarise_culture
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   └── user_input
    │       ├── __init__.py
    │       ├── system.md
    │       └── user.md
    ├── pyproject.toml
    └── utils.py

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Privacy

You can read more about Microsoft's privacy statement here.