Official repository for "Kahani: Culturally-Nuanced Visual Storytelling Pipeline for Non-Western Cultures".
Large Language Models (LLMs) and Text-To-Image (T2I) models have demonstrated the ability to generate compelling text and visual stories. However, their outputs are predominantly aligned with the sensibilities of the Global North, often resulting in an outsider's gaze on other cultures. As a result, non-Western communities have to put extra effort into generating culturally specific stories. To address this challenge, we developed Kahani, a visual storytelling pipeline that generates culturally grounded visual stories for non-Western cultures. The pipeline leverages two off-the-shelf models, GPT-4 Turbo and Stable Diffusion XL (SDXL). Using Chain of Thought (CoT) and T2I prompting techniques, it captures the cultural context from the user's prompt and generates vivid descriptions of the characters and scene compositions. To evaluate the pipeline, we conducted a comparative user study against ChatGPT-4 (with DALL-E 3), in which participants from different regions of India compared the cultural relevance of stories generated by the two tools. Qualitative and quantitative analyses of the user study showed that our pipeline captured and incorporated more Culturally Specific Items (CSIs) than ChatGPT-4. In terms of both cultural competence and visual story generation quality, our pipeline outperformed ChatGPT-4 in 27 of the 36 comparisons.
Follow the commands below to install the necessary packages and to set up and run this project.
# Build the docker image
$ docker build . -t kahani-streaming
# Set up the environment variables
$ touch .env
$ vi .env
# Paste the two variables below into the .env file, replacing the values with your own endpoint and API key
# SDAPI_HOST=http://172.17.0.1:7860
# OPENAI_API_KEY=<OPENAI_API_KEY>
# Run the docker container from the built docker image
$ docker run -it -d -p 8080:8080 --env-file .env kahani-streaming
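A missing or placeholder value in the .env file is easy to overlook before starting the container. The following is a minimal sketch, not part of this repository, of a pre-flight check for the two variables named above. The function names `parse_env_text`, `missing_keys`, and `check_env_file` are hypothetical, and the parser only handles plain `KEY=VALUE` lines, which is a simplification of how Docker reads an `--env-file`.

```python
from pathlib import Path

# Variable names taken from the setup steps above.
REQUIRED_KEYS = ("SDAPI_HOST", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so URLs and keys containing '=' survive.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a <PLACEHOLDER>."""
    missing = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


def check_env_file(path: str = ".env") -> list[str]:
    """Read an env file (hypothetical helper) and report unusable required keys."""
    return missing_keys(parse_env_text(Path(path).read_text(encoding="utf-8")))
```

Calling `check_env_file()` from the directory that holds the .env file returns an empty list when both variables look usable, and otherwise the names that still need attention.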
A snapshot of the Kahani Gradio tool:
Our proposed visual storytelling pipeline consists of five primary steps, from extracting details from the user's story prompt and expanding on them based on the story's cultural context to generating culturally grounded visuals for each scene. Prompts for each step are provided in the src/prompts folder.
.
├── README.md
└── src
    ├── Dockerfile
    ├── api.py
    ├── app.py
    ├── avatar.png
    ├── kahani.py
    ├── llm.py
    ├── models.py
    ├── pipeline.yml
    ├── poetry.lock
    ├── prompts
    │   ├── __init__.py
    │   ├── base.py
    │   ├── bounding_box
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── break_story_into_scenes
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── classify_change
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── create_story
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── extract_characters
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── extract_culture
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── generate_character
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── generate_pose
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── generate_scenes
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   ├── summarise_culture
    │   │   ├── __init__.py
    │   │   ├── system.md
    │   │   └── user.md
    │   └── user_input
    │       ├── __init__.py
    │       ├── system.md
    │       └── user.md
    ├── pyproject.toml
    └── utils.py
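In the layout above, each folder under src/prompts holds a system.md and a user.md file. The following is a minimal sketch, under that assumption only, of how such a pair could be read from disk for inspection. It is not the loader used by this repository (that logic presumably lives in the package's own modules), and `PromptPair`, `load_prompt`, and `list_steps` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PromptPair:
    """System and user prompt text for one prompt folder (hypothetical type)."""
    system: str
    user: str


def load_prompt(prompts_dir: Path, step: str) -> PromptPair:
    """Read system.md and user.md from prompts_dir/<step>."""
    step_dir = prompts_dir / step
    return PromptPair(
        system=(step_dir / "system.md").read_text(encoding="utf-8"),
        user=(step_dir / "user.md").read_text(encoding="utf-8"),
    )


def list_steps(prompts_dir: Path) -> list[str]:
    """Return the sorted names of subfolders that contain both prompt files."""
    return sorted(
        child.name
        for child in prompts_dir.iterdir()
        if child.is_dir()
        and (child / "system.md").is_file()
        and (child / "user.md").is_file()
    )
```

For example, `load_prompt(Path("src/prompts"), "create_story")` would return the two prompt texts for the create_story folder, assuming the command is run from the repository root.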
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
You can read more about Microsoft's privacy statement here.