{"id":979587,"date":"2023-10-25T21:34:34","date_gmt":"2023-10-26T04:34:34","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=979587"},"modified":"2024-02-01T10:08:49","modified_gmt":"2024-02-01T18:08:49","slug":"kahani","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/kahani\/","title":{"rendered":"Kahani"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

Kahani: Visual Storytelling<\/h1>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n

Image generation models such as Midjourney, DALL-E 3, and SDXL have recently made remarkable progress in producing visually stunning images from natural language descriptions. However, most of these models lack cultural awareness and diversity, and often fail to capture the subtle nuances and variations that exist across cultures and languages.<\/p>\n\n\n\n

To generate an image that matches the user\u2019s expectations and preferences, one must either write extensive prompts and edit the output with sophisticated tools like Adobe Photoshop, or fine-tune a model, which requires large amounts of data and specialized skills. Both approaches are time-consuming, costly, and inaccessible to most people.<\/p>\n\n\n\n

Kahani: Visual Storytelling<\/strong> is a research prototype that lets users create visually striking and culturally nuanced images just by describing them in their local languages. Kahani leverages state-of-the-art techniques like inpainting, and models like Segment Anything and GPT-4 vision, to generate feedback for the candidate images.<\/p>\n\n\n\n