{"id":674754,"date":"2020-07-13T12:09:05","date_gmt":"2020-07-13T19:09:05","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=674754"},"modified":"2024-06-05T10:22:59","modified_gmt":"2024-06-05T17:22:59","slug":"project-karya","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-karya\/","title":{"rendered":"Digital Labor: Project Karya"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

Digital Labor: Project Karya<\/h1>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n
<\/figure>\n\n\n\n

Karya aims to create supplemental income opportunities for people in low-income and marginalized communities by connecting them to AI-enabled digital work. It capitalizes on two major trends. First, AI is penetrating all walks of life, with an ever-increasing focus on enabling natural language interactions with AI systems. Second, and more importantly, technology is getting better and cheaper, with smartphone prices and data connectivity costs steadily falling. In India in particular, a majority of even low-income households have access to a smartphone.<\/p>\n\n\n\n

We have built a smartphone-based digital work platform that makes a wide variety of language-based digital tasks accessible to people in low-income communities, particularly in rural India. The open-source platform is being used for several major data collection efforts in India.<\/p>\n\n\n\n

Beyond providing income, participating in such a platform can boost workers' digital skills. Our recent research also shows that digital work can be an effective mechanism for delivering knowledge and skills to users.<\/p>\n\n\n\n

The project was recently spun off as an impact-focused startup. More details can be found at https:\/\/karya.in<\/span><\/a>.<\/p>\n\n\n\n\n\n

Rajasthani Hindi Speech Data<\/h3>\n\n\n\n

April 2021<\/p>\n\n\n\n

This dataset consists of audio recordings of participants reading stories aloud in Rajasthani Hindi, one sentence at a time. The 98 participants (roughly 58 male and 40 female) were from Soda, Rajasthan, and each participant read 30 stories. In total, the dataset contains 426,873 recordings.<\/p>\n\n\n\n

\n
Download Dataset<\/a><\/div>\n<\/div>\n\n\n\n
\n\n\n\n

Odia Speech Data and Model<\/h3>\n\n\n\n

October 2021<\/p>\n\n\n\n

As part of this release, Navana Tech and Microsoft Research India are open-sourcing 1,648 hours of validated Odia speech data and a baseline model for Odia speech recognition. The recordings cover the agriculture, banking, and healthcare domains and span four dialects of Odia, collected from five different districts. Please read the README.md file for more details.<\/p>\n\n\n\n

*Note: the dataset download link cannot be opened directly in a browser.<\/em><\/strong><\/p>\n\n\n\n

\n
Download link<\/a><\/div>\n<\/div>\n\n\n\n

How do I use a download link for an entire dataset?<\/strong><\/p>\n\n\n\n

A download link for an entire dataset provides both the location of the dataset in Azure and a special time-limited key that allows you to download the whole dataset. Copy the link from the button above and use it with a tool that can copy files from Azure, such as the following:<\/p>\n\n\n\n