Rajasthani Hindi Speech Data
April 2021
This dataset consists of audio recordings of participants reading stories aloud in Rajasthani Hindi, one sentence at a time. The 98 participants (roughly 58 male and 40 female) came from Soda, Rajasthan, and each read 30 stories. In total, the dataset contains 426,873 recordings.
Odia Speech Data and Model
October 2021
As part of this release, Navana Tech and Microsoft Research India are open-sourcing 1,648 hours of validated Odia speech data and a baseline model for Odia speech recognition. The speech dataset consists of recordings covering Agriculture, Banking, and Healthcare in four dialects of Odia, collected from five different districts. Please read the README.md file for more details.
*Note that the dataset download link cannot be used directly in a browser.
How do I use a download link for an entire dataset?
A download link for an entire dataset provides the location of the dataset in Azure as well as a special time-limited key that allows you to download the entire dataset. Copy the button link above and use it with tools that can copy files from Azure, like the following:
- AzCopy – a command-line tool for Windows or Linux that copies files to and from Azure.
- Azure Storage Explorer – a utility that is used to manage Azure storage.
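Because the key in the link is time-limited, it can help to check that the link is still valid before starting a large download. The following is a minimal sketch, not an official procedure. It assumes the copied link is a standard Azure shared access signature (SAS) URL that carries its expiry time in the `se` query parameter, and that AzCopy is installed and on your PATH. The function names and the example URL in the comments are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs


def sas_expiry(url):
    """Return the expiry time from the link's `se` parameter, or None if absent."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("se")
    if not values:
        return None
    # Assumption: the expiry is an ISO 8601 UTC timestamp such as
    # 2021-12-31T23:59:59Z (or a bare date such as 2021-12-31).
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_expired(url, now=None):
    """True if the link has a known expiry time that has already passed."""
    expiry = sas_expiry(url)
    if expiry is None:
        return False  # no expiry information found in the link
    now = now or datetime.now(timezone.utc)
    return expiry <= now


def azcopy_command(url, destination):
    """Build the argument list for a recursive AzCopy download.

    Pass the result to subprocess.run(...) to start the copy. Using a list
    avoids shell quoting problems with the '&' characters in the link.
    """
    return ["azcopy", "copy", url, destination, "--recursive"]
```

For example, `azcopy_command(link, "odia_speech")` returns the arguments for `azcopy copy "<link>" "odia_speech" --recursive`, where `link` is the URL copied from the download button and `odia_speech` is a local folder name chosen for illustration.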