About
I am a Post-Doctoral Researcher at Microsoft Research India, working on multilingual and multicultural AI — building language technologies that serve speakers of low- and mid-resource Indian languages as well as they serve English speakers, without flattening how those communities actually talk, write, and think.
My current work spans three connected threads: synthesizing culturally grounded training data for Indic languages at scale, auditing how the field evaluates multilingual models today, and packaging what we learn into practitioner guidance that ML engineers and product teams can act on.
Synthetic Data for Indic Languages — UPDESH
A central question for multilingual AI is how to bridge the data gap for languages with little high-quality training text. With colleagues at MSR India, I co-built UPDESH, a 9.5M-point synthetic instruction-tuning dataset across 13 Indic languages and English. UPDESH deliberately combines two paradigms that the field usually treats as alternatives: top-down translation for reasoning-style tasks (math, logic, chain-of-thought), and bottom-up culturally grounded generation seeded from native-language Wikipedia and 26,800+ curated Indian cultural artifacts — festivals, cuisine, traditional arts, architecture, and religious practices.
Llama-3.1-8B and Phi-4-14B models fine-tuned on UPDESH win 8 of 12 downstream tasks against strong baselines (Bactrian-X, IndicAlign, Aya-Collection), with the largest gains on generation tasks for low-resource languages like Assamese and Odia. The UPDESH-32K variant tops a cultural Elo leaderboard built from ~92,000 pairwise human battles on India-centric community queries (Elo 1696, 71.1% win rate). The dataset is available on Hugging Face.
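For readers unfamiliar with rating leaderboards, here is a minimal sketch of how ratings can be derived from pairwise battles. It uses the textbook online Elo update; the K-factor of 32, the initial rating of 1500, and the model names are assumptions for illustration, and the procedure actually used for the leaderboard above may differ (for example, a Bradley-Terry fit over all battles).

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_from_battles(battles, k=32.0, initial=1500.0):
    """Replay battles in order and return the final rating per model.

    Each battle is (model_a, model_b, outcome), where outcome is
    1.0 if model_a wins, 0.0 if model_b wins, and 0.5 for a tie.
    k and initial are assumed defaults, not values from the source.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, outcome in battles:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (outcome - e_a)
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Hypothetical battles between two placeholder models.
battles = [
    ("model_x", "model_y", 1.0),  # model_x wins
    ("model_y", "model_x", 0.0),  # model_x wins again
    ("model_x", "model_y", 0.5),  # tie
]
ratings = elo_from_battles(battles)
```

Online Elo depends on the order in which battles are replayed, which is one reason large leaderboards often average over shuffled orders or fit all battles jointly.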
Auditing Multilingual Evaluation
The other half of bridging the gap is being honest about what we measure. In ongoing work, we are auditing 51 benchmarks across 242 datasets, 219 languages, 29 language families, and 15 task types along three axes — coverage (“what is being evaluated?”), representativeness (“is the evaluation valid?”), and rigor (“can we trust the scores?”).
Some headline findings: 36% of languages appear in only a single benchmark; 56% of dataset–language instances are translated from English rather than natively authored; contamination is pervasive (e.g., PAWS-X 67.9%, MGSM 45.5%) yet only 13% of recent model releases report any contamination analysis; and LLM-as-judge agreement with humans drops sharply on culturally and linguistically nuanced criteria (fluency, persona adherence, linguistic plausibility — all below 60%) even while it remains high on safety-style judgements. The throughline: multilingual evaluation today is wide but thin, and risks becoming performative if these gaps aren’t addressed.
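As an illustration of the coverage axis, the share of languages that appear in only one benchmark can be computed from a benchmark-to-languages mapping. This is a minimal sketch on toy data: the benchmark names and language lists are hypothetical, and the function is not the audit's actual tooling.

```python
from collections import Counter


def single_benchmark_share(benchmark_languages):
    """Fraction of languages that are covered by exactly one benchmark."""
    counts = Counter(
        lang
        for langs in benchmark_languages.values()
        for lang in set(langs)  # count each language once per benchmark
    )
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


# Hypothetical benchmarks keyed by name, with ISO 639-1 language codes.
toy = {
    "bench_a": ["hi", "bn", "ta"],
    "bench_b": ["hi", "ta", "as"],
    "bench_c": ["hi", "or"],
}
share = single_benchmark_share(toy)  # 3 of 5 languages (bn, as, or) appear once
```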
Vibhasha — A Practitioner’s Playbook
Research findings only matter if practitioners can act on them. I co-lead Vibhasha, a living playbook for building multilingual and multicultural AI systems, in collaboration with Sunayana Sitaram, Tanuja Ganu, Kalika Bali, and colleagues from MSR Africa. Vibhasha walks ML engineers, researchers, and product teams through the key decisions in building multilingual AI: data, prompting, translation, fine-tuning, evaluation, safety, synthetic data, and culture. It offers concrete decision paths (off-the-shelf prompting, translation, fine-tuning, or building your own) matched to a team's constraints. Read it at aka.ms/Vibhasha.
Selected Work
- UPDESH: Synthesizing Grounded Instruction Tuning Data for 13 Indic Languages. Chitale, Gumma, Ahuja, Kodali, Uppadhyay, Sudharsan, Sitaram. Dataset on Hugging Face.
- The State and Fate of Multilingual, Contextual Evaluation in the NLP World: a multi-axis audit of multilingual evaluation across 51 benchmarks and 219 languages. Uppadhyay, Beniwal, Kodali, Sitaram.
- The Vibhasha Playbook: a living guide for building multilingual & multicultural AI systems. Kodali, Sitaram, Ganu, Bali, with colleagues from MSR Africa.
For earlier work on code-mixed NLP and a full publication list, see my Google Scholar profile.
Background
Before MSR, I completed my PhD at IIIT Hyderabad (MT-NLP Lab @ LTRC and PreCog @ C2S2), advised by Manish Shrivastava and Ponnurangam Kumaraguru, working on metrics, datasets, and models for English–Hindi code-mixed text. I am a recipient of the Microsoft Research India PhD Award (2024) for this work. Earlier, I spent six years as an engineer at Chennai Metro Rail Limited on telecom and automatic fare collection systems, after a B.Tech in Electronics & Communication and a Post-Graduate Diploma in Metro Rail Technology & Management from IIT Madras.