News & features

Research Focus: Week of July 17, 2023
RetroRanker mitigates frequency bias in predictions of retrosynthesis models; new algorithm beats PPO on language tasks; DER dataset aids grid planning; improved PPML balances privacy & accuracy across shared data; ASL Citizen boosts sign language modeling.

Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation
| Zineng Tang, Ziyi Yang, Chenguang Zhu, Michael Zeng, and Mohit Bansal
Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and human comprehension, seamlessly consolidate information from a…

Research Focus: Week of June 5, 2023
In this issue: Peter Lee discusses AI in medicine. Plus, new research on data inference privacy in machine learning; PII leakage in language models; and automatic prompt optimization with gradient descent and beam search.
![An illustration of the KEAR architecture represented by five panels side by side. The first contains an input question—“What is a treat that your dog will enjoy?”—and the answer choices “salad,” “petted,” “affection,” “bone,” and “lots of attention.” The second panel has three boxes, each representing retrieval from a specific knowledge source. A box labeled “Knowledge Graph” has a silhouette of a dog and, underneath it and labeled “desires,” a silhouette of a dog being petted; a heart representing “affection”; a bone; and clapping hands representing “lots of attention.” A box labeled “relevant questions” has the question “What do dogs like to eat?” and the accompanying answer “Bones.” A box labeled “dictionary” contains the definition of “bone”: “a composite material making up the skeleton of most vertebrates.” The third panel, labeled “concatenation with input,” contains the input question followed by “Dog, desires, bone. Dog, desires, lots of attention” followed by the relevant question and finally the dictionary definition of bone. In between each is a separation token [SEP]. The fourth panel is labeled “language model” and contains a quote box labeled “language services,” a cube labeled “model,” and left and right braces punctuation within a circle labeled “language understanding.” The fifth panel is labeled “output” and includes silhouettes of each of the five answer choices. The silhouette of the bone is highlighted in blue, representing the appropriate response.](https://www.microsoft.com/en-us/research/wp-content/uploads/2021/12/1400x788_Common_Sense_still_no_logo-1-480x280.jpg)
Azure AI milestone: Microsoft KEAR surpasses human performance on CommonsenseQA benchmark
| Yichong Xu, Chenguang Zhu, Shuohang Wang, Michael Zeng, and Xuedong Huang
KEAR (Knowledgeable External Attention for commonsense Reasoning)—along with recent milestones in computer vision and neural text-to-speech—is part of a larger Azure AI mission to provide relevant, meaningful AI solutions and services that work better for people because they better capture how people learn and…