AI and Microsoft Research header - abstract neural network pattern on dark spectrum background

AI Frontiers

Downloads

Phi-1.5

December 2023

The language model phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts. When assessed against benchmarks testing common…

Download

KITAB Dataset

February 2024

🕮 KITAB is a challenging dataset and a dynamic data collection approach for testing abilities of Large Language Models (LLMs) in answering information retrieval queries with constraint filters. A filtering query with constraints can be of the form “List all books…

Download

FLAML: A Fast Library for AutoML and Tuning

December 2020

FLAML is a Python library designed to automatically produce accurate machine learning models with low computational cost. It frees users from selecting learners and hyperparameters for each learner. FLAML is powered by a new, cost-effective hyperparameter optimization and learner selection…

Github

Phi-3

April 2024

The Phi-3-Mini-128K-Instruct is a 3.8B parameters, lightweight, state-of-the-art open model trained with the Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. The model belongs to…

Download

VISOR

September 2023

Benchmarking Spatial Relationships in Text-to-Image Generation—Spatial understanding is a fundamental aspect of computer vision and integral for human-level reasoning about images, making it an important component for grounded language understanding. While recent large-scale text-to-image synthesis (T2I) models have shown unprecedented…

Github

ToxiGen

May 2022

Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language.…

Github