Downloads
Orca-2-7B
January 2024
Orca 2 is a fine-tuned version of LLaMA-2. It is built for research purposes only and provides single-turn responses in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. The model…
KID: Knowledge Infused Decoding
March 2022
Knowledge Infused Decoding (KID) is a decoding algorithm that infuses knowledge (from Wikipedia) into each decoding step of text generation.
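As an illustrative sketch only (not the paper's actual algorithm), one simple way to infuse retrieved knowledge into a decoding step is to bias the next-token distribution toward tokens that appear in a retrieved knowledge snippet; the `alpha` bonus and the token-level matching below are made-up simplifications:

```python
import math

def softmax(logits):
    """Convert a dict of token -> logit into a probability distribution."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: v / z for t, v in exps.items()}

def knowledge_infused_step(logits, knowledge_tokens, alpha=2.0):
    """Add a bonus `alpha` (a hypothetical knob) to the logit of every
    candidate token that occurs in the retrieved knowledge, then renormalize."""
    biased = {t: v + (alpha if t in knowledge_tokens else 0.0)
              for t, v in logits.items()}
    return softmax(biased)

# Three equally likely candidates; retrieval mentions only "paris",
# so its probability is boosted at this step.
probs = knowledge_infused_step(
    {"paris": 1.0, "london": 1.0, "pizza": 1.0},
    knowledge_tokens={"paris"},
)
```

KID itself is more involved than per-token logit bonuses; this sketch only shows the general shape of steering generation with external knowledge at every step.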
Meta Label Correction for Noisy Label Learning [Code]
February 2021
This repository contains the source code for the AAAI paper “Meta Label Correction for Noisy Label Learning”.
Self-training with Weak Supervision [Code]
April 2021
State-of-the-art deep neural networks require large-scale labeled training data that is often either expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings to…
Meta Self-training for Few-shot Neural Sequence Labeling [Code]
October 2021
This is the implementation of the paper Meta Self-training for Few-shot Neural Sequence Labeling. MetaST is short for meta-learning for self-training.
Orca-2-13B
January 2024
Orca 2 is a fine-tuned version of LLaMA-2. It is built for research purposes only and provides single-turn responses in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. The model…
Natural Language Interfaces to Web APIs Dataset
April 2019
The NL2API dataset includes web API calls from the Microsoft Graph API suite, which are used to search a user’s emails and calendar events, respectively. Each data point includes the API call, its canonical form, and its associated natural…
Meta Representation Transformation for Low-resource Cross-Lingual Learning [Code]
May 2021
This is the source code release for research published at NAACL 2021. Paper Title: MetaXL: Meta Representation Transformation for Low-resource Cross-Lingual Learning. Paper Abstract: The combination of multilingual pre-trained representations and cross-lingual transfer learning is one of the most…
WALNUT
June 2022
This repository contains the baseline code for the paper published at NAACL 2022: “WALNUT: A Benchmark on Weakly Supervised Learning for Natural Language Understanding”. A detailed description of the datasets and methods can be found in the manuscript here.