Downloads
Orca-2-7B
January 2024
Orca 2 is a fine-tuned version of LLAMA-2. It is built for research purposes only and provides a single-turn response in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. The model…
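A rough usage sketch for a single-turn query, assuming the release is available on the Hugging Face Hub under the id microsoft/Orca-2-7b and loads through the standard transformers causal-LM API; the prompt and generation settings below are placeholders, not part of the release.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Orca-2-7b"  # assumed Hugging Face Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 7B weights in memory
    device_map="auto",          # requires the accelerate package
)

# Single-turn use: one instruction in, one response out (no multi-turn chat history).
prompt = "Read the passage below and answer in one sentence: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```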
KID: Knowledge Infused Decoding
March 2022
Knowledge Infused Decoding (KID) is a decoding algorithm that infuses knowledge (from Wikipedia) into each decoding step of text generation.
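The sketch below only illustrates the general idea of steering each decoding step toward retrieved reference text by boosting the logits of tokens that appear in it; it is a simplified stand-in, not the KID algorithm itself, and the model, knowledge snippet, and boost scheme are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Reference text standing in for retrieved Wikipedia knowledge.
knowledge = "The Eiffel Tower is located in Paris and was completed in 1889."
knowledge_ids = set(tokenizer(knowledge)["input_ids"])  # tokens to favor during decoding

prompt = "The Eiffel Tower"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

with torch.no_grad():
    for _ in range(20):  # greedy decoding, 20 new tokens
        logits = model(input_ids).logits[0, -1]
        boost = torch.zeros_like(logits)
        boost[list(knowledge_ids)] = 2.0  # flat bonus for tokens seen in the knowledge text
        next_id = torch.argmax(logits + boost).view(1, 1)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```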
Self-training with Weak Supervision [Code]
April 2021
State-of-the-art deep neural networks require large-scale labeled training data that is often either expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings to…
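A toy sketch of the weak-supervision setup described above, assuming domain-specific rules that either vote a label or abstain and a simple majority-vote aggregation; the rules, labels, and data are invented for illustration and are not from the released code.

```python
from collections import Counter

ABSTAIN = None

def rule_refund(text):
    # Domain rule: refund requests are complaints.
    return "complaint" if "refund" in text.lower() else ABSTAIN

def rule_thanks(text):
    # Domain rule: expressions of gratitude are praise.
    return "praise" if "thank" in text.lower() else ABSTAIN

RULES = [rule_refund, rule_thanks]

def weak_label(text):
    """Aggregate rule votes by majority; return None if every rule abstains."""
    votes = [r(text) for r in RULES if r(text) is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else None

unlabeled = ["I want a refund now.", "Thanks for the quick help!", "Where is my order?"]
pseudo_labeled = [(t, y) for t in unlabeled if (y := weak_label(t)) is not None]
print(pseudo_labeled)  # noisy training pairs for a downstream student model
```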
Meta Self-training for Few-shot Neural Sequence Labeling [Code]
October 2021
This is the implementation of the paper Meta Self-training for Few-shot Neural Sequence Labeling. MetaST is short for meta-learning for self-training.
Orca-2-13B
January 2024
Orca 2 is a fine-tuned version of LLAMA-2. It is built for research purposes only and provides a single-turn response in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. The model…
Baselines for Multilingual Reply Suggestion (MRS)
August 2021
Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning…
Meta Representation Transformation for Low-resource Cross-Lingual Learning [Code]
May 2021
This is a source code release for research published at NAACL 2021. Paper title: MetaXL: Meta Representation Transformation for Low-resource Cross-Lingual Learning. Paper abstract: The combination of multilingual pre-trained representations and cross-lingual transfer learning is one of the most…
dp-transformers repository
August 2022
Motivated by our recent work, we are releasing a repository for training transformer models with differential privacy. Our repository is based on integrating the Opacus library with the Hugging Face platform. We aim to serve the privacy-preserving ML community in utilizing state-of-the-art models while…
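A minimal sketch of the underlying integration idea, attaching Opacus's PrivacyEngine to a Hugging Face model for differentially private training; this is not the dp-transformers API itself, and the model choice, hyperparameters, and placeholder data below are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification
from opacus import PrivacyEngine

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Placeholder data: pre-tokenized input ids and binary labels.
input_ids = torch.randint(0, 30000, (32, 16))
labels = torch.randint(0, 2, (32,))
data_loader = DataLoader(TensorDataset(input_ids, labels), batch_size=8)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,   # Gaussian noise added to clipped per-sample gradients
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

# Standard training loop; clipping and noising happen inside the DP optimizer.
for ids, y in data_loader:
    optimizer.zero_grad()
    loss = model(input_ids=ids, labels=y).loss
    loss.backward()
    optimizer.step()
```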
LiST (Lite Self-Training)
October 2021
We present LiST, a new method for efficient fine-tuning of large pre-trained language models (PLMs) in few-shot learning settings. LiST significantly improves over recent methods that adopt prompt fine-tuning by using two key techniques. The first one is the use of…