VERT: Versatile Entity Recognition & Disambiguation Toolkit

Established: July 1, 2012

While knowledge about entities is a key building block in such systems, creating models that are both effective and efficient under real-world technology, data, and workload constraints remains a challenge.

To meet these needs, we created VERT, a Versatile Entity Recognition & Disambiguation Toolkit. VERT is a pragmatic toolkit that combines rules and machine learning, offering both powerful pre-trained models for core entity types (recognition and linking) and easy creation of custom models. Custom models use our deep learning-based NER/EL models, which minimize the need for handcrafted features, to quickly produce deployable models with state-of-the-art quality and performance, and to facilitate refining pre-trained models.

VERT emphasizes requirements from real-world use cases and is composed of four main modules:

  • Prebuilt models that recognize common base types (dates, numbers, units, phone numbers). These support extraction, normalization, and resolution of such entities in a form that is easy for developers to use (tagged entity, semantic expression, resolution).
    This module is publicly available as a deterministic rule-based system designed for extensibility (Recognizers-Text) and supports multiple languages. Moreover, vNext is on the way.
  • LoReO, a logical-operations layer that helps handle entity lists, ranges, and alternatives, where entity tagging alone is not enough and basic relationship parsing is necessary for consumers. Examples include comparative and superlative adjectives, and adjective phrases like “not so expensive”.
  • NERPlus, a pipeline for named entity recognition (NER). VERT offers deep learning models with two model structures, depending on performance requirements.
    This module includes pre-trained models for typical entity types, available in five languages, as well as tools to train new models or refine existing ones.
  • ELIS, an entity-linking module that links entities to knowledge bases, which both brings additional “types” and supplies additional information that systems can use in downstream tasks. VERT includes pre-trained entity linking against Wikipedia in English and Spanish, with easy data refresh, since knowledge bases are continually updated.
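To make the first two modules concrete, here is a minimal sketch of the recognize-normalize-resolve flow and of a LoReO-style relation over numbers. The `ModelResult` shape, the patterns, and the function names are illustrative assumptions for this sketch, not VERT's or Recognizers-Text's actual API.

```python
import re
from dataclasses import dataclass

@dataclass
class ModelResult:
    text: str        # tagged entity: the surface span
    start: int       # character offset in the query
    type_name: str   # semantic type, e.g. "number" or "number_range"
    resolution: dict # normalized value a consumer can act on

# Small written numbers handled by rule; a real system covers far more.
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def recognize_number(query: str) -> list[ModelResult]:
    """Deterministic, rule-based number extraction with normalization."""
    results = []
    # Digit sequences: "42" -> 42, "3.5" -> 3.5
    for m in re.finditer(r"\b\d+(?:\.\d+)?\b", query):
        raw = m.group()
        value = float(raw) if "." in raw else int(raw)
        results.append(ModelResult(raw, m.start(), "number", {"value": value}))
    # Written numbers: "two" -> 2
    pattern = r"\b(" + "|".join(WORD_NUMBERS) + r")\b"
    for m in re.finditer(pattern, query, re.IGNORECASE):
        results.append(ModelResult(m.group(), m.start(), "number",
                                   {"value": WORD_NUMBERS[m.group().lower()]}))
    return sorted(results, key=lambda r: r.start)

def recognize_number_range(query: str) -> list[ModelResult]:
    """LoReO-style logical operation: a range is a relation over two
    numbers, so plain entity tagging is not enough and light parsing
    of the connecting words is needed."""
    results = []
    for m in re.finditer(r"\bbetween\s+(\d+)\s+and\s+(\d+)\b",
                         query, re.IGNORECASE):
        results.append(ModelResult(m.group(), m.start(), "number_range",
                                   {"start": int(m.group(1)),
                                    "end": int(m.group(2))}))
    return results
```

For example, on "free between 3 and 5 today", `recognize_number` tags "3" and "5" as separate numbers, while `recognize_number_range` resolves "between 3 and 5" as a single relation with a start and an end, which is the kind of structure downstream consumers need.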

VERT research also targets cross-lingual scenarios, and its modules are under continuous research and release across different platforms.
You can find related papers and their source code in the VERT repo on GitHub: https://github.com/microsoft/vert-papers/.

People


Chin-Yew Lin

Senior Principal Research Manager


Guoxin Wang

Research SDE2


Zhiwei Yu

Associate Researcher