DeBERTa: Decoding-enhanced BERT with Disentangled Attention
June 2020
DeBERTa (Decoding-enhanced BERT with disentangled attention) improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the…
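As a rough illustration of the two-vector idea, the sketch below computes attention weights from separate content and relative-position vectors. It assumes the three-term score described in the DeBERTa paper (content-to-content, content-to-position, position-to-content, scaled by the square root of 3d). All function names, parameter names, and the pure-Python list representation are hypothetical choices for this example, not the authors' implementation.

```python
import math

def relative_bucket(i, j, k):
    """Map the relative distance i - j to an index in [0, 2k), clamping at distance k."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(x, W):
    """Multiply a vector x (length d_in) by a matrix W (d_in rows, d_out columns)."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def disentangled_attention(H, P, Wq_c, Wk_c, Wq_r, Wk_r, k):
    """Return an n x n matrix of attention weights.

    H: n content vectors, one per token.
    P: 2k relative-position embeddings shared by all tokens (assumed layout).
    Wq_c, Wk_c: query/key projections for content (hypothetical names).
    Wq_r, Wk_r: query/key projections for relative positions (hypothetical names).
    """
    n = len(H)
    d = len(Wq_c[0])
    Qc = [project(h, Wq_c) for h in H]
    Kc = [project(h, Wk_c) for h in H]
    Qr = [project(p, Wq_r) for p in P]
    Kr = [project(p, Wk_r) for p in P]
    scale = math.sqrt(3 * d)  # three score terms are summed, hence 3d

    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(Qc[i], Kc[j])                         # content-to-content
            c2p = dot(Qc[i], Kr[relative_bucket(i, j, k)])  # content-to-position
            p2c = dot(Kc[j], Qr[relative_bucket(j, i, k)])  # position-to-content
            row.append((c2c + c2p + p2c) / scale)
        weights.append(softmax(row))
    return weights
```

The explicit double loop is chosen for readability. A practical implementation would batch these products as matrix operations and gather the position terms by index.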