{
    "byline": null,
    "dir": null,
    "excerpt": "BERT optimizes the Masked Language Model (MLM) objective by masking word pieces uniformly at random in its training data and learning to predict the masked tokens. SpanBERT instead masks contiguous spans of tokens and trains the model to predict the contents of each span from the representations of the tokens at its boundary. Span lengths follow a geometric distribution, and span start points are chosen uniformly at random.",
    "length": 913,
    "siteName": null,
    "title": "SpanBERT: improving pre-training by representing and predicting spans"
}