arXiv:2303.15619v2
Abstract: The choice of \emph{which} tokens to mask is a central, under-examined design decision in masked language modeling (MLM). Standard pretraining masks tokens uniformly at random, but several studies show that more informative masking targets can improve downstream performance. We study...