
> Calling it "training LLM" is a bit misleading. This is a small GPT-2-sized model (~160M params), while the "L" in "LLM" stands for large...

I've always felt the natural way of referring to smaller LLMs would be Medium Language Models and Small Language Models, but I guess MLM is an inauspicious acronym.

It's also already used for language modelling:

MLM stands for masked language modelling, i.e. training models on the cloze task: recovering tokens that have been masked out of the input. It's the most common way to train encoder-only models.
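
Roughly, the masking step looks like this (a toy word-level sketch, not any particular library's implementation; real MLM setups such as BERT's operate on subword tokens and use an 80/10/10 mask/random/keep split instead of always substituting [MASK]):

    import random

    def mask_for_mlm(tokens, mask_token="[MASK]", mask_prob=0.15):
        # Hide ~15% of tokens; the label at a masked position is the
        # original token, and every other position is ignored by the loss.
        inputs, labels = [], []
        for tok in tokens:
            if random.random() < mask_prob:
                inputs.append(mask_token)
                labels.append(tok)    # model must recover this token
            else:
                inputs.append(tok)
                labels.append(None)   # no loss at unmasked positions
        return inputs, labels

    inputs, labels = mask_for_mlm("the cat sat on the mat".split())
    print(inputs)  # e.g. ['the', '[MASK]', 'sat', 'on', 'the', 'mat']
    print(labels)  # e.g. [None, 'cat', None, None, None, None]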

CLM (causal language modelling) is the other common task where you autoregressively predict the next token given the previous ones. It's the most common way to train decoder-only models.
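
CLM in sketch form is just the shift-by-one setup (same toy tokens as above; in a real model the loss is computed at every position in parallel, with a causal attention mask preventing each position from seeing later ones):

    tokens = "the cat sat on the mat".split()

    # At each position, the target is simply the next token.
    for i in range(len(tokens) - 1):
        context, target = tokens[: i + 1], tokens[i + 1]
        print(" ".join(context), "->", target)

    # the -> cat
    # the cat -> sat
    # the cat sat -> on
    # ...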
