
> Calling it "training LLM" is a bit misleading. This is a small GPT-2-sized model (~160M params), while the "L" in "LLM" stands for large...

I've always felt the natural way of referring to smaller LLMs would be Medium Language Models and Small Language Models, but I guess MLM is an inauspicious acronym.

It's also already used for language modelling:

MLM stands for masked language modelling, i.e. training models on the cloze task: recovering tokens that have been masked out of the input. It's the most common way to train encoder-only models.
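
Roughly, the masking step looks like this (a toy word-level sketch, not any particular library's implementation; real MLM setups such as BERT's operate on subword tokens and use an 80/10/10 mask/random/keep split instead of always substituting [MASK]):

    import random

    def mask_for_mlm(tokens, mask_token="[MASK]", mask_prob=0.15):
        # Hide ~15% of tokens; the label at a masked position is the
        # original token, and every other position is ignored by the loss.
        inputs, labels = [], []
        for tok in tokens:
            if random.random() < mask_prob:
                inputs.append(mask_token)
                labels.append(tok)    # model must recover this token
            else:
                inputs.append(tok)
                labels.append(None)   # no loss at unmasked positions
        return inputs, labels

    inputs, labels = mask_for_mlm("the cat sat on the mat".split())
    print(inputs)  # e.g. ['the', '[MASK]', 'sat', 'on', 'the', 'mat']
    print(labels)  # e.g. [None, 'cat', None, None, None, None]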

CLM (causal language modelling) is the other common task where you autoregressively predict the next token given the previous ones. It's the most common way to train decoder-only models.
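
CLM in sketch form is just the shift-by-one setup (same toy tokens as above; in a real model the loss is computed at every position in parallel, with a causal attention mask preventing each position from seeing later ones):

    tokens = "the cat sat on the mat".split()

    # At each position, the target is simply the next token.
    for i in range(len(tokens) - 1):
        context, target = tokens[: i + 1], tokens[i + 1]
        print(" ".join(context), "->", target)

    # the -> cat
    # the cat -> sat
    # the cat sat -> on
    # ...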
