I'd guess that the ability of a very small model to do well on the TinyStories dataset isn't just because of the limited 3-4 year-old vocabulary, but also because it's an LLM-generated dataset.
LLM-generated content (synthetic data) is easier than human-generated text for an LLM to learn, because it was auto-regressively generated and therefore should be possible to auto-regressively predict.
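The intuition can be sketched with a toy example. Suppose the model's next-token distribution is P, while human text comes from a slightly different distribution Q that the model only approximates (both distributions below are made up purely for illustration). Text the model sampled itself costs it about H(P) nats per token, while human text costs H(Q,P) = H(Q) + KL(Q‖P), which carries an extra penalty for the mismatch:

```python
import math
import random

random.seed(0)

# Toy next-token distributions over a 4-token vocabulary.
# P is the "model"; Q stands in for the human text distribution
# the model only imperfectly fits. Both are invented for illustration.
tokens = [0, 1, 2, 3]
P = [0.4, 0.3, 0.2, 0.1]   # model distribution
Q = [0.1, 0.2, 0.3, 0.4]   # "human" source distribution

def sample(dist, n):
    # Draw n tokens i.i.d. from the given distribution.
    return random.choices(tokens, weights=dist, k=n)

def avg_nll(dist, samples):
    # Average negative log-likelihood (nats/token) the model assigns.
    return sum(-math.log(dist[t]) for t in samples) / len(samples)

synthetic = sample(P, 50_000)  # text the model generated itself
human = sample(Q, 50_000)      # text from the mismatched source

nll_synthetic = avg_nll(P, synthetic)  # ~ H(P),   about 1.28 nats here
nll_human = avg_nll(P, human)          # ~ H(Q,P), about 1.74 nats here
print(nll_synthetic < nll_human)       # the model predicts its own output best
```

The gap between the two numbers is exactly the KL penalty the comment is gesturing at: a learner trained on samples from its teacher's own sampling distribution never pays it.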
It's surprising that LLMs do as well as they do at predicting human-generated training samples, where there is no guarantee that the predictive signal is actually contained in the sample (it may exist only in the mind of the human who wrote it).
I've got to wonder what the impact on generation would be for an LLM trained only on synthetic LLM-generated data. I'd guess it wouldn't be as robust as one that had learned to handle more uncertainty.
> I'd guess that the ability of a very small model to do well on the TinyStories dataset isn't just because of the limited 3-4 year-old vocabulary, but also because it's an LLM-generated dataset.
Your guess is correct. The level of vocabulary has little to do with it. There was a paper about this a while back (sorry, can't find the link) which found that the model still learned just as well when the complexity of the text was increased, as long as the texts were LLM-generated.