sebzim4500 on June 7, 2024 \| on: σ-GPTs: A new approach to autoregressive models

Part of the issue is that they are training a pretty tiny model; it's not like GPT-2 (~100M parameters) is especially coherent either.