arilotter on Dec 2, 2024 | on: Pre-training a 15B parameter language model over t...
This specific model is only trained on 100 billion tokens, so it's not SOTA by any means, but we've got designs on larger training runs later :)