

pshc on May 19, 2024 | on: Llama3 implemented from scratch

My wild guess is that adjusting the shape before each step is not worth the speed hit. Uniform structures make GPUs go brrrrr.

astrange on May 19, 2024

It's also easier to train and in particular easier to parallelize.
