pshc
on May 19, 2024
on: Llama3 implemented from scratch
My wild guess is that adjusting the shape before each step is not worth the speed hit. Uniform structures make GPUs go brrrrr
astrange
on May 19, 2024
It's also easier to train and in particular easier to parallelize.