
My wild guess is that adjusting the shape before each step is not worth the speed hit. Uniform structures make GPUs go brrrrr


It's also easier to train and in particular easier to parallelize.



