
Great work releasing such a small model! I'd like to hear your thoughts on spending two-thirds of the model's parameters on embeddings. What would change if you used a byte-level vocabulary and spent that parameter budget on transformer layers instead?
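For a rough sense of the trade-off, here is a back-of-the-envelope sketch. All config values are hypothetical (not this model's actual sizes). It assumes tied input/output embeddings and the common approximation of about 12·d² parameters per transformer block (4·d² for attention, 8·d² for a 4x-wide MLP), ignoring biases and norms.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical model shape, for illustration only.
    vocab_size: int
    d_model: int
    n_layers: int
    tied_embeddings: bool = True


def embedding_params(cfg: Config) -> int:
    # Token embedding matrix; an untied output projection doubles it.
    n = cfg.vocab_size * cfg.d_model
    return n if cfg.tied_embeddings else 2 * n


def block_params(cfg: Config) -> int:
    # Rough approximation: 4*d^2 (Q, K, V, O) + 8*d^2 (MLP with 4x expansion).
    return 12 * cfg.d_model ** 2


def total_params(cfg: Config) -> int:
    return embedding_params(cfg) + cfg.n_layers * block_params(cfg)


def embedding_fraction(cfg: Config) -> float:
    return embedding_params(cfg) / total_params(cfg)


def extra_layers_if_byte_level(cfg: Config) -> int:
    # Swap to a 256-entry byte vocabulary and count how many whole
    # transformer blocks the freed parameters would pay for.
    byte_cfg = Config(256, cfg.d_model, cfg.n_layers, cfg.tied_embeddings)
    freed = embedding_params(cfg) - embedding_params(byte_cfg)
    return freed // block_params(cfg)


cfg = Config(vocab_size=32_000, d_model=256, n_layers=4)
print(f"{embedding_fraction(cfg):.2f}")  # 0.72
print(extra_layers_if_byte_level(cfg))   # 10
```

The catch is that byte-level inputs produce several times more positions per text than subword tokens, so the extra layers come with longer sequences and more compute per document.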


