
WithinReason 6 months ago, on: Gemma 3 270M: Compact model for hyper-efficient AI

Great work releasing such a small model! I would like to know your thoughts on using 2/3 of the model's size for embeddings. What would be different if you used a byte-level vocabulary and spent the parameter budget on transformer parameters instead?
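
For a rough sense of the budget in question, here is a back-of-the-envelope sketch. The vocabulary size (262,144) and hidden width (640) are assumptions taken from the publicly described model configuration, not from the comment itself, and the 270M total is the nominal figure from the model name. The function name is hypothetical, and the sketch counts only the embedding table.

```python
def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameter count of a single token-embedding table."""
    return vocab_size * d_model


TOTAL_PARAMS = 270_000_000  # nominal size from the model name
VOCAB_SIZE = 262_144        # assumption: subword vocabulary size
D_MODEL = 640               # assumption: hidden width
BYTE_VOCAB = 256            # one entry per possible byte value

subword = embedding_params(VOCAB_SIZE, D_MODEL)
byte_level = embedding_params(BYTE_VOCAB, D_MODEL)
freed = subword - byte_level   # parameters that could move into transformer layers
share = subword / TOTAL_PARAMS # fraction of the model spent on embeddings

print(f"subword embedding table:    {subword:,}")
print(f"byte-level embedding table: {byte_level:,}")
print(f"parameters freed:           {freed:,}")
print(f"embedding share of total:   {share:.1%}")
```

Under these assumptions the embedding table accounts for a bit over 60% of the parameters, and a byte-level vocabulary would free nearly all of that. The sketch leaves out the other side of the trade: byte-level sequences are several times longer than subword sequences for the same text, so each layer processes more positions per input.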