Replications with small models indicate that they don't lie to any significant degree. The architecture is cheap to train.
Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30: A Small Model RL Revolution https://xyzlabs.substack.com/p/berkeley-researchers-replicat...
I remember a year ago hoping that a decade from now it would be possible to run GPT-4-class models on my own hardware. The reality seems to be far more exciting.
Asking as someone who has honestly only superficially followed the developments since the end of 2023 or so.