
Hugging Face is currently replicating it.

Replications with small models suggest DeepSeek isn't exaggerating to any significant degree: the architecture really is cheap to train.

Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30: A Small Model RL Revolution https://xyzlabs.substack.com/p/berkeley-researchers-replicat...
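For context, the "core tech" the Berkeley team replicated is RL with group-relative rewards (GRPO-style): sample several completions per prompt, score each with a simple rule-based reward, and normalize each reward against its own group instead of training a value network. A minimal sketch of that advantage step, assuming PyTorch and an illustrative 0/1 correctness reward (the numbers are made up):

    import torch

    def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
        """Group-relative advantages: each completion's reward is normalized
        against the mean/std of its own group (one group per prompt), so no
        learned value network is needed.

        rewards: (num_prompts, group_size) scalar rewards, e.g. 1.0 if the
        final answer is correct and 0.0 otherwise.
        """
        mean = rewards.mean(dim=1, keepdim=True)
        std = rewards.std(dim=1, keepdim=True)
        return (rewards - mean) / (std + eps)

    # Illustrative numbers: 2 prompts, 4 sampled completions each.
    rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                            [0.0, 0.0, 1.0, 0.0]])
    print(grpo_advantages(rewards))
    # Completions that beat their group's average get a positive advantage,
    # which weights the policy-gradient update toward those samples.

Because the reward is rule-based and the advantage needs no critic model, the whole loop stays cheap, which is what makes $30-scale replications plausible.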



Whoa, that's incredible!

I remember a year ago hoping that a decade from now I'd be able to run GPT-4-class models on my own hardware. The reality is turning out to be far more exciting.


I first sneered at the idea of LLM-generated LLM training sets, but is this what might be driving the big efficiency leap?

Asking as someone who has, honestly, only superficially followed developments since the end of 2023 or so.
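Distillation (which comes up below) is one concrete place where LLM-generated training data shows up: a large teacher model generates reasoning traces, and a small student is fine-tuned to imitate them. A minimal sketch of that general recipe, where query_teacher is a hypothetical stand-in for the teacher model rather than a real API:

    import json

    def query_teacher(prompt: str) -> str:
        # Hypothetical stand-in for a call to a large teacher model (e.g. R1).
        # A real pipeline would call the model and keep its full reasoning trace.
        return f"<think>...worked reasoning for: {prompt}...</think> final answer"

    prompts = ["Prove that sqrt(2) is irrational.", "What is 17 * 24?"]

    with open("distill_train.jsonl", "w") as f:
        for p in prompts:
            # One supervised fine-tuning example per prompt: the student
            # learns to imitate the teacher's chain of thought, not just
            # its final answer.
            record = {"prompt": p, "completion": query_teacher(p)}
            f.write(json.dumps(record) + "\n")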


You call R1 a small model? It's a 671-billion-parameter model.


There are multiple variants of the model, starting from 1.5B parameters.


Those are distillations of the model.
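For anyone wanting to try one: the distilled checkpoints are ordinary causal LMs published on Hugging Face, so a few lines of transformers will load one. A minimal sketch, assuming the published DeepSeek-R1-Distill-Qwen-1.5B checkpoint (generation settings here are illustrative, not prescriptive):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # The distills emit a <think>...</think> reasoning trace before the answer.
    messages = [{"role": "user", "content": "What is 17 * 24?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=512,
                             do_sample=True, temperature=0.6)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The larger 32B and 70B distillations load the same way, just with more VRAM.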


Have you used those? In my experience, even the 70B distillation is far worse than what you can expect from o1 or the R1 available on the web.


No, I haven't. I've used Perplexity's R1, but I don't know how many parameters it has. It's quite good, although too slow.



