
Hugging Face is currently replicating it.

Replications with small models suggest DeepSeek isn't exaggerating to any significant degree: the architecture really is cheap to train.

Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30: A Small Model RL Revolution https://xyzlabs.substack.com/p/berkeley-researchers-replicat...
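For context, the "core tech" the Berkeley team replicated is RL with group-relative rewards (GRPO-style): sample several completions per prompt, score each with a simple rule-based reward, and normalize each reward against its own group instead of training a value network. A minimal sketch of that advantage step, assuming PyTorch and an illustrative 0/1 correctness reward (the numbers are made up):

    import torch

    def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
        """Group-relative advantages: each completion's reward is normalized
        against the mean/std of its own group (one group per prompt), so no
        learned value network is needed.

        rewards: (num_prompts, group_size) scalar rewards, e.g. 1.0 if the
        final answer is correct and 0.0 otherwise.
        """
        mean = rewards.mean(dim=1, keepdim=True)
        std = rewards.std(dim=1, keepdim=True)
        return (rewards - mean) / (std + eps)

    # Illustrative numbers: 2 prompts, 4 sampled completions each.
    rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                            [0.0, 0.0, 1.0, 0.0]])
    print(grpo_advantages(rewards))
    # Completions that beat their group's average get a positive advantage,
    # which weights the policy-gradient update toward those samples.

Because the reward is rule-based and the advantage needs no critic model, the whole loop stays cheap, which is what makes $30-scale replications plausible.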



Whoa, that's incredible!

I remember a year ago hoping that a decade from now I'd be able to run GPT-4-class models on my own hardware. The reality is turning out to be far more exciting.


I first sneered at the idea of LLM-generated LLM training sets, but is this what might be driving the big efficiency leap?

Asking as someone who has, honestly, only superficially followed developments since the end of 2023 or so.
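Distillation (which comes up below) is one concrete place where LLM-generated training data shows up: a large teacher model generates reasoning traces, and a small student is fine-tuned to imitate them. A minimal sketch of that general recipe, where query_teacher is a hypothetical stand-in for the teacher model rather than a real API:

    import json

    def query_teacher(prompt: str) -> str:
        # Hypothetical stand-in for a call to a large teacher model (e.g. R1).
        # A real pipeline would call the model and keep its full reasoning trace.
        return f"<think>...worked reasoning for: {prompt}...</think> final answer"

    prompts = ["Prove that sqrt(2) is irrational.", "What is 17 * 24?"]

    with open("distill_train.jsonl", "w") as f:
        for p in prompts:
            # One supervised fine-tuning example per prompt: the student
            # learns to imitate the teacher's chain of thought, not just
            # its final answer.
            record = {"prompt": p, "completion": query_teacher(p)}
            f.write(json.dumps(record) + "\n")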


You call R1 a small model? It's a 671-billion-parameter model.


There are multiple variants of the model, starting from 1.5B parameters.


Those are distillations of the model.
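For anyone wanting to try one: the distilled checkpoints are ordinary causal LMs published on Hugging Face, so a few lines of transformers will load one. A minimal sketch, assuming the published DeepSeek-R1-Distill-Qwen-1.5B checkpoint (generation settings here are illustrative, not prescriptive):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # The distills emit a <think>...</think> reasoning trace before the answer.
    messages = [{"role": "user", "content": "What is 17 * 24?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=512,
                             do_sample=True, temperature=0.6)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The larger 32B and 70B distillations load the same way, just with more VRAM.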


Have you used those? In my experience, even the 70B distillation is far worse than what you can expect from o1 or the R1 available on the web.


No, I haven't. I've used Perplexity's R1, but I don't know how many parameters it has. It's quite good, although too slow.



