Hacker News

I feel there’s a gap in this thread (or I may be the one missing it).

DeepSeek proved knowledge distillation works very well and cheaply https://en.m.wikipedia.org/wiki/Knowledge_distillation

But they didn’t show how to build a new frontier model cheaply.

So you still need massive investments to build new frontier models. The bad part is that they can be replicated cheaply.
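For context, the core idea behind knowledge distillation (per the Wikipedia link above) is to train a small "student" model to match a large "teacher" model's softened output distribution rather than hard labels. A minimal sketch of the classic Hinton-style distillation loss, in plain Python; this illustrates the general technique, not DeepSeek's specific training pipeline, and all function names here are illustrative:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-top classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) between temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# When the student matches the teacher, the loss is zero;
# the further apart the distributions, the larger the loss.
teacher = [3.0, 1.0, 0.2]
perfect = distillation_loss(teacher, teacher)
imperfect = distillation_loss(teacher, [0.2, 1.0, 3.0])
```

In practice this term is usually mixed with an ordinary cross-entropy loss on the true labels, so the student learns from both the data and the teacher.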



I think you are missing it:

https://stratechery.com/2025/deepseek-faq/

That has a great overview: this is a new model, but also a distillation. They used new techniques to make it comparatively cheap.


Thank you for the link. I have a lot of respect for Stratechery. I learned a lot, and agree, I’m the one who was missing it haha


This comment seems to be complete nonsense. See here https://arxiv.org/abs/2412.19437v1


The internet would be a little better if people were a little nicer.


Sorry


Thanks for saying that :) I appreciate you



