I agree hype is a big portion of it, but if DeepSeek really has found a way to train models just as good as frontier ones for a hundredth of the hardware investment, that is a substantial material difference for Nvidia's future earnings.
> if DeepSeek really has found a way to train models just as good as frontier ones for a hundredth of the hardware investment
Frontier models are heavily compute-constrained - the leading AI model makers already have far more training data than they can use. Any improvement in training compute-efficiency is great news for them, no matter where it comes from, especially since the DeepSeek folks have documented their approach in great detail.
If you include multimodal data, then I think it's pretty obvious that training is compute-limited.
Also, current SOTA models are good enough that you can generate endless training data by letting the model operate tools like a C compiler, a Python interpreter, the Sage computer algebra system, etc.
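As a toy sketch of that loop (everything here is a placeholder, not any lab's actual pipeline - sample_from_model() stands in for a real LLM call):

    import os, subprocess, sys, tempfile

    def verified(code, expected):
        # Run model-generated code in a fresh interpreter and accept it
        # only if it exits cleanly and prints the known-good answer.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        try:
            out = subprocess.run([sys.executable, path],
                                 capture_output=True, text=True, timeout=10)
            return out.returncode == 0 and out.stdout.strip() == expected
        finally:
            os.unlink(path)

    def sample_from_model(prompt, n=4):
        # Placeholder for a real model call; canned output keeps the
        # sketch runnable end to end.
        return ["print(sum(range(1, 101)))"] * n

    tasks = [("Print the sum of the integers 1..100.", "5050")]
    dataset = [(p, c) for p, expected in tasks
               for c in sample_from_model(p)
               if verified(c, expected)]

Every pair that survives the check is fresh, machine-verified training data, with no human labeling in the loop.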
Is it? Training is only done once, but inference requires GPUs to scale - especially for a 685B model. And now there's an open-source o1-equivalent model that companies can run locally, which means there's a much bigger market for underutilized on-prem GPUs.
I'd be really curious about the hardware split between training and inference. My read was that the ratio is lopsided enough that training is not a significant portion of the requisite hardware; instead, inference at scale soaks up most of the available datacenter GPU share.
Could be entirely wrong here - would love a fact-check from an industry insider or journalist.
Making training more efficient makes every unit of compute spent on training more valuable. This should increase demand, unless we've reached a point where better models are no longer valuable.
The openness of DeepSeek's approach also means there will be more, smaller entities engaging in training, rather than a few massive buyers with the leverage to set the price they pay.
Plus, reasoning models substantially increase inference costs: for each token of final output you may have hundreds of tokens of reasoning.
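Back-of-envelope, with entirely made-up numbers:

    price_per_1k = 0.01     # hypothetical $/1K generated tokens
    answer_tokens = 200     # tokens the user actually sees
    reasoning_per = 100     # hidden reasoning tokens per answer token

    plain = answer_tokens / 1000 * price_per_1k
    reasoning = answer_tokens * (1 + reasoning_per) / 1000 * price_per_1k
    print(f"plain: ${plain:.4f}  reasoning: ${reasoning:.4f}")
    # plain: $0.0020  reasoning: $0.2020 - roughly a 100x inference bill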
Arguments can go both ways on this point, but on balance I'd expect any improvement in efficiency to increase demand.
Unless we get actual AGI, I honestly don't care, as a non-coder. The art is slop and predatory, the chatbots are stilted and pointless, any time a company uses AI there's a huge backlash, and there are just no commercial products with real demand. Make it as cheap as dirt and I still don't see what use it is - besides for scammers, I guess...
1. Nobody has replicated DeepSeek's results on their reported budget yet. Scale AI's Alexandr Wang says they're lying and that they have a huge, clandestine H100 cluster. Hugging Face is assembling an effort to publicly reproduce the paper's claims.
2. Even if DeepSeek's budget claims are true, they trained their model on the outputs of an expensive foundation model built with a massive capital outlay. Truly replicating these results from scratch might still require an expensive model upstream.
Given that they've reproduced earlier models and those have been vetted, I think it's probably safe to assume these new models aren't out of thin air - but until somebody reproduces the result, it's up in the air.
Not really. The training methodology opens up whole new mechanisms that will make it much easier to train non-language models, which have been badly neglected so far. Think multi-modal robot models, visual/video question answering, audio processing, etc.
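The core trick (GRPO, from DeepSeek's own papers) is modality-agnostic, which is why this argument holds: sample a group of outputs per prompt, score each with a verifiable reward, and use the group-normalized score as the advantage. A toy illustration, not their code:

    import statistics

    def group_advantages(rewards):
        # GRPO-style advantage: normalize each sample's reward against
        # the other samples for the same prompt, so no learned critic
        # network is needed.
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0  # guard zero variance
        return [(r - mean) / std for r in rewards]

    # Four sampled answers to one prompt, scored 1.0 if a checker
    # (test suite, math verifier, simulator...) accepts them:
    print(group_advantages([1.0, 0.0, 0.0, 1.0]))
    # -> [1.0, -1.0, -1.0, 1.0]

Nothing in that loop cares whether the tokens encode language, robot actions, or audio - all you need is a checkable reward.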