1. Nobody has replicated DeepSeek's results on their reported budget yet. Scale.ai's Alexander Wang claims they're lying and that they have a huge, clandestine H100 cluster. HuggingFace is organizing an effort to publicly reproduce the paper's claims.
2. Even if DeepSeek's budget claims are true, they trained their model on the outputs of an expensive foundation model built with a massive capital outlay. Truly replicating these results from scratch might therefore require an expensive upstream model.
Given that DeepSeek's earlier models have been reproduced and vetted, I think it's probably safe to assume these new models didn't come out of thin air. But until somebody reproduces them, it's an open question.