Another factor is that DeepSeek doesn't just run inference but also trains models, so they can put underutilized compute nodes to work on training during off-peak hours, as described in their DeepSeek v3 article: https://github.com/deepseek-ai/open-infra-index/blob/main/20...
But I agree that the main driver is that they are really good at optimizing. They presumably chose their architecture so that it runs as efficiently as possible on their own infrastructure, which gives them a massive head start. Inference framework developers still have to catch up.