
This points to an interesting future for foundation models. This is an 18x cost reduction in only 2 years. Either foundation models are going to get much bigger, or variations will become common.


V100 GPUs are from 2017, so it's more than two years. The A100 already appeared three years ago, btw.

An eight-GPU DGX-1 server cost ~$149k back then (per news postings I googled). A current-gen DGX H100 is $520k with 5 years of support. Of course it holds 5x the memory, and the GPUs and interconnect are much faster. But when comparing costs, take price hikes into account.


An important thing to also keep in mind is how much inflation changed prices over that period. $520k in 2023 dollars is around $420k in 2017 dollars. Sure, still almost 3x more expensive, but that's better than the nominal ~3.5x.
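The adjustment above can be sketched in a few lines. The ~24% cumulative US CPI factor for 2017 to 2023 is an assumption (rough figure, not from the thread):

```python
INFLATION_2017_TO_2023 = 1.24  # assumed cumulative CPI factor, 2017 -> 2023

dgx1_2017 = 149_000      # DGX-1 price in 2017 dollars
dgx_h100_2023 = 520_000  # DGX H100 price in 2023 dollars

# Deflate the 2023 price back into 2017 dollars.
dgx_h100_in_2017_dollars = dgx_h100_2023 / INFLATION_2017_TO_2023

nominal_ratio = dgx_h100_2023 / dgx1_2017            # comparing raw sticker prices
real_ratio = dgx_h100_in_2017_dollars / dgx1_2017    # comparing in constant dollars

print(round(dgx_h100_in_2017_dollars))  # ~419355, i.e. around $420k
print(round(nominal_ratio, 1))          # 3.5
print(round(real_ratio, 1))             # 2.8
```

So the price roughly 3.5x'd in nominal terms, but closer to 2.8x in constant 2017 dollars — consistent with the "almost 3x" figure.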


Variations in the sense of specializations, I guess.

For writing code you don't care about feeding world history to your model, so a smaller model might be better at a specialized task.

Sure, having a big multi-modal model is great, but with specialized models you can spread tasks better.


But I am sure prompt understanding improves with more text data. Same with reasoning ability.



