My bet is there's no such thing as 'too efficient' in this space. If you can get a very good model on a small device, it's going to be totally amazing on a huge GPU.
I bet you're wrong - there are already massive diminishing returns in the best models of 2024 vs 2023. The idea that you can just throw more compute at it and performance scales in proportion is fiction. You do get more performance with more compute, but it doesn't scale linearly, and past a point it's a waste of money, as DeepSeek showed.
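For context, the published scaling-law fits (Kaplan et al. 2020, Hoffmann et al. 2022) put a rough shape on this; a minimal sketch with the constants left symbolic, since the exact values vary by study:

L(C) \approx L_{\infty} + k\,C^{-\gamma}, \qquad 0 < \gamma \ll 1

Loss falls as a power law in training compute C with a small exponent, so each doubling of compute buys a smaller absolute improvement: more compute always helps, but with steep diminishing returns rather than anything like linear scaling.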
this conversation reminds me of people saying, when the PS2 came out, that by 2010 games would literally look better than real life, because they assumed graphics quality would keep improving exponentially...
I would agree, except I think it's 1997. I've gotten DeepSeek to mostly solve a not-very-complex problem in a somewhat obscure domain (Home Assistant automations) in 214 seconds of its own 'thinking', and if you can get that two orders of magnitude lower, to a couple of seconds, it unlocks completely new use cases, i.e. demand.