Where are you getting a CPU + RAM + RTX 3090 for $1k? To even install a million of these machines you'd have to build a new datacenter; the capital costs go well beyond just the wholesale price of GPU boards, and you'd have to hire a ton of datacenter technicians.
But leaving that aside, look at OpenAI's pricing: $0.02/1K tokens. Say the average query is 20 tokens; then $0.02 buys 50 queries, i.e. 2,500 queries per dollar. At 100k QPS that works out to $40/sec, and $40/sec * 86400 * 365 ≈ $1.26B/year. My guess is OpenAI's infrastructure right now isn't scaled to handle 100k QPS, so they're way underpriced for that load. This might be a cost Google could stomach.
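Spelled out as a quick back-of-the-envelope sketch (the inputs are the same assumed figures above: $0.02/1K tokens, 20 tokens per query, 100k QPS):

    # Back-of-the-envelope cost sketch; all inputs are the assumed figures above.
    price_per_1k_tokens = 0.02       # dollars, OpenAI's posted rate
    tokens_per_query = 20            # assumed average query size
    qps = 100_000                    # hypothetical query rate

    cost_per_query = price_per_1k_tokens * tokens_per_query / 1000   # $0.0004
    cost_per_second = cost_per_query * qps                           # $40
    cost_per_year = cost_per_second * 86_400 * 365                   # ~$1.26B

    print(f"${cost_per_query:.4f}/query, ${cost_per_second:.0f}/sec, ~${cost_per_year / 1e9:.2f}B/year")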
I just think blindly shoehorning these 100B+ param models into this use case is probably the wrong strategy. DeepMind's Chinchilla has shown it's possible to significantly reduce parameter count and cost while staying competitive in accuracy. I think Google is eventually going to get there, but they're going to do it more efficiently than by brute-forcing a GPT-3-style model. These very large parameter models are tech demos IMHO at this point.
You can get an RTX 3090 for < $1k. I was largely handwaving away the rest of the costs, since all the processing is done on those cards and basic hardware is really cheap nowadays. But in hindsight that might not be entirely reasonable, because you'd need a motherboard that can support a 4x GPU setup, as well as a power supply sized for it. The total is still going to be in the same ballpark, though, so I don't think it changes much.
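For a rough sense of that ballpark (every part price here is my own round-number assumption, not a quote):

    # Hypothetical 4x RTX 3090 build; all prices are assumed round numbers,
    # just to illustrate that the GPUs dominate the total.
    parts = {
        "RTX 3090 x4": 4 * 950,
        "motherboard with 4x PCIe slots": 500,
        "CPU + RAM": 600,
        "high-wattage power supply": 400,
        "case, storage, misc": 300,
    }
    total = sum(parts.values())
    print(f"total ~ ${total}, GPUs ~ {parts['RTX 3090 x4'] / total:.0%} of that")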
That said, I do agree with your final conclusion. Bigger is not necessarily better in neural networks, and I also expect to see hardware requirements decline rapidly. I also don't really see this as something that's going to get ultra-monopolized and centralized. One big difference between natural language interfaces and something like search is user expectations. With natural language the user has an expectation of a result, and if a service can't meet that expectation, they'll go elsewhere. And I think it is literally impossible for any single service to meet the expectations of everybody.