Where are you getting a CPU + RAM + RTX 3090 for $1k? To even install a million of these machines you'd have to build a new datacenter; the capital costs go well beyond the wholesale price of the GPU boards, and you'd have to hire a ton of datacenter technicians.

But leaving that aside, look at OpenAI's pricing: $0.02 per 1K tokens. Say the average query is 20 tokens; that's 50 queries per $0.02, or 2,500 queries per dollar. At 100k QPS you'd be spending $40/sec, and $40 * 86,400 * 365 comes to roughly $1.26B/year. My guess is OpenAI's infrastructure isn't currently scaled to handle 100k QPS, so they're way underpriced for that load. This might be a cost Google could stomach.
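
A quick sanity check on that arithmetic, as a minimal Python sketch (the $0.02/1K-token price, 20-token average query, and 100k QPS figures are the assumptions above, not measured data):

    # Back-of-envelope serving cost at OpenAI's listed token price.
    price_per_1k_tokens = 0.02  # dollars, from OpenAI's pricing page
    tokens_per_query = 20       # assumed average query size
    qps = 100_000               # assumed query load

    queries_per_dollar = (1000 / tokens_per_query) / price_per_1k_tokens
    dollars_per_second = qps / queries_per_dollar
    dollars_per_year = dollars_per_second * 86_400 * 365

    print(f"{queries_per_dollar:.0f} queries/$")   # 2500
    print(f"${dollars_per_second:.0f}/sec")        # $40/sec
    print(f"${dollars_per_year / 1e9:.2f}B/year")  # ~$1.26B/year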

I just think blindly shoehorning these 100B+ parameter models into this use case is probably the wrong strategy. DeepMind's Chinchilla has shown it's possible to significantly reduce parameter count, and therefore serving cost, while staying competitive in accuracy. I think Google will eventually get there, but they'll do it more efficiently than by brute-forcing a GPT-3-style model. These very large models are tech demos at this point, IMHO.
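
For a rough sense of what Chinchilla-style scaling buys you, here's a back-of-envelope sketch using the commonly cited approximations from that paper: training compute C ≈ 6·N·D FLOPs, and roughly 20 training tokens per parameter at the compute-optimal point. Both constants are empirical fits, so treat the output as illustrative only:

    # Rough Chinchilla-style compute-optimal sizing (Hoffmann et al., 2022).
    # Assumes C ~= 6*N*D training FLOPs and D ~= 20*N at the optimum;
    # these are empirical approximations, not exact laws.
    def chinchilla_optimal(compute_flops):
        """Return (params N, tokens D) satisfying C = 6*N*D with D = 20*N."""
        n = (compute_flops / (6 * 20)) ** 0.5
        return n, 20 * n

    # GPT-3's training run was roughly 3.14e23 FLOPs.
    n, d = chinchilla_optimal(3.14e23)
    print(f"~{n / 1e9:.0f}B params on ~{d / 1e12:.1f}T tokens")
    # -> ~51B params on ~1.0T tokens, versus GPT-3's actual 175B params
    #    on ~300B tokens: a ~3x smaller model to serve per query.

Fewer parameters means fewer FLOPs per generated token, which is exactly the per-query serving cost at issue here.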



You can get an RTX 3090 for under $1k. I was largely handwaving away the rest of the costs, since all the processing happens on the cards and basic hardware is really cheap nowadays. In hindsight that might not be entirely reasonable: you'd need a motherboard that supports a 4x GPU setup, plus a beefy power supply. But the total is still in the same ballpark, so I don't think it changes much.
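
To put a rough number on "same ballpark", here's a hypothetical bill of materials for a 4x 3090 box. Every price is an illustrative assumption, not a quote:

    # Hypothetical component prices (USD) for a 4x RTX 3090 build.
    # All figures are illustrative assumptions, not actual quotes.
    build = {
        "4x RTX 3090 (used, ~$900 each)": 4 * 900,
        "motherboard with 4x PCIe x16 slots": 500,
        "CPU + RAM": 600,
        "2x 1600W power supplies": 600,
        "chassis, storage, cabling": 300,
    }
    total = sum(build.values())
    print(f"total: ${total}, per GPU: ${total / 4:.0f}")
    # -> total: $5600, per GPU: $1400

So the non-GPU hardware adds maybe 50% on top of the cards themselves, which keeps the estimate within the same order of magnitude.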

That said, I do agree with your conclusion. Bigger is not necessarily better in neural networks, and I expect hardware requirements to decline rapidly. I also don't see this becoming ultra-monopolized and centralized. One big difference between natural-language interfaces and something like search is user expectations: with natural language, the user expects a particular kind of result, and if a service can't meet that expectation, they'll go elsewhere. And I don't think any single service can possibly meet everyone's expectations.


Why would cost per query go up measurably for a highly parallelizable workload?



