By itself, sure, but there are many sources all pointing to the same thing.
Sutskever, recently ex-OpenAI and one of the first to believe in scaling, now says it is plateauing. Does OpenAI have something secret he was unaware of? I doubt it.
FWIW, GPT-2 and GPT-3 were about a year apart (2019's "Language Models are Unsupervised Multitask Learners" to 2020's "Language Models are Few-Shot Learners").
Dario Amodei recently said that with current-gen models, pre-training itself only takes a few months (followed by post-training, etc.). These are not year+ training runs.
>Sutskever, recently ex-OpenAI and one of the first to believe in scaling, now says it is plateauing.
Blind scaling, sure (for whatever reason)*, but this is the same Sutskever who believes in ASI within a decade off the back of what we have today.
* Not like anyone is telling us any details. After all, OpenAI and Microsoft are still trying to build a $100B data center.
In my opinion, there's a difference between scaling not working and scaling becoming increasingly infeasible. GPT-4 used something like 100x the compute of GPT-3 (and the GPT-2 to GPT-3 jump was similar).
All the drips of information we've had about GPT-5 point to ~10x the compute of GPT-4. Not small, but very modest in comparison.
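Putting those multipliers side by side (rough public estimates and rumor, not official figures), a quick sketch:

    # Rough, unofficial per-generation compute multipliers.
    # The GPT-5 figure is rumor, not a confirmed number.
    multipliers = [
        ("GPT-2 -> GPT-3", 100),  # ~two orders of magnitude
        ("GPT-3 -> GPT-4", 100),  # ~two orders of magnitude
        ("GPT-4 -> GPT-5", 10),   # rumored: only ~one order of magnitude
    ]

    total = 1
    for step, m in multipliers:
        total *= m
        print(f"{step}: x{m:>3} (cumulative over GPT-2: x{total:,})")
    # Keeping the historical x100 per generation would have meant x1,000,000
    # over GPT-2 by GPT-5; the rumored x10 gives "only" x100,000.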
>FWIW, GPT-2 and GPT-3 were about a year apart (2019's "Language Models are Unsupervised Multitask Learners" to 2020's "Language Models are Few-Shot Learners").
Ah, sorry, I meant GPT-3 and GPT-4.
>Dario Amodei recently said that with current-gen models, pre-training itself only takes a few months (followed by post-training, etc.). These are not year+ training runs.
You don't have to be training models the entire time. GPT-4 finished training in August 2022 according to OpenAI and wasn't released for roughly another seven months. Why? Who knows.
> After all, OpenAI and Microsoft are still trying to build a $100B data center.
Yes - it'll be interesting to see if there are any signs of these plans being adjusted. Apparently Microsoft's first step is to build optical links between existing data centers to create a larger distributed cluster, which must be less of a financial commitment.
Meta seem to have an advantage here in that they have massive inference needs to run their own business, so they are perhaps making less of a bet by building out data centers.
It's been 20 months since GPT-4 was released, and GPT-4 itself came about 33 months after GPT-3. The lack of a release by now does not in itself mean much of anything.
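Quick sanity check on those gaps, using approximate public release dates (month precision, so the counts are rough):

    from datetime import date

    # Approximate public release dates, to month precision.
    releases = [
        ("GPT-2", date(2019, 2, 1)),
        ("GPT-3", date(2020, 6, 1)),
        ("GPT-4", date(2023, 3, 1)),
    ]

    def month_gap(a, b):
        return (b.year - a.year) * 12 + (b.month - a.month)

    for (prev_name, prev_date), (next_name, next_date) in zip(releases, releases[1:]):
        print(f"{prev_name} -> {next_name}: ~{month_gap(prev_date, next_date)} months")
    # GPT-2 -> GPT-3: ~16 months
    # GPT-3 -> GPT-4: ~33 months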