He's building the model from scratch, as the title suggests. He only trains a small ~10M-parameter model on it, something that is feasible on a single GPU. For comparison, GPT-3 has 175B parameters.
> wondering like how hard is it to actually replicate what openAI has done if you had the money to pay for the training?
It would most certainly be possible for another company to build something very similar (models of similar size have even been released publicly). I'm honestly unsure why Microsoft would rather pay $10B to acquire less than half of OpenAI, as they have the hardware to do it (OpenAI uses MS cloud products). There must be some business reasons I don't understand. OpenAI definitely has some very talented people working for it, though.
>I'm honestly unsure why Microsoft would rather pay $10B to acquire less than half of OpenAI, as they have the hardware to do it (OpenAI uses MS cloud products).
Because the hardware is the least interesting part of it?
Microsoft buys the know-how, the talent, and perhaps some patents, but most importantly the GPT brand name...
Does the time to train the model increase linearly with the number of parameters, or exponentially?
In other words, GPT-3 is 17,500X the number of parameters but does that mean you can train it in 17,500X the amount of time it takes to train the 10M param model?
In theory it should be linear. In practice, however, the parallelization is not perfect: some overlapping parts of the gradients are computed on multiple GPUs at the same time, so expect a constant-factor slowdown on average.
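To give a feel for why imperfect parallelization caps the speedup, here's an illustrative Amdahl's-law-style sketch (my own toy numbers, not measurements from any real training run): if a fraction of each training step (gradient sync, redundant computation, etc.) doesn't scale across GPUs, the achievable speedup flattens out well below the GPU count.

```python
# Toy Amdahl's-law model of multi-GPU training speedup.
# `parallel` is the assumed fraction of each step that scales with GPU
# count; the rest (communication, sync) is treated as serial overhead.

def speedup(n_gpus: int, parallel: float = 0.95) -> float:
    """Ideal speedup on n_gpus when only `parallel` of the work scales."""
    return 1.0 / ((1.0 - parallel) + parallel / n_gpus)

for n in (1, 8, 64, 512):
    print(f"{n:4d} GPUs -> {speedup(n):6.2f}x speedup")
```

Even with 95% of the step parallelizable, 512 GPUs buy you well under 20x in this toy model, which is why real setups work hard to overlap communication with computation.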
On top of what other people have said about parallelism overheads, you normally need more data to train a bigger network and the training time is roughly proportional to network size * training data.
IIRC OpenAI used about a million times more data to train GPT-3 than Karpathy used in this video, so a naive estimate is that it would take about 20 billion times more compute. This could be a significant overestimate, since Karpathy probably reused each piece of the training set more times than OpenAI did.
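The back-of-the-envelope above can be written out explicitly. The "compute ~ parameters x data" rule and the "million times more data" figure are taken from the comments here, not from OpenAI's published numbers, so treat the result as an order-of-magnitude sketch:

```python
# Naive training-compute estimate: compute scales roughly with
# (parameter count) x (amount of training data).

small_params = 10e6    # ~10M parameters, the model in the video
gpt3_params = 175e9    # 175B parameters

param_ratio = gpt3_params / small_params   # ~17,500x more parameters
data_ratio = 1e6                           # "a million times more data" (IIRC)

compute_ratio = param_ratio * data_ratio
print(f"~{compute_ratio:.2e}x more compute")  # ~1.75e10, i.e. ~20 billion x
```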
I am not from the LLM world, but I believe it's mostly constrained by the standard multiprocessing limits: communication and synchronization across many workers, some of which operate over an exceedingly slow Ethernet interface.