Neither of your points really holds. There have been decades of theoretical breakthroughs in computational linguistics too (have there been any in Deep Learning?). There has also been a great deal of human creativity and perseverance in computational linguistics, arguably more than I have seen in Deep Learning. Yet not one useful algorithm has come from linguistics. In fact, the old adage from speech processing applies to Natural Language Processing as well: "Every time I fire a linguist, my performance improves by a few percent."
The bitter lesson is bitter, and important to keep in mind, exactly because human creativity and perseverance do not matter in the face of it. Consistently, the only methods that work are those that scale with computation; everything else does not matter. I would take an even more extreme view: if computation hadn't followed Moore's law, we wouldn't have invented alternative methods that avoid massive computation; we would simply have failed at even the most basic tasks of intelligence and still be stuck in the 1960s. A scary thought, but a true one, I reckon. Conversely, if computation had kept following Moore's law but a few stalwarts like Yann LeCun hadn't existed, we would likely have found other architectures that scale and work. Maybe not as good as ConvNets, but transformers aren't as good as ConvNets either; they just need to scale.
I'm not sure that the Bitter Lesson is the end of the story. The Bitter Corollary seems to be that scaling computation also requires scaling data.
Sometimes that's easy; self-play in Go, for example, can generate essentially infinite data.
On the other hand, sometimes data isn't infinite. It can seem infinite, as in the aforementioned NLP work, where a computation-heavy ML system can process more data than a human could read in a lifetime. But our LLMs are already within an order of magnitude of having read every bit of human writing ever produced, and we're scaling our way toward that data limit.
"Clever" human algorithms are all a way of doing more with less. People are still more data-efficient learners than large ML systems, and I'm less sure that we'll be able to compute our way to that kind of efficiency.
I think Geoffrey Hinton addresses this point well in his recent podcast with Pieter Abbeel. He says, and I paraphrase: current Deep Learning methods are great at learning from large amounts of data with a relatively small amount of compute. The human brain, on the other hand, with around 150 trillion synapses (parameters), has the opposite problem: parameters/compute are cheap but data is expensive. It needs to learn a large amount from very little data, and a large amount of regularization (things like dropout) will likely be required to do this without over-fitting. I think we will have a real shot at AGI once 100-trillion-parameter models become feasible, which might happen within this decade.
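To make the regularization point concrete, here is a minimal sketch of inverted dropout, the variant used in most modern frameworks: units are zeroed at random during training and the survivors are rescaled, so the layer is the identity at inference time. The `dropout` helper below is illustrative NumPy, not from any particular library.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p during
    training and rescale survivors by 1/(1-p) so the expected value of
    the output matches the input. At inference time, pass through unchanged."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

activations = np.ones((4, 8))
trained = dropout(activations, p=0.5)       # entries are either 0.0 or 2.0
inference = dropout(activations, training=False)  # identical to the input
```

The rescaling by 1/(1-p) is the design choice that lets you drop the mask entirely at test time: no extra compute is spent at inference, which fits the broader theme that regularization buys data efficiency at some cost in training noise.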