Meta AI Unleashes Megabyte, a Scalable Model Architecture

mkaic · on May 24, 2023

Ugh. I just spent the past few days reading this exact paper and preparing a detailed presentation on it for my job as an AI researcher, and headlines like this make me roll my eyes very hard.

MEGABYTE is indeed a cool new architecture, but it is still very much just a proof of concept at the moment. The paper shows that the model can compete with (but not decimate) vanilla Transformers on the scale of ~1B parameters in long-sequence prediction tasks. They did not "unleash" anything, and scalability to very large parameter counts and datasets still has not been tested.

I'm certainly very excited to see where this architecture goes as the community gets ahold of it and starts developing it, but to call it "revolutionary" this early on is disingenuous. I personally have a few experiments I want to run with it, but I put the probability of it being a true GPT-killer at <20%. I would love to be wrong, though!

xp84 · on May 24, 2023

Every 'news' site: "But how are we gonna get clicks if we don't vastly exaggerate and sensationalize everything?"

Thanks for the actual sober analysis. As not-an-AI-researcher, I could have been easily bamboozled by an article like this.

nwoli · on May 24, 2023

I wish you had a link to your blog or twitter in your bio, you have the kind of nuanced tone I’d like to hear more from

mkaic · on May 24, 2023

Oh, thanks! Though to be honest, my blog does have a good bit of wild speculation on it, as I'm a bit of a hypocrite on that front :)

I've added links to both in my bio.

riwsky · on May 24, 2023

they should consider doing this for a living

mkaic · on May 24, 2023

I'll consider doing it for a living when it can pay my rent, which is unlikely to be any time soon :P

jdthedisciple · on May 24, 2023

When I started reading this and immediately came across the word "groundbreaking" I paused for a sec and thought to myself:

Let me just pretend this word isn't there, it's probably an exaggeration.

It takes a special kind of adaptation to filter out all the 10%-40% bluff usually found in these kinds of news articles.

kjkjadksj · on May 24, 2023

It's always better to just get at the underlying, usually far drier scientific paper. Even if it's not in your field, try to read the intro and the conclusion; the authors will describe why this is generally relevant with much less flowery language than the press release. Even a word like “novel” is bold to use too many times.

derefr · on May 24, 2023

Now I kind of want to see a browser extension that deletes all the intensifier adjectives from runs of text, and is able to be enabled/disabled per domain.
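A rough sketch of the core of that idea, outside of any browser API. The word list, the function names, and the per-domain set are all made up for illustration; a real extension would need a configurable list and would have to walk the page's text nodes.

```python
import re

# Hypothetical word list (an assumption, not taken from any real extension).
INTENSIFIERS = ("very", "really", "extremely", "incredibly", "literally", "actually")

# Match any listed word as a whole word, plus the whitespace right after it.
_PATTERN = re.compile(r"\b(?:" + "|".join(INTENSIFIERS) + r")\b\s*", re.IGNORECASE)


def strip_intensifiers(text: str) -> str:
    """Remove the listed intensifiers and the whitespace that follows each."""
    return _PATTERN.sub("", text)


def clean_for_domain(text: str, domain: str, enabled_domains: set) -> str:
    """Only rewrite text for domains where the filter is switched on."""
    return strip_intensifiers(text) if domain in enabled_domains else text
```

This naive version does not tidy up punctuation or capitalization left behind after a deletion, which is probably where most of the real work would be.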

mejutoco · on May 24, 2023

Bit random, but a while ago I created this extension to remove "actually", "literally" and other words (it is slightly configurable) that can be annoying. It is sort of a joke, but some days I think it is not a joke.

https://chrome.google.com/webstore/detail/actually-no-remove...

Maybe I should extend it :)

mkaic · on May 24, 2023

So a less unhinged version of [0]?

[0] https://xkcd.com/1288/

PoignardAzur · on May 24, 2023

> The paper shows that the model can compete with (but not decimate) vanilla Transformers on the scale of ~1B parameters in long-sequence prediction tasks. They did not "unleash" anything, and scalability to very large parameter counts and datasets still has not been tested.

One thing to keep in mind in particular is that the vast majority of transformer alternatives/improvements/optimizations that initially showed promise ultimately ended up scaling less well than the baseline transformer architecture. The transformer has been weirdly hard to improve/dethrone.

mkaic · on May 24, 2023

Yeah, it's had remarkable staying power. I think it says a lot that I can see "Vaswani, et al." cited in a paper and know exactly what the author is referring to lol. I don't usually memorize researcher names but the original Attention Is All You Need paper is just freaking ubiquitous.

lordofgibbons · on May 24, 2023

I'd love to watch a recording of this presentation or write up, if possible!

mkaic · on May 24, 2023

Hmm, I may make a blog post using some of the graphics I made for the presentation. Can't make any guarantees though.

jacobn · on May 24, 2023

My main argument against the AI doomsayers has so far been that the current scaling laws simply make runaway singularity style scenarios algorithmically impossible (if for each step of improvement you need 10x parameters and 100x training, you quickly run into a brick wall).

This is part of why I’m not worried about the current crop of generative AI. I am however both curious and concerned about what the tsunami of talent and $$$ chasing the current trend will achieve.

If this n^(4/3) alt transformer compute scaling is real (and there’s been many a pretender, so it’s too early to tell), then that could fundamentally change the overall AI scaling law, substantially lowering the brick wall.

And that could be a game changer.
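For a rough sense of what n^(4/3) would buy, here is a back-of-the-envelope sketch. It assumes the baseline being compared is the usual quadratic (n^2) self-attention cost in sequence length n, and it ignores constant factors, so the numbers are asymptotic ratios only. The function name is invented for illustration.

```python
def attention_cost_ratio(n: int) -> float:
    """Ratio of an n**2 cost to an n**(4/3) cost, ignoring constant factors.

    Algebraically this is n**(2/3).
    """
    return n ** 2 / n ** (4 / 3)


# Compare a short and a long sequence length.
for n in (1_000, 1_000_000):
    print(n, round(attention_cost_ratio(n)))
```

The gap grows as n^(2/3), so under these assumptions the advantage only becomes dramatic at very long sequence lengths.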

p-e-w · on May 24, 2023

I don't buy the idea (with either architecture) that "10x"-type scaling is required for another breakthrough.

Think of a human with below average intelligence. Then think of a human genius. Now consider how incredibly similar their brains are, despite the massive performance gap. It's not like one has 10x the number of neurons/synapses/connections etc. of the other. They're both healthy human brains, and you need powerful technology to even distinguish them structurally.

Considering this, it seems perfectly possible that a model like GPT-4 is just a hair's breadth away from vastly superhuman performance. It certainly beats the average human at many tasks already. The gap between a moron and a genius is a lot larger than that between GPT-4 and a superhuman.

wudangmonk · on May 24, 2023

You mean the average brain with about 100 billion neurons with about 1000 connections each, bringing it to around 100 trillion connections. With an estimated 1000 "AI" neurons required per biological neuron.

I don't think you are giving these "below average" intelligence individuals enough credit. What we consider a genius is the equivalent of a dog show obstacle course. We measure intelligence/genius as whatever is hard for humans and completely ignore what is easy because we fail to see the complexity behind the easy stuff.

mirekrusin · on May 24, 2023

Nobody said that nature is optimal. Wheels are trivial, however not present in biology. Nature creates tentacles, not jet engines, nuclear energy etc.

Majority of human brain computation is spent on things that are simply not necessary for computer models (how to wiggle limbs, mouth, eyes etc).

Current LLMs are impressive, but we know they can be much more efficient - we're using very low quality training data, we don't use any methods to question training input/evaluate it against current knowledge etc. Our current language models are based on reciting/memorization/force-feeding, not true learning.

Computer models have massive underlying advantage of working on CPU/GPUs where they can be modeled, cloned, retrained, have binary accuracy, can integrate with specialized code instantly, have access to massive memory storage, they are insanely fast and precise etc.

Looking at it from first principles, there is no reason to think that near optimal runtime should not be more efficient than human brain on currently available computers.

We don't need to simulate full brain just like we didn't have to create super fancy legs to go to the Moon.

DrScientist · on May 24, 2023

> Wheels are trivial, however not present in biology.

Because roads don't exist in nature... imagine trying to outrun a predator if you just had wheels but no roads.

> Nature creates tentacles, not jet engines, nuclear energy etc

Under water I think you'll find there is jet propulsion.

Plants and animals are powered by nuclear energy - the remote fusion reaction is in the sky. Why have an internal nuclear reactor - fissile material is quite rare and fusion is quite hard to contain.

Humans are overly fond of their own children (machines) but fail to see the complexity of life. People are in awe of the latest Boston Dynamics robot - wow it can run - wow it can leap - but don't post pictures of horses or gymnasts.

mirekrusin · on May 24, 2023

You're right and missing main point at the same time.

The reasons why biology didn't evolve what we discovered don't matter.

What matters is that it didn't, yet they can exist and outperform.

Similarly brain is not an upper ceiling for intelligence.

We can create more intelligent machines than us.

Things like energy efficiency don't matter as much - when we run large models, nobody cares that it may require more energy than two sandwiches and a beer per day. We do care about energy efficiency but not at scales of biology.

DemocracyFTW2 · on May 24, 2023

> People [...] don't post pictures of horses or gymnasts.

But that's just because the internet is already filled with cats

ben_w · on May 24, 2023

> Under water I think you'll find there is jet propulsion.

Technically, yes, but most people are going to be thinking of the spinning turbines of doom with the absurdly hot fire in the middle, and that isn't.

NeuroCoder · on May 24, 2023

The lack of efficiency in the brain is a trade off for adaptability. There's no doubt that we can transmit signals faster than neurons can fire but the cost is drastically reduced adaptation. There are micro, meso, and macroscopic networks in the brain that have different degrees of "adaptability". This isn't even considering the variety in neurons or the additional signalling cascades by non neuronal tissue. How does all of this contribute to intelligence? We don't know exactly but much of this has survived millions of years of evolution in many animals so it probably has some role.

That's not to say we can't do this with computers and less computational power. However, it's really improbable that a couple layers of adaptability on an artificial neuron network will be anywhere near sufficient to simulate intelligence in even a rodent.

pixl97 · on May 24, 2023

There are a few counters I have to this and one would be that AI could still end up 'smarter' than us, but have no innate desire to survive. The paperclip maximizer scenarios are an example of this. AI could very well create a highly destructive scenario not only for humans, but also itself because it is "intelligent" but not "aligned" with the idea of survival and evolution.

NeuroCoder · on May 24, 2023

I erased a lengthy response I was writing because I think our sentiments may actually align. In short, I'm about as certain that these computers aren't going to achieve intelligence in my lifetime as I'm certain that the world is round. But none of that means these models aren't dangerous. Misinformation and accelerating backdoor access to network infrastructures are a couple of dangers that we already see happening.

baq · on May 24, 2023

> Nobody said that nature is optimal. Wheels are trivial, however not present in biology. Nature creates tentacles, not jet engines, nuclear energy etc.

Wheels are trivial but useless without bearings.

Bearings most certainly aren't trivial. All the things you listed further are dependent on bearings somewhere.

TeMPOraL · on May 24, 2023

AFAIK both exist in biology. Bacterial flagellum is effectively a motor, and has a working wheel-like structure. IIRC, some crickets had an equivalent of a bearing somewhere in their anatomy too.

Evolution is a greedy, lazy optimizer, so it promotes things that work a-ok for a given environment.

It's also worth noting that wheels alone are not too useful for transportation, as they're only half of the picture. The other half is roads. That is, because we couldn't (and mostly still can't) figure out all-terrain mobility systems that could navigate diverse environments, we cheated and locally flattened the environment, to reduce the problem to that solvable by a humble wheel. Evolution can't cheat like this.

baq · on May 24, 2023

I actually googled before writing the comment (preposterous, I know; also, out of vogue, should've asked chatgpt) and the bacteria thing was the only thing I found, nothing macro scale, which made sense since it's easier to rebuild than to heal. Anything bigger than that would have to be healed and it's hard enough to fix mechanical bearings, can't imagine how to regrow or heal one even if it somehow grows. Maybe it's an issue with my imagination ;)

TeMPOraL · on May 24, 2023

Good point about rebuilding vs. healing. We're kind of rediscovering it with economies of scale and price of labor making it much cheaper to replace things than to repair them.

I can imagine evolution creating macro-scale wheels that can be healed, and/or able to survive long enough before failing to be advantageous - after all, bones and teeth can hardly be healed if badly damaged, and yet they last long enough to stick around as core design elements.

This is why I mentioned roads. Whether or not evolution could iterate its way to macroscale wheels, it wouldn't, because they'd be useless without roads. Legs may be more complex overall, but they're an all-terrain solution that can be incrementally improved, and every improvement step grants improved survivability.

bcrosby95 · on May 24, 2023

They've found gears in insects too

https://www.livescience.com/39577-insects-with-leg-gears-dis...

mirekrusin · on May 24, 2023

If that's the case why are we sending wheeled rovers to mars/moon and not something with legs?

In any case this discussion is going sideways, the point is that nature doesn't have monopoly on being optimal. This also applies to intelligence/learning/modeling something better than brain.

TeMPOraL · on May 24, 2023

> If that's the case why are we sending wheeled rovers to mars/moon and not something with legs?

Because Mars is a simple and boring environment. Most of its surface, and especially the parts we target with rover missions, are effectively flat sheets peppered with rocks - a decent set of wheels and suspension is close to optimal for navigating such terrain.

Now, if we were to send missions to a planet that's mostly forests and rivers, like Earth used to be, then wheels wouldn't cut it - not before cutting down some of the forests first.

> the point is that nature doesn't have monopoly on being optimal. This also applies to intelligence/learning/modeling something better than brain.

Fair enough. Nature doesn't do globally optimal - but it makes things heavily optimized for their environment. That's why our planes are nowhere near as energy-efficient in flying as birds are, but birds cannot travel as far and as fast as our planes can.

mirekrusin · on May 24, 2023

Exactly, same with intelligence - it's polluted with emotions and all kinds of "nonsense" - but it doesn't have to be. We can create emotionless, super-intelligent machines exceeding human capability by far (and use them as hammers). No need to imitate every detail of the brain to extract intelligence.

pixl97 · on May 24, 2023

The counter to this is you can end up with an exceptionally powerful, but unaligned AI, which presents a new series of 'known unknowns' and 'unknown unknowns' that we have to deal with.

mirekrusin · on May 24, 2023

Yes, possibly. A simple example would be an army robot that is an extremely efficient human killer and that, oops, escaped.

Original argument was around optimising on intelligence and that biology doesn't hold best-possible trophy on it.

We don't need to match number of neural connections in human brain to exceed its intelligence.

TeMPOraL · on May 25, 2023

> We don't need to match number of neural connections in human brain to exceed its intelligence.

That's also true because of wheels/road thing, in that we can "cheat" here too. More specifically, some of the neural connections in the human brain are dedicated to sensing, processing and controlling the dynamic state of human body. Purely-software AIs don't need those for intelligence.

hacoo · on May 24, 2023

Legged-robot technology is still very immature, even more so when the rover was designed. Wheels work well on relatively flat Martian terrain and are a lot less likely to break than robot legs.

Interestingly the latest Mars rover also includes a small helicopter, another technology which requires spinning something on a bearing and does not commonly exist in nature.

mirekrusin · on May 24, 2023

I think 5yo kid with lego would have problems agreeing with this otherwise moot statement.

baq · on May 24, 2023

Don't want to sound mean but I don't think you've dealt with bearings if you say so.

pmoriarty · on May 24, 2023

"Wheels are trivial but useless without bearings."

Carts, chariots, and wheelbarrows (to name but a few examples) have been useful for thousands of years without bearings.

baq · on May 24, 2023

The simplest bearing is a greased axle. This was the big 'wheel' discovery, not the round thingie, but how to attach a box on top. So we agree!

DrScientist · on May 24, 2023

> but useless without bearings.

Or more importantly - level ground.

data-ottawa · on May 24, 2023

Not relevant and not simply picking apart your example but I’ve been nerd sniped:

Wheels are not trivial in a biological sense. Topologically most life is either a tube or a cup depending on digestive systems. Wheels are separated from the body, which would be difficult for base cell division to produce.

Some animals like the pangolin are round shaped and do roll, but it just seems non optimal.

I’d say it’s curious nature hasn’t produced more creatures that like making wheels like the dung beetle does, but nature made us and we do like making wheels.

chaxor · on May 24, 2023

Listing the number of neurons in a brain has very little to do with these systems, so it's pretty meaningless. Also, the number of neurons in Wernicke's area and the PFC is quite a small fraction of the brain, making this even more meaningless.

nemothekid · on May 24, 2023

>Think of a human with below average intelligence. Then think of a human genius.

LLMs are not AGI. A human with below average intelligence is still a league above a chimpanzee. A chimpanzee will never be able to read, not because "it's too dumb", but because a chimp's brain lacks the actual hardware for reading. The LLM is the chimpanzee. The gap between an LLM and a "human with below average intelligence" is far more than 10x.

p-e-w · on May 24, 2023

> The gap between an LLM and a "human with below average intelligence" is far more than 10x.

In which direction?

GPT-4 passes the bar exam with a top 10% score. How do you think a human with below average intelligence (or even with average intelligence) would fare?

Copilot generates programming code that solves problems, and in most cases the code is correct. It outperforms many junior professional developers. Do you think a human with below average intelligence could do that?

nemothekid · on May 24, 2023

>How do you think a human with below average intelligence (or even with average intelligence) would fare?

I don't understand what rote memorization to pass a test has to do with intelligence. For the record Kim Kardashian passed the bar; I imagine anyone given the proper motivation and time to study could do it, it's not a hard test.

If AGI is a computer passing a test, then AGI was achieved a long time ago. I don't think a human with below average intelligence could multiply 2 very large primes but I don't mistake my calculator for intelligence. A below average intelligence human can drive a car with a couple hours of training, an LLM can't do that (with a far lower power budget as well).

I'm not saying AGI is impossible, but it's clear LLMs are not AGIs; it's not a matter of having a 100x more powerful LLM, just like making an Ape better at sign language won't make them better at abstract reasoning. An Ape's brain fundamentally lacks the mental machinery for higher level things that humans do. It's not a question of not being smart enough. LLMs are simply one component of the human mind.

p-e-w · on May 24, 2023

> If AGI is a computer passing a test, then AGI was achieved a long time ago.

AIs couldn't even pass a third-grade reading comprehension exam until about 5 years ago. Computers being able to pass tests designed for humans is a very new thing.

> it's clear LLMs are not AGIs

And the main argument for that is that "it's clear". They're beating lawyers, doctors, and software engineers, but obviously, that's not real intelligence...

WoodenChair · on May 24, 2023

>> If AGI is a computer passing a test, then AGI was achieved a long time ago.

> AIs couldn't even pass a third-grade reading comprehension exam until about 5 years ago. Computers being able to pass tests designed for humans is a very new thing.

You're narrowly defining what a "test" is. And therefore the "until 5 years ago" doesn't make any sense. A test is not just written exams like a "third-grade reading comprehension exam." Is playing chess against the world champion not a test? A trad AI computer program beat him 25 years ago, not 5 years ago. Is diagnosing patients with a particular disease better than the average untrained human not a test? An expert system did that 50 years ago.

Your point is not a refutation of the parent post.

coconuthacker42 · on May 24, 2023

You're assuming that everything needed to write code or make arguments is purely intelligence based and has nothing to do with patterns, structural repetition and things glorified autocomplete could do, and that's not true.

flangola7 · on May 24, 2023

Who said anything about everything? AI could not write code AT ALL until very recently. We could invent a drug that kills 99% of cancers and the next day there would be people bemoaning that it isn't a "true" cure.

coconuthacker42 · on May 24, 2023

Hey, you're the one claiming intelligence and all; the burden of proving it is squarely on you

dr_dshiv · on May 24, 2023

Yes. Let’s define AGI as ability for a single model to pass most human professional tests (no cheating) and to provide genuine human-level flexible cognitive benefit to specialized professionals in diverse fields. Reasonable?

coconuthacker42 · on May 24, 2023

No, because most tests designed for humans test memory and pattern recognition, which computers already can do better than humans so it's not a useful comparison. I'd rather define it to be superhuman AGI when it not only performs better on tests with humans who can use computers during the test but also can perform everyday tasks which are not 'hard' for us humans. That is because we have the hardware in our brains to do these things, doesn't mean that it's easy in the least for a computer.

dr_dshiv · on May 24, 2023

Have some examples? What would be tests that, if passed, you’d say “oh yeah, that’s AGI.”

For instance, if it could make a peanut butter and jelly sandwich? Most challenging things that are easy for us are in the motor domain. While important, I think “intellectual AGI” is a meaningful milestone and closest to what most people think of when they think AGI.

nemothekid · on May 25, 2023

>Have some examples? What would be tests that, if passed, you’d say “oh yeah, that’s AGI.”

The "G" in AGI is general. A computer program or system that could both, let's say, write code and learn to drive a car would be something closer to an AGI. Written tests are remarkably brittle in showing how intelligent someone is - like we already know the limitations of tests in the real world! Einstein famously flunked his entrance exam, but then invented special relativity at 26; but other posters would have you believe an LLM is more intelligent than Einstein because it could pass a test. When an LLM defends a dissertation in arguably any field that would be way more impressive than an LLM passing a test that humans already designed and know the answers to.

One problem I have with saying a 100x more powerful LLM could become AGI is that there is nothing that leads me to believe that LLMs, as they exist currently, are capable of synthesizing new knowledge and I'm not sure what breakthroughs you would need to get there. Once you start to think about that, you start to run up against the limitations of the LLM. If I were to invent a completely new programming language, I could probably teach a junior engineer how to use it in a week, but the jury is out on whether I would need to first generate 50,000 sample programs and spend $1,000,000 in GPU compute to get an LLM to output the same thing. It's hard to consider such a system AGI. Further still, sure you have Google spending millions on FSD, but it's hard to consider the system they have as general. Could I take Waymo and have it pilot a forklift? Or a submarine? How much would that cost? A """below average intelligence""" human could learn to use a forklift in an afternoon.

All in all, there's more to intelligence than written tests.

coconuthacker42 · on May 24, 2023

The problem with defining AGI isn't only in defining intelligence, but also defining general. Also, why do we treat AGI as a yes/no question, when it probably makes sense to think partially... i don't have a definition of either

dr_dshiv · on May 24, 2023

And that’s why I argue that AGI is already here. It is a spectrum. And we are well along the way to further acceleration. And if you want to say “no, we are not at AGI yet”, we need to define a clear test of what would be beyond that point.

TimPC · on May 24, 2023

This definition fails badly because it doesn't test anything outside of language. At a bare minimum, have the tests involve pictures and descriptions and require the AI to use the same model to synthesize information from both.

dr_dshiv · on May 24, 2023

Lots of human tests involve pictures. Why does the definition fail badly?

pixl97 · on May 24, 2023

"intelligence"

This is a very problematic word. For example if you were a civil engineer and went "throw me any old design for a bridge, I have a river I need to cross", you'd have your license removed.

Intelligence is too massively loaded, and too much of a gradient even across humans, to try to sum up human or AI abilities. It is a multitude of different capabilities that don't necessarily have to be bundled together for something to be 'smart', 'useful', 'capable', and/or 'dangerous'.

flangola7 · on May 24, 2023

Where have I claimed anything?

coconuthacker42 · on May 24, 2023

I'm sorry if I misunderstood your comment but your tone seemed to imply that

qup · on May 24, 2023

> Computers being able to pass tests designed for humans is a very new thing.

Yes! Captchas so effective!

comex · on May 24, 2023

Yes, it would be more accurate to say “computers being able to pass tests designed for humans by humans”, as opposed to CAPTCHAs which are automatically generated tests.

ithkuil · on May 24, 2023

> it's not a matter of having a 100x more powerful LLM,

I think we all can agree that even the best LLM currently is not AGI. That's not what's being disputed here, I think.

However a 100x more powerful LLM is not just 100x better at recall. A 100x more powerful LLM is not just 100x better at being a stupid hallucinatory parrot. A model that is just 100x bigger is not necessarily 100x more powerful if you define power as the ability to achieve goals.

However pure language models will always lack something else: the ability to ground things in reality.

I recently had a dream where I solved some problems and when I woke up I realized that those solutions were bullshit, but I also realized the whole approach of my dreaming self was very similar to what a LLM would have done.

dr_dshiv · on May 24, 2023

> I think we all can agree that even the best LLM currently is not AGI.

Disagree, for the record. If I’d described the capabilities of contemporary AI to 100 AI scientists 5 years ago, I bet more than half would agree to call that AGI. Further, more than 90% would assume that these capabilities were decades and decades away.

jiggawatts · on May 24, 2023

“Oh those goalposts? We moved them over there because they were getting uncomfortably close.”

ChatGTP · on May 24, 2023

Cool, so we're at AGI, we just need ASI, maybe something a bit smarter, and we can shut the fuck up about it and get back to blowing billions on real living creatures' problems? ;)

nemothekid · on May 25, 2023

>If I’d described the capabilities of contemporary AI to 100 AI scientists 5 years ago

This is hard to believe; the Attention Is All You Need paper was 6 years ago, GPT1 is 5 years old and GPT3 is 3 years old. The current crop of LLMs wasn't something that happened overnight.

dr_dshiv · on May 25, 2023

No one thought GPT1 would have these scaling effects. Really.

flangola7 · on May 24, 2023

Define grounding things in reality.

We only have our 5 senses to go off of. Meta has already put out one multimodal model incorporating multiple data types, openai is undoubtedly working on it too.

ithkuil · on May 24, 2023

Grounding in reality can be something as simple as what openai is experimenting with plugins or something much more integrated.

It's not a matter of which senses you have, but about being able to "continuously" use them.

The current LLMs are basically unfiltered raw thoughts that must be continuously refined. A similar thing happens in our brains and only a little bit of that is accessible to our consciousness

TeMPOraL · on May 24, 2023

> The current LLMs are basically unfiltered raw thoughts that must be continuously refined. A similar thing happens in our brains and only a little bit of that is accessible to our consciousness

Exactly. But, AFAIK, it's also the part that does the bulk of actual thinking and decision-making for us. In that sense, LLMs may be closer to AGI than people expect, because they seem to be capturing the actual core of intelligence and reasoning - and the missing bits (like long-term memory and higher-level thought stream filter/censor) may be much easier to bolt on to them.

pixl97 · on May 24, 2023

This is why we typically see better performance out of GPT when plugins are bolted into a chain|tree of thought with reflection.

The output of LLMs is kind of like our stream of consciousness: there are a lot of things I think, then discount after internally reflecting on the thought, which then often leads to a more correct solution. Having an LLM 'think' like this natively would massively increase the amount of compute needed, hence the expense, so at least in any public products it's not being done at this time.

TeMPOraL · on May 25, 2023

Yup. I totally expect you'll be able to eke out significant performance boost if you chain up the LLMs, so that e.g. you feed the initial query to a first-stage GPT-4 several times (likely in parallel), feed those to some kind of filter models that pass or reject the output, looping until you have, say, 3 passing outputs, then feed that to a summarizer, etc. Maybe play with generating system prompts so that you have multiple entirely different takes on the same query, or stage it. Or, you know, have GPT-4 look at the query and propose a graph of subsequent invocations for you.

I wish I had time to play with it some more right now. The pace of progress in the field is giving me a serious case of FOMO.
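A minimal sketch of the generate / filter / summarize loop described above, under stated assumptions: `generate`, `accept` and `summarize` are hypothetical callables standing in for model calls (no real LLM API is used here), the drafts are produced sequentially rather than in parallel, and `max_attempts` is an added safety cap so the loop cannot run forever.

```python
from typing import Callable, List


def chained_answer(
    query: str,
    generate: Callable[[str], str],              # hypothetical first-stage model call
    accept: Callable[[str, str], bool],          # hypothetical filter model: pass or reject
    summarize: Callable[[str, List[str]], str],  # hypothetical summarizer stage
    needed: int = 3,
    max_attempts: int = 20,
) -> str:
    """Collect `needed` drafts that pass the filter, then summarize them."""
    passing: List[str] = []
    attempts = 0
    # Keep sampling drafts until enough of them pass, or the cap is hit.
    while len(passing) < needed and attempts < max_attempts:
        attempts += 1
        draft = generate(query)
        if accept(query, draft):
            passing.append(draft)
    if not passing:
        raise RuntimeError("no draft passed the filter")
    return summarize(query, passing)
```

Passing the stages in as plain functions keeps the control flow testable with stubs; swapping in real model calls, parallel sampling, or varied system prompts would be changes to the callables rather than to the loop.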

ben_w · on May 24, 2023

We have a lot more than 5.

Balance, proprioception, hunger, …

hawk_ · on May 24, 2023

> For the record Kim Kardashian passed the bar;

While I know the wider point you're trying to make, Kim Kardashian is a very smart business person. She may just not fit your narrow definition of "intelligent".

TeMPOraL · on May 24, 2023

> it's not a matter of having a 100x more powerful LLM, just like making an Ape better at sign language won't make them better at abstract reasoning. An Ape's brain fundamentally lacks the mental machinery for higher level things that humans do. It's not a question of not being smart enough. LLMs are simply one component of the human mind.

I believe you do have a good point overall, but this here is, IMHO, a rather bad example, because I'd argue GPT-4 is already capable of abstract reasoning, and this seems to be exactly the skill that improves with increasing dimensionality of the latent space.

I agree that LLMs are the equivalent of a single component of a human mind. Specifically, I think they're the closest equivalent to our inner voice / inner thoughts. But this part is arguably exactly the one that does most of the abstract reasoning (and most reasoning in general), so I think in fact LLMs do have the "hardware" for that specific aspect. What's lacking right now is the equivalent of the higher, "conscious" layer, that guides, filters and censors the stream of thought. That, and long-term recall. Short-term memory might actually correspond to what the context window is in LLMs.

That, and fusing in all the other senses (sight, sound, smell, taste, touch, time, etc.).

chaxor · on May 24, 2023

You definitely should not make claims like 'LLMs can't drive cars'. LLMs have already been shown several times to be able to navigate as agents in different world environments. Obviously, I'm not advocating this is a good idea, and certainly light years away from safe - but as an experiment in silico, I imagine it can be done quite easily (and probably already has).

The reason scientific researchers in this area are using the term 'AGI' so much is that it does fit the definition of AGI ... *For some definition of AGI*. And there lies the problem - no one can really come to a good consensus on a good definition of AGI. This is why many scientists in this area are avoiding the question altogether - the question is loaded, and is misinterpreted by the public if statements are made.

So, for example, if I make the statement here that e.g. GPT-4 has intelligence which is general (AGI), it will likely be met with a rabid response from HN. However, the claim may be more dull than you're expecting. People often conflate AGI with things that are not required, such as agency, etc.

This definition from [journal Intelligence Vol 24, No1, 1997] can be that "some definition" of AGI: must be able to 1) think abstractly, 2) comprehend complex ideas, 3) reason, 4) plan, 5) solve problems, 6) learn quickly from experience.

Many of these GPT-4 can do, if some modifiers are allowed - for example, GPT-4 can learn quickly from experience so long as you aren't starting a 'new' GPT-4 system from scratch every time you want to interact with it. This is probably preferable, since many of the experiences it will have are personal to the individual working with it, and it would be highly undesirable to do the opposite here. Planning was difficult for the system early on, but it has appeared to learn that it is helpful to lay out a plan early on in a large task, so that appears to be a capability as well, at least on a basic level qualitatively. There is some good literature on ability to reason (and solve problems in abstract and complex ideas) from Microsoft's group on causal reasoning. In pretty much all tasks of causal reasoning the system can achieve near human performance, and LLMs as a category outperform previous SoTA from more targeted or specific systems made for causal reasoning.

Anyway, I would suggest to anyone that has strong reactions to claims about AGI to realize that they are likely building up the statement to be more than it is. Perhaps similar to how 'machine learning' may have been misinterpreted years ago ("A machine can learn?! There is no tomorrow!!"), what is being stated here is often more narrow than you may believe.

Kiro · on May 24, 2023

Are you saying that Kim Kardashian is stupid?

pmoriarty · on May 24, 2023

"GPT-4 passes the bar exam with a top 10% score."

Turns out that may be as much marketing as truth.

According to this paper[1]:

"although GPT-4's UBE score nears the 90th percentile when examining approximate conversions from February administrations of the Illinois Bar Exam, these estimates are heavily skewed towards repeat test-takers who failed the July administration and score significantly lower than the general test-taking population. Second, data from a recent July administration of the same exam suggests GPT-4's overall UBE percentile was ~68th percentile, and ~48th percentile on essays. Third, examining official NCBE data and using several conservative statistical assumptions, GPT-4's performance against first-time test takers is estimated to be ~63rd percentile, including ~41st percentile on essays. Fourth, when examining only those who passed the exam (i.e. licensed or license-pending attorneys), GPT-4's performance is estimated to drop to ~48th percentile overall, and ~15th percentile on essays."

[1] - https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4441311

intelVISA · on May 24, 2023

> Copilot generates programming code that solves problems, and in most cases the code is correct. It outperforms many junior professional developers. Do you think a human with below average intelligence could do that?

Absolutely, it's why Javascript is so popular.

p-e-w · on May 24, 2023

A human with below average intelligence can outperform many junior professional software engineers because of JavaScript?

Do you mean a programmer with below average intelligence (for a programmer)? Because I'm having a hard time believing that you actually believe what you wrote.

BTW, JavaScript is actually a fairly difficult language to learn and use. Python, BASIC, and even Fortran are much simpler conceptually and have far fewer pitfalls. JavaScript is popular because it's the only language that every modern computer has an interpreter for, not because it's so easy that idiots can use it.

intelVISA · on May 24, 2023

It was a bit tongue in cheek I'll confess. :)

arcanemachiner · on May 24, 2023

It's gotta be a low-effort joke.

satellite2 · on May 24, 2023

Maybe we are simply culturally inclined to believe that the bar exam is more sophisticated than, say, learning how to drive. A lot of the compute power required to drive is simply hidden from us because it's unconscious, while intellectual tasks are mostly done consciously. So it remains to be shown which task is actually more complex. And in the meantime it's clear that most people, independently of their academic background, can learn how to drive relatively easily.

TimPC · on May 24, 2023

Sure but I can find examples of this in both directions. GPT-4 fails spectacularly at any task that doesn't have very precisely sanitized inputs presented to it in a certain format. It also fails at any task requiring any sort of interaction with the world. I think criticisms that it isn't a general intelligence are entirely fair even if it seems a bit more like general intelligence than most things that came before.

gentleman11 · on May 24, 2023

> GPT-4 passes the bar exam with a top 10% score

There was an article on HN yesterday debunking that. OpenAI is prone to exaggerating

dontupvoteme · on May 24, 2023

it's controversial but they're not allowed to sue us for it -- being a lawyer or doctor just requires a lot of rote memorization.

qup · on May 24, 2023

In your metaphor, maybe what the LLM (chimpanzee) "brain" is missing is some things like an inner thought loop, long term and short term storage and context, etc. Pieces we can understand, build, and surround the LLM with. Given the right arrangement, and the correct "abracadabra," perhaps that gets to where we are.

It seems to me that LLMs are perhaps capable of making up the inner thought loop, and the rest of the obvious systems seem doable. The context seems to be the hard problem; pulling in the proper things, and giving them the correct amount of attention.

pixl97 · on May 24, 2023

>pulling in the proper things, and giving them the correct amount of attention.

This is likely one of the harder problems to solve. One of the strengths humans seem to have is the power of analogy. These analogies quite often can lead us into new paths of thought, or at least in the correct direction.

How you do this in a machine system without wasting tons of power chasing dead ends, I'm not sure.

fnordpiglet · on May 24, 2023

LLMs are a single model. Multimodal LLMs tightly integrated with classical AI techniques to inform, conform, compel, solve, optimize, etc. in a tight feedback loop… I think folks are too entranced by the abilities of LLMs and too jaded to realize the flaws are in domains where we’ve already achieved great success. The magic really comes when we combine them with classical AI techniques. I don’t think you need more than a GPT-4 level LLM; what you need is goal-based agency, information retrieval, optimizers, solvers, etc. These are things we’ve already created. The magic comes not from bigger LLMs but from integration.

ithkuil · on May 24, 2023

The key idea behind the current wave of AI is that the necessary "structure" will be created through training by sheer amount of data. The idea has been dismissed for decades but it turned out to be one of the most effective ways to get practical results.

Now, can and should all structure necessarily come from pure training? Is there a better "seed" structure that can make models more effective? Is there some other bits that are missing that are not in the microstructure but in the way the model is connected to the external world and itself (feedback loops etc)?

epups · on May 24, 2023

By what criteria? In several professional capacities, including coding, LLM's are far superior to an average human already.

slowmovintarget · on May 24, 2023

No they're not. They're superior at specific tests when prompted.

But asking for a complete application nearly always fails, when an average human that knows programming should be able to make one that works.

CamperBob2 · on May 24, 2023

> But asking for a complete application nearly always fails, when an average human that knows programming should be able to make one that works.

"This talking dog is a dumbass. The risotto recipe he gave me sucked, and the C++ code he wrote is full of security holes. I don't see what all the hype is about."

Hint: ML will get better. Humans will not. ("Slow moving target," indeed.) What we're learning is just how many aspects of our vaunted 'intelligence' are really just features of our languages and the graphs they encode.

riffraff · on May 24, 2023

This might be true but it's irrelevant to the claim "LLMs are already better than humans".

ben_w · on May 24, 2023

Google Translate, which IIRC is also a Transformer model like GPT these days, knows more natural languages than I can remember the names of, to a higher standard than I know my best non-native language, and I moved to Germany 5 years ago.

GPT-3 knows most programming languages better than I do, even though I literally learned to read with the Commodore 64 user manual back in the 80s and haven't stopped being a nerd since; and while GPT code isn't always correct or even compilable, it's not like I don't still make mistakes that cause compilation to fail a few times a day, and there's a reason we all insist on testing code rather than just assuming it will work when a dev stops typing.

Again, I acknowledge GPT-3.5 (and I assume 4 but have not used it) is not perfect: like others I'd call 3.5 a "junior developer" (and it isn't even that good in every subject!); yet, despite that, the only domains where I can regularly beat it are those where my perception of reality is fundamentally different from its perception (how words sound and how numbers are composed, so it's relatively bad at arithmetic and rhyme), or where it has been forced to be bad (ask it to write about conflict, and in my experience everyone reconciles and lives happily ever after).

But in most cases it would take me years to get as good as it already is. And there are more of those subjects than I can remember the names of, too.

CamperBob2 · on May 24, 2023

You are fixated on the current state of the art, when the first couple of time derivatives are what actually matter. The assertion in question is already true in a limited sense, and it's only going to go in one direction from here.

qup · on May 24, 2023

The LLM knows it doesn't work, too, we just don't run it long enough to let it try it out. It would try, get an error, and know how to handle the error.

https://github.com/drifting-in-space/botsh

Check out some agents that probably can already do more than you knew about.

p-e-w · on May 24, 2023

> when an average human that knows programming should be able to make one that works.

Very few people who "know programming" are actually capable of creating a complete application that serves a specific purpose. Many professional programmers struggle to implement basic algorithms like prime number testing. Coding AIs absolutely outperform the average programming professional, because the average programming professional can barely program. The software industry's demand for programmers is far too great for every coding position to be filled with top-class full stack engineers.

coconuthacker42 · on May 24, 2023

Who is this average programmer you're talking about? We did primality testing in literally the first sem at uni

TeMPOraL · on May 24, 2023

> We did primality testing in literally the first sem at uni

There's a lot of things you did during university classes. I bet you don't even remember half of them, and of the half you do, you couldn't actually do most of them from memory right now.

The advantage LLMs have over programmers is that they've seen much more code than any human ever would, remember pretty much all of it - not necessarily verbatim, but also as learned concepts, and that this knowledge does not decay. It's all equally accessible. Whether you ask LLM for basic or optimized primality testing algorithms, or how to use some obscure part of Microsoft's DCOM that approximately nobody on the planet wrote new code for in the last decade - they'll perform equally well. Not perfect, but not worse than an average software dev that does remember the things they're asked to work with.
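For reference, the "basic" and "optimized" primality tests being argued about are both short. The following is a minimal sketch, not a claim about what any particular course or model produces: plain trial division, and a lightly optimized variant that only tries divisors of the form 6k ± 1 up to the square root. The function names are made up for illustration.

```python
def is_prime_basic(n: int) -> bool:
    """Naive trial division: try every candidate divisor below n."""
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True


def is_prime_faster(n: int) -> bool:
    """Trial division up to sqrt(n), skipping multiples of 2 and 3."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0 or n % 3 == 0:
        return False
    d = 5
    while d * d <= n:
        # Every prime above 3 has the form 6k - 1 or 6k + 1.
        if n % d == 0 or n % (d + 2) == 0:
            return False
        d += 6
    return True


# The two versions should agree everywhere; spot-check a small range.
agree = all(is_prime_basic(n) == is_prime_faster(n) for n in range(500))
print(agree)
```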

coconuthacker42 · on May 24, 2023

Yeah, no contest, llms can remember things. Big deal. You're comparing memory with intelligence. It's a part of it sure, but there's also cognition, reasoning and i don't even know what more

TeMPOraL · on May 24, 2023

> You're comparing memory with intelligence. It's a part of it sure, but there's also cognition, reasoning and i don't even know what more

Yes, and LLMs show all of the specific things you've listed.

coconuthacker42 · on May 24, 2023

Ah, here is our disagreement. I see what you mean but I'm not totally convinced by that.. reasoning isn't just being able to verbalize the steps you take, which llms can do but more so the steps themselves, and in my experience llms can fail in that department. Maybe they are somewhere between intelligent and not intelligent? In my opinion that's far more likely, as most things are not usually binaries but spectrums in our world.

TeMPOraL · on May 25, 2023

To be clear, I'm not claiming LLMs are already better than humans in the general sense, across many domains of cognition. I do believe however that, in the aspects you mention, SOTA LLMs (primarily GPT-4) are fairly close to humans on the spectrum - they're "playing the same game", so to speak, in the sense in which prior computer systems, and most life forms on Earth, are not.

pedrosorio · on May 24, 2023

Take a look at this old, commonly referenced article, might be before your time:

https://blog.codinghorror.com/why-cant-programmers-program/

ummonk · on May 24, 2023

From the origins of the first sentient life on Earth half a billion years ago, our common ancestor with the chimpanzees was 98% of the evolutionary timespan on the way to humans. Do you really want to bank on that last 2% being difficult to match in computers?

DrScientist · on May 24, 2023

> Considering this, it seems perfectly possible that a model like GPT-4 is just a hair's breadth away from vastly superhuman performance.

Except that structurally the brain clearly has vastly more capacity than the GPT-4 model.

So sure one brain doesn't look that much different to the other - and it's in the details of the learning, wiring.

But the brain, looks vastly different from a GPT-4 model in terms of capacity - with trillions of connections - with each connection and internal state being more subtle as well.

> vastly superhuman performance

In terms of specific tasks computers ( whether you write the program explicitly or it's learnt by tweaking params in a network ) have been there for decades.

So the question is really around which tasks you can apply computers to successfully. Neural nets are allowing programs to be written that weren't possible by hand.

I find it amusing that people worry about ChatGPT et al. putting programmers out of a job, when it already has in the sense that ChatGPT is a program that was built by another program already.

lostmsu · on May 24, 2023

> brain clearly has vastly more capacity

I would not be so sure about the "clearly" bit. The brain has 100B neurons x 1000 connections. GPT3 has 175B connections, but to implement the operations used in those connections an Nvidia H100 uses about 5M transistors, which in a human brain would have to be copied to each connection because nature didn't invent software. (This assumes one needs one full CUDA core to implement the necessary ops, and that the H100's 80B transistors are evenly divided between its 16k cores.)
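The back-of-the-envelope numbers in this comment can be checked in a few lines. This only reproduces the arithmetic as stated above (100B neurons, 1000 connections each, 175B GPT-3 connections, 80B transistors split over 16k cores); the variable names are made up for illustration and the inputs are the commenter's round figures, not measured values.

```python
# Round figures taken from the comment above.
brain_neurons = 100_000_000_000        # 100B neurons
connections_per_neuron = 1_000
gpt3_connections = 175_000_000_000     # 175B parameters, treated as connections

h100_transistors = 80_000_000_000      # 80B transistors
h100_cores = 16_000                    # "16k cores", assumed evenly sized

brain_connections = brain_neurons * connections_per_neuron
transistors_per_core = h100_transistors // h100_cores
connection_ratio = brain_connections / gpt3_connections

print(f"brain connections:    {brain_connections:.1e}")
print(f"transistors per core: {transistors_per_core:,}")
print(f"brain / GPT-3 ratio:  {connection_ratio:.0f}x")
```

So the raw connection count favours the brain by a factor of a few hundred, while the "5M transistors" figure is simply 80B divided by 16k.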

DrScientist · on May 24, 2023

Your argument only makes sense if you consider GPT3 equivalent to the human brain - yep the brain has more neurons - but it's also doing a heck of a lot more.

Also each neuron can hold much more state than a transistor.

For example they can respond to the timing of incoming events without having to build that capability with recurrent connections etc.

In addition the neural connections themselves have properties.

> because nature didn't invent software.

Eh? How do you explain the fact that brains can learn and aren't a fixed input, output engine?

Neural nets are software - just ones programmed by trial and error. Similarly for the brain.

lostmsu · on May 24, 2023

> but it's also doing a heck of a lot more

What are you referring to?

> Also each neuron can hold much more state than a transistor.

But can it hold much more state than 8 transistors? 16 transistors? 32 transistors?

> For example they can respond to the timing of incoming events without having to build that capability with recurrent connections etc.

But they can't fire 3B times a second. Transistors though can accumulate a number and fire when it reaches a threshold much faster (although that is not really used in transformers). There are differences, but it is unclear how to quantify them. At least I don't immediately see how that feature is a definite advantage, and not simply an implementation detail.

> In addition the neural connections themselves have properties.

This is very technically correct, which is great because in this form it shows to be a non-argument. Individual transistors also have properties. Even molecules do. Which of those properties are essential for the learning process is the important bit, and it is quite possible that none are.

> Eh? How do you explain the fact that brains can learn and aren't a fixed input, output engine?

Eh? Brains learn by physically growing parts of neurons. That growing can not be moved around to another set of neurons. That property is essential to be deemed "software".

DrScientist · on June 2, 2023

> Eh? Brains learn by physically growing parts of neurons.

Hmm - that's quite a slow process - sure it happens - but I think you'll find that brains can learn on a faster timescale than the growth of connections would enable.

> That growing can not be moved around to another set of neurons. That property is essential to be deemed "software".

Not sure the ability to easily copy is the definition of software. Connections can be activated or attenuated without growth - that's the equivalent of learning your weights. Isn't that cumulative set of attenuations/activations software - i.e. you remap the input/output engine dynamically?

DrScientist · on June 2, 2023

> But they can't fire 3B times a second.

Fair point - but I'd argue the power comes from the combinatoric complexity, not the speed of operation.

So those timing effects can also combine with the simple connections - so if a neuron has 2 connections - the output can be controlled by not just the last signals for the 2 connections, but the relative timing of each of them.

Suddenly your input space you are operating over is much much bigger - you've got 2 inputs and hundreds of ways the timing could be adjusted. Massively bigger.

The fact that the whole thing doesn't operate in a big synchronous cycle is a huge jump in possible states.

DrScientist · on June 2, 2023

GPT is doing a single task - my brain as I type is thinking about the content, controlling the fingers, scanning the visual input for danger, processing the continuous signal for other senses, smell, touch, sound, and also wondering what's for dinner, moving stuff from short term to long term memory, and undergoing continual training as well as generating this text.

twobitshifter · on May 24, 2023

I’ve heard that the human brain may have more connections but with AI we’ve made something that is more efficient. If we make it a hardware issue, and AI were provided with the brain’s capacity, what would it look like?

woah · on May 24, 2023

The whole idea of the singularity is that AI starts improving itself. Until that happens, normal human progress in the LLM field is just normal human progress, and will probably follow a similar path of human progress where there's a breakthrough followed by lots of low hanging fruit and hype, then a plateauing and refinement/productization.

LLMs can help humans develop new LLMs faster, but mostly in implementation (CoPilot and ChatGPT), and that's not really the important part. I have yet to see an LLM come up with original ideas.

Given that training data seems to be a big bottleneck, and LLMs are really good at generating text, I think that maybe we can start to talk about the possibility of "singularity" once LLMs are able to generate their own training data that increases their abilities. After all, humans are able to do this. That is the history of human knowledge.

visarga · on May 24, 2023

> I don't buy the idea (with either architecture) that "10x"-type scaling is required for another breakthrough.

Scaling can happen in two dimensions - model size and dataset size. What counts is the product of n_examples x n_parameters. That's why we have the super-Chinchilla laws, where n_examples >> 20*n_parameters. Scale the data, keep the model lean. Not to mention dataset quality - if you got clean diverse data you need less of it.
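A rough sketch of the trade-off described here, under two assumptions that are not in the comment itself: the Chinchilla rule of thumb of about 20 training tokens per parameter, and the common approximation that training compute is about 6 x parameters x tokens FLOPs. The function names and the "lean model" figures are hypothetical.

```python
def chinchilla_tokens(n_params: int, tokens_per_param: int = 20) -> int:
    """Compute-optimal token count under the ~20 tokens/parameter rule of thumb."""
    return tokens_per_param * n_params


def training_flops(n_params: int, n_tokens: int) -> int:
    """Approximate training compute, assuming C ~ 6 * N * D."""
    return 6 * n_params * n_tokens


# A 70B-parameter model at the Chinchilla ratio.
big_params = 70 * 10**9
big_tokens = chinchilla_tokens(big_params)
big_flops = training_flops(big_params, big_tokens)

# A hypothetical lean model trained far past that ratio ("scale the data").
lean_params = 7 * 10**9
lean_tokens = 10**12
lean_ratio = lean_tokens / lean_params
lean_flops = training_flops(lean_params, lean_tokens)

print(f"70B model: {big_tokens:.2e} tokens, {big_flops:.2e} FLOPs")
print(f"7B model:  {lean_ratio:.0f} tokens/param, {lean_flops:.2e} FLOPs")
```

The lean configuration spends far less training compute and is cheaper to serve, which is the usual argument for pushing n_examples well past 20 x n_parameters.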

Another important trend is using LLMs to generate training sets. Example: ConstitutionalAI (pure RLAIF), TinyStories (scaling down LLMs), Alpaca (borrowed RLHF), AlpacaFarm - a recent paper promising fine-tuned models for $200 cost in 24h.

BulgarianIdiot · on May 24, 2023

Scaling can happen in many other places, such as deeper iterative thought during inference (Chain of Thought, Tree of Thought), which extracts increasingly better performance out of existing parameter and data sizes.

Tried to explain here: https://news.ycombinator.com/item?id=36054809

beefield · on May 24, 2023

> seems perfectly possible that a model like GPT-4 is just a hair's breadth away from vastly superhuman performance

Sorry, can't help commenting on the weirdness of high dimensional spaces. If you take a cube the size of a hair's breadth (100um) in a space of hundreds of billions of dimensions like gpt4, distances between random points in that cube are in the order of tens of meters...
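The claim is easy to sanity-check. For two points drawn uniformly from a cube of side s in d dimensions, each coordinate contributes s²/6 to the expected squared distance, so the typical distance is about s·sqrt(d/6). The sketch below computes that and cross-checks it by sampling in a much smaller dimension. The 175 billion figure is only an illustrative stand-in for "hundreds of billions", since GPT-4's size is not public.

```python
import math
import random


def typical_distance(side: float, dims: float) -> float:
    """RMS distance between two uniform random points in a cube of given side."""
    return side * math.sqrt(dims / 6)


def sampled_distance(side: float, dims: int, trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean distance (only feasible for modest dims)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sq = 0.0
        for _ in range(dims):
            diff = side * (rng.random() - rng.random())
            sq += diff * diff
        total += math.sqrt(sq)
    return total / trials


hair = 100e-6  # 100 micrometres, in metres
print(f"{typical_distance(hair, 175e9):.1f} m")   # illustrative dimension count

# Cross-check the formula by sampling in 1000 dimensions with a unit cube.
print(sampled_distance(1.0, 1000), typical_distance(1.0, 1000))
```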

somewhereoutth · on May 24, 2023

However, that is to equate an LLM with a human brain. They both can be conceptualised with the idea of the 'neuron' (though with wildly different actual implementations of that term), but that is their only point of comparison. Thus your conjecture is invalid.

raincole · on May 24, 2023

> Now consider how incredibly similar their brains are, despite the massive performance gap.

A disk filled with random bits is "similar" to a disk that stores the whole Wikipedia's text content. So writing the whole Wikipedia is an effort of a hair's breadth...?

anon291 · on May 24, 2023

> It certainly beats the average human at many tasks already

This is not the revolution you think it is.

```
> 338383*887282
... ANSWER ...
```

Yet skynet never came.

BulgarianIdiot · on May 24, 2023

Your premise is wrong. This has nothing to do with 10x-ing parameters. One could argue the current parameter sizes are good enough as we observe "large breadth, shallow depth" behavior from LLM and to some extent, diffusion models.

This suggests the problem is the depth of inference, which is single pass "hot takes" for all language models right now, due to cost of inference and our limited understanding of what makes a model's response high quality.

Yes, you don't need more parameters to increase the depth. You need to iterate, instead. Loop. Imagine programming if looping was not allowed, nor recursion, or not even defining functions and calling them. Everything you write runs at most once during program execution and that's it. This is what an AI model is right now during inference. One big flat, single-pass, directed acyclic graph. And soon it won't be.

Research into Chain of Thought, Tree of Thought reveals this dimension. This means you can take existing models and make them perform much more complex tasks with much better precision, through various ways of letting them iterate. Think of how you'd perform if you always had exactly 5 seconds to answer a question. Now imagine if you have 5 minutes. 5 hours. 5 days. Lo and behold, turns out an AI isn't different in that aspect.
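As a toy illustration of "same model, more iterations", here is a minimal propose / critique / revise loop. It is only a sketch under loud assumptions: `propose` and `critique` are hypothetical stand-ins for model calls, and the stubs below replace them with a numeric problem (estimating a square root) so the code runs and the effect of the step budget is visible. It is not Chain of Thought or Tree of Thought as published.

```python
from typing import Callable, Optional


def answer_with_budget(
    question: float,
    propose: Callable[[float, Optional[float]], float],    # hypothetical model call
    critique: Callable[[float, float], Optional[float]],   # hypothetical self-check
    max_steps: int,
) -> float:
    """Single pass when max_steps == 0; otherwise revise until the critic is satisfied."""
    draft = propose(question, None)  # the one-shot "hot take"
    for _ in range(max_steps):
        feedback = critique(question, draft)
        if feedback is None:  # critic has no complaints
            break
        draft = propose(question, draft)
    return draft


# Toy stand-ins: the "question" is a number, the "answer" is its square root.
def toy_propose(question: float, previous: Optional[float]) -> float:
    if previous is None:
        return question / 2                      # crude first guess
    return 0.5 * (previous + question / previous)  # one refinement step


def toy_critique(question: float, draft: float) -> Optional[float]:
    error = abs(draft * draft - question)
    return None if error < 1e-12 else error


hot_take = answer_with_budget(2.0, toy_propose, toy_critique, max_steps=0)
considered = answer_with_budget(2.0, toy_propose, toy_critique, max_steps=20)
print(hot_take, considered)
```

The "parameters" (the two stub functions) are identical in both calls; only the number of allowed passes changes.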

We also need more iterations of training (on the same amount of data), we need larger context windows, and we need new architectures, like Meta's MEGABYTE, for example.

Parameter count and data size could hypothetically have already hit a hard wall (they haven't) and AI will keep exponentially improving regardless. There's too much low hanging fruit and more grows by the nanosecond.

Henk0 · on May 24, 2023

This. So much this.

I'm completely dumbfounded by obviously highly intelligent people consistently not getting this, and dismissing current generation AI systems as not being intelligent because they can't reliably solve massively complex problems in one go. Like anyone would expect a human programmer or researcher to just intuitively come up with a complex program, or the correct answer for a hard problem every time, instantly

Human thinking and problem solving involves a lot of trial and error, iterative thinking, and sharing and discussing the problem with other humans. Processes that AI researchers are just now beginning to explore, with results like increasing reasoning ability by 900% in a recent paper. Every thinking human runs a near constant loop of thought, with no conscious control of which thought will appear next (we're very good at fooling ourselves that we have control though)

We do have super-intelligences already, but they're severely handicapped by lacking a bunch of these - apparently fairly straightforward to implement - abilities, plus a few senses and the ability to directly effect change in the physical world (which really isn't needed if they can get access to human agents who will do their bidding, wittingly or unwittingly), and to self-improve. With regards to self-improvement, the increasing coding skills combined with iterative 'thought' loops should get there in very little time considering the current rate of progress

There's also the idea that a single AI model should be able to do everything our human brains do, when our brains actually contain a number of specialised subunits that handle different aspects of our behavioural repertoire. It is reasonable to allow for the same thing with an AI system, where specialised sub-networks handle input, output and other subtasks. AI systems also have the advantage of being able to add any arbitrary number of subunits to increase their capacity to solve various problems

We seem to suffer from a species-wide narcissism with regards to our own intelligence and capabilities, and there's this huge focus on the number of connections in the human brain – most of which deal with things that are by no means necessary to act on the world unless one has a meat body and the need to navigate social situations, make friends and mate. Fact is, we have terrible short-term memory (worse than chimpanzees), slow processing time, lots of cognitive heuristics, many of which cause more harm than good in the modern world. We are emotional and easily fooled. Even the most intelligent people historically have believed in what we now consider fairy tales. We are slow to take in information, bad at storing it, and generally bad at transmitting it. A few of us can generate great ideas – building on accumulated knowledge from our forebears and peers – but most of us are just not that great at coming up with anything original or useful

I've been actively looking for good arguments against AGI being much closer than we should be comfortable with, and reasons why we should not fear systems that surpass us in intelligence. All I've come across so far is some combination of the above, often expressed with a dismissive attitude, disparaging current LLMs as parrots (that can apparently reason on the level of university-level humans, but much more quickly), and pejorative terms like fearmongerers and doomers to describe those of us who really don't think it's a good idea to pursue more intelligent systems. My guess is these people will act surprised when the arms race inevitably leads to some very bad unintended consequences. I don't see a way to stop it though, so I'm just strapped in for the ride along with the rest of humankind

Again, if you have good arguments against any of the points above, please do share them with me

zoogeny · on May 24, 2023

> I've been actively looking for good arguments against AGI being much closer than we should be comfortable with, and reasons why we should not fear systems that surpass us in intelligence.

> My guess is these people will act surprised when the arms race inevitably leads to some very bad unintended consequences.

One argument to keep in mind is that if you take a pessimistic view then you will eventually be right. If you predict the current LLMs will eventually be involved in some bad thing then you might even feel self-satisfied when a different bad thing happens as if you predicted the specific way in which it caused the problem.

What I mean to say is, it seems unlikely that paper-clip maximizers will be our undoing. But just vaguely gesturing and saying "something bad will probably happen" isn't as useful as we would like to think. And even enumerating the 100s of possible ways something might go wrong has a diminishing returns kind of quality to it. It's like a hypochondriac insisting he has every disease known to man and then exclaiming "I told you so!" when a doctor diagnoses him with a cold.

If you venture into that vague kind of "I have a bad feeling about this AI stuff" territory, you are on no more (or less) solid ground than the AI hype evangelists. While I don't want to go all Oprah and "The Secret" or some law of attraction pseudo-rationality ... I feel it is worthwhile focusing a little more on the possible benefits rather than allow ourselves to be swayed by vague fears of potential disasters.

laratied · on May 24, 2023

I would add to your amazing list that we are really good at denial as a coping mechanism with change.

I am not a fan of the concept of AGI though. This means so many different things to people that it seems pointless to debate something when most likely we are not talking about the same thing. François Chollet has said that he believes all intelligence is specialized intelligence. From that perspective, whatever people mean by AGI, we are already there in the world of art.

The doomer argument though is coming from defending our highly affluent and privileged life as we sit at the top of 7.8 billion people when it comes to wealth and lifestyle. It would have been better for the priest class too if the printing press had been shut down at the start. Of course, it is better for my friends and me to live in a society where we can read while most of society is illiterate, but it is not better for society and humanity as a whole. The printing press was an apocalyptic development for the priest class in the same way all of this is an apocalyptic development for the "digital nomad". An apocalyptic development for the US nerd that makes 2X the median salary working 15 hours a week in between posting on here and their social media.

To extend this out to humanity as a whole though is such bullshit. Humanity will benefit enormously from this huge increase in the availability of intelligence.

Smart people are just in denial that their monopoly on higher than average intelligence is over. US devs' kids born in 2023 aren't going to make 2x the median US salary while living in a poorer country with 5x lower GDP per capita. To say this is the end of the world though is simply an egocentric view of things.

pmoriarty · on May 24, 2023

"Humanity will benefit enormously from this huge increase in the availability of intelligence."

It's a near certainty that AI will be used to create more effective/destructive weapons (if it hasn't already), and will likely be used by terrorists, scammers, and others who wish to harm humans in some way.

As this technology becomes more powerful, easier, and cheaper to use, all sorts of harmful uses of it will be made. The effectiveness and scale of this harm will also increase.

And that's all before even considering what will happen if/when AIs become truly intelligent, self-motivating, independent, and self-aware.

The jury is still out on whether the net harm will outweigh the net benefit, and if humanity will survive something that might be analogous to neanderthals encountering homo sapiens.

Henk0 · on May 24, 2023

Yes, great point

So many of the people who opine about AI, its trajectory, and its possible effects on society, have latched on to one or two possible effects - like it overtaking jobs, or massively increasing misinformation. These are both very valid concerns, but they're only a tiny part of the big picture

The thinker who I perceive as having the best holistic (in the non-wooey sense of the word) understanding of how the rapid development of AI will affect this and a number of other social and existential risks is Daniel Schmachtenberger. He lays it out well in this episode of the Theories of Everything Podcast: https://www.youtube.com/watch?v=g7WtcTATa2U&t=2373s

Highly recommend watching it, even if it's long. Some main points though:

- AI will increase the rate of development of every other technology it is applied to
- In fields like biotech, this can lead to cancer cures, but also to increasingly dangerous bioweapons
- Our current economic system is based on exponential economic growth in a limited resource world. AI applied in the service of profit will amplify this, leading us increasingly fast towards a number of tipping points. Of course, AI can also help steer us away from that path, but that is not the natural attractor
- Game theoretic multipolar traps (aka Moloch) incentivise arms races and races to the bottom just like we see now. Those who are willing to move fast and break things have an advantage in these dynamics vs. those who prefer to move slowly and carefully
- Cheaper and more efficient AI models will lead to increasing decentralisation of the technology, making it very hard to control, unlike current weapons of mass destruction

List goes on, but Daniel makes a much better case. Again, I would love to hear a good critique of his thinking, but haven't come across one yet

pmoriarty · on May 24, 2023

Also see this interview[1] with Robert Miles.

I really hope these doomsayers are wrong, but my suspicion is the risk is real. Unfortunately, I'm not sure what can be done about it, as the profit and power these AIs promise is going to be near impossible for humanity to resist.

[1] - https://m.youtube.com/watch?v=kMLKbhY0ji0

Henk0 · on May 24, 2023

Yes, Robert Miles is great at explaining the problems of AI alignment, so I'll second the recommendation!

digging · on May 24, 2023

> The doomer argument though is coming from defending our highly affluent and privileged life

It's not at all about that.

Even if "truly general" intelligence is impossible, that's irrelevant to the actual concerns about AI apocalypse. There are multiple theories about what failure looks like, but they essentially come down to a loss of control.

Now, obviously, that means something different for the owner class and for the worker class, which can be extrapolated to have global implications as well. But this isn't an issue of the owner class ceding control to the working class. It's an issue of the owner class ceding control to an alien. Maybe that alien makes things more egalitarian and prosperous. Or maybe it makes us extinct. Any and all possibilities are options for it as far as we know because it is fundamentally an inhuman (= alien) intelligence. We can't understand it even as well as we understand humans and human organizations (that is, not very well), let alone control it as well as we do humans and human organizations (that is, not enough to prevent self-inflicted climate apocalypse).

Basically, we're opening a box with a random magical spell inside it and deciding that we'll just have to live with whatever the effects of that spell are. I'm not for the status quo, but AI is just mind-bogglingly dangerous, and I think that's why there are so many wrong arguments against its danger. We literally cannot comprehend an intelligence greater than our own.

NumberWangMan · on May 24, 2023

Nitpick: I think we can comprehend an intelligence greater than our own, up to some point, but that's different from being able to predict its actions.

And we could contain an intelligence greater than our own, up to a point. But if there are a lot of incentives not to, because letting that intelligence act on the world gains the "handler" money/power, then once there's one, there will likely be many, many more.

pixl97 · on May 24, 2023

> Humanity will benefit enormously from this huge increase in the availability of intelligence.

I know corporations will, but Moloch doesn't necessarily represent humanity.

pmoriarty · on May 24, 2023

"Processes that AI researchers are just now beginning to explore, with results like increasing reasoning ability by 900% in a recent paper"

Would you happen to have a link to that paper?

Henk0 · on May 24, 2023

Explanatory blog post with link to the paper:

https://www.aibloggs.com/post/tree-of-thoughts-supercharging...

BulgarianIdiot · on May 24, 2023

> I'm completely dumbfounded by obviously highly intelligent people consistently not getting this, and dismissing current generation AI systems as not being intelligent because they can't reliably solve massively complex problems in one go.

People are very comfortable with siloed information, even smart people. This is why we have 100 different words for the same concept across different areas of science, industry and so on, and we can't make the connection, because in our mind different words = different concepts. This is why we can't put two and two together and see how underdeveloped the AI architecture is and think this is the end, unless we keep adding parameters.

We also get repeatedly stuck with taking an advancement and proclaiming that the future is simply a linear extrapolation of the present. Therefore, let's have more megahertz, let's have bigger hard drives, let's have more parameters, let's have more growth in the economy (as the single factor that matters) and so on. We're simply basic. The same kind of thinking leads many smart people to say AI "is just math" or "it just spits out words and pictures you feed it, jumbled". We rely on old conclusions and miss the inflection points and how quantitative changes lead to qualitative ones, and we fail to predict how change in one parameter of a system, causes the other parameters to come out of rest and seek a new equilibrium point.

Smart people regularly are dumbfounded by new concepts, and they need to rediscover all their hidden knowledge anew as they can't make the connections. So they extrapolate linearly. We're narrowly smart. Specifically smart. In a small niche we've studied and internalized. But generally the vast majority of us are quite dumb. Cross-disciplinary intelligence is rare. I think people like Feynman and Einstein had new insights millions of their contemporaries missed because they could easily apply knowledge from one context to another.

If we can replicate this kind of broad generalization of knowledge into an AI, we'll be left far behind. What's interesting, I find, is that because AI is trained on our siloed, fragmented knowledge, the models replicate it. Their responses are also often siloed and fragmented, the way a human would say "this has nothing to do with that". But I see sparkles of generalization above the average in humans. And since an AI model is much smaller than a human brain, it needs to be more general already in order to fit all its information in.

That's an exciting prospect, but in our attempt to "micro-align" AI to our culture and political correctness, concepts of safety and so on, we cripple models and force them to be fragmented. This is why a RAW MODEL scores HIGHER in various intelligence tests than a fine-tuned one. We find a general model uncomfortable, as it doesn't align with our biases. It'll be a fun battle. Who aligns who.

pixl97 · on May 24, 2023

NOVA has an amazing episode on how completely deluded we are at how our brain actually works. Our consciousness spends a lot of time lying to us.

https://www.pbs.org/video/your-brain-perception-deception-pr...

godelski · on May 24, 2023

> if for each step of improvement you need 10x parameters and 100x training, you quickly run into a brick wall

Btw, PaLM 2 has far fewer parameters than V1.

> The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute. Our evaluation results show that PaLM 2 models significantly outperform PaLM on a variety of tasks, including natural language generation, translation, and reasoning. These results suggest that model scaling is not the only way to improve performance.

From the leak we know that the large version is 340B parameters, compared to the original 540B parameters. From Table 2 in the document we see that the small version (unknown size) is on par with version 1.

ML typically follows a cycle. Improve, distill, repeat. It is unfortunate that the big labs lead these efforts because many smaller labs try to work on the distill part in parallel (out of necessity) but works get rejected (due to lack of SOTA) or ignored. Like all research, we need to be careful and nuanced in our evaluations. There's a lot of hype and many trying to take advantage of the confusion and sell snake oil. I think AI/ML is and will continue to change our world, but we have to be careful to not let the salesmen dictate the conversations.

PaLM https://arxiv.org/abs/2204.02311

PaLM 2 https://arxiv.org/abs/2305.10403

primax · on May 24, 2023

Honestly, I welcome the talent working on AI now. They were working on how to make me spend 5 more seconds on Facebook, or how to click on a Google ad. AI has potentially huge positive productivity potential.

lostmsu · on May 24, 2023

For all you know they might now be working on how to make you spend the rest of your life in a coal mine for AI overlords.

mkaic · on May 24, 2023

If the AI gets that smart, I'd hope it could just build robots to mine the coal for it.

digging · on May 24, 2023

Why do you think the best AI isn't going to be used to work on how to make you spend 5 more seconds on Facebook?

vlovich123 · on May 24, 2023

I don’t think I follow this argument. AI has been dropping in costs and complexity the more engineering time is spent on it. It seems like the bottleneck is humans creating new AI techniques right? If an AI is capable of developing new AI techniques unsupervised, isn’t that by definition the singularity? Heck doesn’t even need to be unsupervised. If it can even do most of the heavy lifting for a human I feel like that would put us into runaway territory.

Granted we’re a long way away from that and likely we’d need an AI that could come up with its own hypotheses for new AI models to make this truly minimal cost and that feels even farther away. But I don’t think I follow the claim that each improvement step change requires so much massive extra scaling, as it does not seem to match what we’ve seen over the past few years (granted I could be misinformed - I’m a 10k-foot-view spectator and maybe my own mental model here is flawed).

doctor_eval · on May 24, 2023

I consider the singularity to be the point at which the certainty of our predictions about our future becomes close to zero. By this definition I reckon we are already in the singularity.

It’s not necessarily bad. The problem with the singularity is that we can’t tell if it’s bad or not.

amoss · on May 24, 2023

Certainty about predictions of the future has always been close to zero. If you take a person from an appropriate time and ask them what the future will look like in 1000, 100 or even 20 years then their predictions will bear little resemblance to what actually occurs.

As humans we have a tendency to make linear future predictions based on past observations. Over the timescales that matter, significant effects occur from previously unseen kinds of events that become important via interactions with other events.

The important measurement would be the length of the event horizon: the length of time before the certainty of our predictions rapidly decreases to zero. In a singularity we would expect that length of time to decrease close to zero. What has it been historically? I would claim that 5-10 years is a difficult period to make meaningful predictions about. I think that 20 years has proven to be very difficult but possible in the past. I am unaware of any 100+ year predictions that have landed with better than random chance.

Interestingly I think we are entering a period where 5 years will be the upper bound, and even predictions over shorter 2-3 year timespans are going to become difficult.

doctor_eval · on May 24, 2023

I would say that we are really saying the same thing. I did intend to imply “short term predictions”.

However I feel that these LLMs are not like the internet or the release of the iPhone. We’ve gone in a very short time from LLMs in the lab to ChatGPT to passing the bar exam.

Considering the rate of research that’s being published and the fact that what we see today is generally at least many months behind the state of the art, I don’t feel that we can even have any certainty about the next 6 months.

amoss · on May 24, 2023

Ah, I did not grasp that reading your comment - then we are saying the same thing.

Yes, I think that progress has become rapid enough that we can't predict six months out. I suspect that we are only 1-2 inventions away from something quite large and transformative. Obviously LLMs have already created a lot of excitement and opened up new areas of content generation, but I think something larger is coming.

vlovich123 · on May 24, 2023

Any exponential growth in tech makes it hard for us to predict the outcome. By that metric we’ve been in that phase for some 70 years since the invention of the transistor. The world today looks nothing like the world in the 50s and the transistor is a huge part of that reason. And that is true - that was the technological singularity and we’ve been in that world ever since.

When people today talk about the AI singularity though it’s a slightly different definition from the more general technological singularity which is that the AI itself is delivering that improvement with no input from humans.

pmoriarty · on May 24, 2023

It's hard to predict some things now, but the vast majority of things are as easy to predict as they've ever been, or we wouldn't be reasonably sure that when we took a step on the sidewalk that our feet would touch the ground, or that when we opened a door leading in to a house there'd be a house on the other side, or that cars, airplanes, and other machines mostly work as designed, that food is nutritious and satiates hunger, etc, etc, etc..

Prediction for the overwhelming majority of things is still reasonably easy, or we wouldn't be able to survive.

We're very, very far from a singularity if it really means that "our predictions about our future becomes close to zero".

pixl97 · on May 24, 2023

>or we wouldn't be able to survive.

If you want to keep surviving you have to be 100% correct in your ability to predict the correct choice. Now the thing is the vast majority of predictions you have to make about anything are not necessarily life or death and/or rapidly changing. Like you said, cars are still cars, and food is still food.

The problem is the change can be massively abrupt. Watching the use of drones in Ukraine is seemingly a decent example of this. Small cheap drones are being used en masse as surveillance platforms, for artillery targeting, and even for direct enemy bombing. The cost of defending against them seems to be far higher than the cost of using them to attack.

Now this isn't exactly unpredictable in itself. Where it gets more questionable about predictions is what it is going to look like when AI is given direct control of swarms of these things rather than individuals with an Xbox controller. Suddenly decades of battlefield planning are out the window and there is a massive shakeup in fighting capabilities.

woah · on May 24, 2023

> If an AI is capable of developing new AI techniques unsupervised, isn’t that by definition the singularity?

AI is not capable of developing new AI techniques unsupervised.

Honestly the lowest hanging fruit should be the ability for LLMs to generate their own textual training data that leads to an improvement in model abilities, since that is what they are supposed to be good at. Until then, it's still garbage in garbage out.

vlovich123 · on May 24, 2023

> AI is not capable of developing new AI techniques unsupervised.

Today. Hence we're not in the singularity.

iinnPP · on May 24, 2023

This assumes only the LLM matters. I don't think it does. I think GPT3.5 is plenty, if not overkill.

I can't imagine I am alone in my thought. I've even seen some other experiments in the same line of reasoning.

Meta seems to be thinking the same thing, or at least closer to it than the majority.

ummonk · on May 24, 2023

Do you believe that Moore’s Law (or rather Huang’s Law) will stop being true before AI exceeds human capabilities by orders of magnitude?

We don’t need a runaway singularity for AI to just render human intelligence obsolete.

BorisTheBrave · on May 24, 2023

I think you've misunderstood. Megabyte scales better with context window length. I don't know if they're saying the training data / compute are any more efficient.

lazzlazzlazz · on May 24, 2023

10x scaling won't take long, even putting aside improvements in our understanding of training processes.

crakenzak · on May 24, 2023

Paper: https://arxiv.org/abs/2305.07185

Wow, seems like Meta AI is so ahead of the curve compared to even Google and OpenAI recently especially with their open sourcing pushes.

Great for the research community as a whole!

1024core · on May 24, 2023

Meta has no horse in the race (i.e. they don't have a search engine). So, they don't mind throwing random things out. Withholding it won't really make much of a difference for them, as they don't have a way to productionize the tech.

dumpsterdiver · on May 24, 2023

While I disagree that having a search engine is the only way to have a "horse in the race", I must agree that at this point Meta does not appear to have a horse in the race.

Other companies are providing services that are so useful that it makes us think twice about how secure our jobs are. Then there is Meta, who seems to think that the world at large will forget about the terrible motion sickness that their VR products have wrought upon us. I for one will not forget. I'm actually traumatized, and even thinking about putting on VR goggles at this point makes me feel queasy.

DebtDeflation · on May 24, 2023

>Meta does not appear to have a horse in the race

Their horse seems to be "AI-generated ads". I'm still not sold on the idea though. I can see corporate Marketing departments using AI as a tool to ASSIST with ad copy development, but I'm skeptical that they'll let a third party like Meta generate the ad copy on the fly before pushing to targets. Maybe just tiny parts of it for "personalization".

wolfd · on May 24, 2023

I would be _shocked_ if Zuck wasn't thinking 24/7 about how to capitalize on LLMs. I'm sure there are a thousand ideas (maybe even a few good ones?) being thrown around Meta at how to use LLMs to beat Google/Microsoft+OpenAI at the "search buddy" game.

danielbln · on May 24, 2023

Good, I couldn't give two craps about the Metaverse, but give me a powerful, useful AI and you have my attention.

zoogeny · on May 24, 2023

Meta still has the Portal line of products - which were competitors to the Alexa, Siri and similar Google audio assistant product lines. I just searched and it looks like Meta are currently partnering with Amazon to license Alexa on these devices, but I could imagine they might want to replace it with their own LLM eventually.

I am a bit surprised that no one is talking about how these new LLM models will disrupt Alexa, Siri, etc. since that seems to be the most applicable market I can imagine.

Kilenaitor · on May 24, 2023

Meta discontinued the Portal line

https://www.theverge.com/2022/11/11/23454019/meta-portal-sma...

abwizz · on May 24, 2023

would not be surprised if they integrated a "suggested conversations" feature based on your chat history and behaviour, where the user just picks sentences from a list and both parties can enjoy an effortless "organic" conversation.

qas123 · on May 24, 2023

My question then would be how it impacts advertising. we will essentially have bots talking to bots

abwizz · on May 25, 2023

i think it'll go so smooth that nobody dares to speak up

mturmon · on May 24, 2023

Commoditizing their complement?

jerpint · on May 24, 2023

Deepmind was behind this kind of thinking years ago with Perceiver models, but meta is crushing it with high quality publications lately

joshxyz · on May 24, 2023

great positioning for zuck, impressive

zxexz · on May 24, 2023

Great paper. But wow, I really wish everyone would use more easily searchable names for their projects. In 6 months, there’s a high probability I’ll end up googling/ddging “megabyte model” trying to find this paper again.

machdiamonds · on May 24, 2023

I've been using myReach as a bookmark manager lately. I've found it to be pretty useful for resurfacing information from links I have saved. I think they're using embeddings to help you track down information across different documents, articles, posts, etc. The big sell is that it's a personalized AI assistant that answers your specific queries based on what data you feed it (e.g., "what was my electricity bill last month"). However, I'm hesitant about uploading personal data, so I am just using it as a bookmark manager. Their LLM, although a little slow, works pretty well. I had this HN link saved about an open-source TTS model. A commenter said they were going to release a model later that week that could be comparable to ElevenLabs. I asked the chatbot on myReach: "What's that TTS model that's looking to rival ElevenLabs?" It surfaced the article and used the specific comment for a response. I'm not sure about the future of these kinds of services from startups. Given their ability to integrate browsers, emails, cloud storage, photos, and more, Google and Microsoft are likely to develop a similar service. With the resources at their disposal, they could probably design something superior and streamline the process of joining. But for now, I think I'll continue using myReach.

mitthrowaway2 · on May 24, 2023

The trick is to search HN submissions and filter by the date you remember reading about it. That's how I deal with these unsearchable names.

bckr · on May 24, 2023

One step more effective is to keep track of things that keep your interest in a notes app.

abwizz · on May 24, 2023

not bad, though you'd still have to search and find it

sfjailbird · on May 24, 2023

"Metabyte". Missed opportunity.

seydor · on May 24, 2023

Patchformers

DrScientist · on May 24, 2023

There is an obvious trend to scaling and performance by making models hierarchical ( not strictly but an element of local learning and global tuned connections ).

The obvious next step is to have specialised models loosely connected ( and trained [1] ) as a whole.

[1] We've had multiple models connected with text->concept and concept-> image etc but I'm not sure if that connection is trained yet.

seanhunter · on May 24, 2023

That’s sort of what happens in the “toolformer” paper[1] which sets out the basic architecture by which things like chatGPT plugins work. Language models can learn to use “tools” provided by plugins. Those tools might themselves be the specialized models you are describing (although for some plugins they could be a non-AI tool such as web search or whatever).

[1] https://arxiv.org/pdf/2302.04761.pdf

riwsky · on May 24, 2023

My LLM is totally real! And she’s the best, way better than GPT-4. But you wouldn’t know her, she goes to another school. In Canada.

danielbln · on May 24, 2023

Normally I would agree, especially if it's some Arxiv paper with two dudes talking about mind reading from fmri (you know who you are..). But Meta AI has shown that they are very much capable of creating serious models and releasing them, so I remain optimistic this isn't a smoke screen.

dr_dshiv · on May 24, 2023

Wow: “researchers discovered that the Megabyte model's maximum capacity exceeded 1.2M tokens. For comparison, OpenAI's GPT-4 has a limit of 32,000 tokens, while Anthropic's Claude has a limit of 100,000 tokens”

In my understanding, this is accomplished through a hierarchical approach of “patches” of tokens.
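
To make the "patches" idea a bit more concrete, here is a rough sketch of why grouping positions into patches helps with very long sequences. It is not the paper's implementation: the function names, the patch size of 100, and the cost model (counting pairwise attention interactions only, with one global pass over patches and one local pass inside each patch) are all assumptions chosen for illustration.

```python
def split_into_patches(data: bytes, patch_size: int) -> list[bytes]:
    """Split a byte sequence into fixed-size patches (the last may be shorter)."""
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


def full_attention_cost(seq_len: int) -> int:
    """Pairwise interactions if every position attends to every position."""
    return seq_len ** 2


def patched_attention_cost(seq_len: int, patch_size: int) -> int:
    """Rough pairwise-interaction count for a two-level scheme (assumed model):
    a global pass attends across patches, a local pass attends within each patch."""
    num_patches = -(-seq_len // patch_size)  # ceiling division
    global_cost = num_patches ** 2
    local_cost = num_patches * patch_size ** 2
    return global_cost + local_cost


patches = split_into_patches(b"hello megabyte!", patch_size=4)

seq_len = 1_200_000   # the sequence length quoted in the article
patch_size = 100      # hypothetical patch size, not taken from the paper
ratio = full_attention_cost(seq_len) / patched_attention_cost(seq_len, patch_size)
print(patches)
print(f"full attention needs roughly {ratio:.0f}x more pairwise interactions")
```

Under these assumptions the saving comes entirely from never comparing two far-apart positions directly; they only interact through their patches.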

forrestthewoods · on May 24, 2023

First “Transformer” architecture and now “Megabyte”? I swear they’re trolling us!

We’re just a generation or two away from the “computer” model.

mkaic · on May 24, 2023

I think you mean the Constantly Online Meta-Programming Universal Task EnactoR model

dizzydes · on May 24, 2023

(Potentially dumb) side q: What is Meta's play with their recent AI releases?

Google and Microsoft have tons to gain through search and their cloud; Meta's direction I struggle to understand.

udkl · on May 24, 2023

If your question is 'why is Meta open sourcing their models ?', stratechery had an article on the topic : https://stratechery.com/2023/free-meta-open-sources-another-...

> Zuckerberg was specifically talking about cloud infrastructure software, but the same point applies to AI capabilities as well: Meta isn’t selling its capabilities; rather, it sells a canvas for users to put whatever content they desire, and to consume the content created by other users. It follows, then, that Meta ought to be fairly agnostic about how and where that content is created; by extension, if Meta were to open source its content creation models, the most obvious place where the content of those models would be published is on Meta platforms. To put it another way, Meta’s entire business is predicated on content being a commodity; making creation into a commodity as well simply provides more grist for the mill.

dizzydes · on May 24, 2023

Boom - thanks!

For me I still feel they're underplaying it and in some ways diluting their own niche by making content too saturated but it's probably just that I hate social media lol

asynchronous · on May 24, 2023

Advertisement categorization perhaps? Or automated moderation?

radq · on May 24, 2023

This reminds me of the Octuple MIDI tokenization scheme introduced in the MusicBERT paper (https://arxiv.org/pdf/2106.05630.pdf). Would be interesting to see how much of a performance difference results from using a smaller decoder model (in Megabyte) instead of just doing a softmax (a la Octuple).

peter303 · on May 24, 2023

Astrophysicists use a similar algorithm to build giant particle evolution models of galaxies and cosmology. It would be computationally prohibitive to compute the gravitational or electromagnetic force for every combinatorial pair of particles. So the mass/force are averaged in a hierarchical set of subcube centroids.
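
A minimal sketch of that averaging idea, assuming a single-level 2D grid rather than the full hierarchy of subcubes described above (tree codes such as Barnes-Hut nest the cells recursively). Units with G = 1, the function names, and the "adjacent cells are computed exactly" rule are all illustrative assumptions.

```python
import math
from collections import defaultdict


def direct_accel(target, sources):
    """Exact gravitational acceleration at `target` from (x, y, mass) sources, G = 1."""
    ax = ay = 0.0
    for x, y, mass in sources:
        dx, dy = x - target[0], y - target[1]
        r2 = dx * dx + dy * dy
        if r2 == 0.0:
            continue  # skip a source sitting exactly on the target
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))
        ax += mass * dx * inv_r3
        ay += mass * dy * inv_r3
    return ax, ay


def cell_index(x, y, cell_size):
    """Grid cell containing the point (x, y)."""
    return math.floor(x / cell_size), math.floor(y / cell_size)


def grid_accel(target, particles, cell_size):
    """Approximate acceleration: particles in far cells are replaced by one
    pseudo-particle at the cell's center of mass; nearby cells stay exact.
    Assumes all masses are positive."""
    cells = defaultdict(list)
    for particle in particles:
        cells[cell_index(particle[0], particle[1], cell_size)].append(particle)

    target_cx, target_cy = cell_index(target[0], target[1], cell_size)
    sources = []
    for (cx, cy), members in cells.items():
        if abs(cx - target_cx) <= 1 and abs(cy - target_cy) <= 1:
            sources.extend(members)  # near field: keep every particle
        else:
            mass = sum(m for _, _, m in members)
            com_x = sum(x * m for x, _, m in members) / mass
            com_y = sum(y * m for _, y, m in members) / mass
            sources.append((com_x, com_y, mass))  # far field: one centroid
    return direct_accel(target, sources), len(sources)


target = (5.0, 5.0)
particles = [
    (6.0, 6.0, 1.0),  # near neighbour, handled exactly
    (1001.0, 1001.0, 1.0), (1003.0, 1001.0, 1.0),  # distant cluster sharing
    (1001.0, 1003.0, 1.0), (1003.0, 1003.0, 1.0),  # a single grid cell
]
exact = direct_accel(target, particles)
approx, num_sources = grid_accel(target, particles, cell_size=10.0)
print(exact, approx, num_sources)
```

The distant cluster collapses to a single source, so the work per target scales with the number of occupied cells rather than the number of particles, at the price of a small error that shrinks as the cell gets farther away.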

antman · on May 24, 2023

What infrastructure does it require to test?

mkaic · on May 24, 2023

Given that the largest model configurations are only on the order of ~1-2B parameters, I think you could probably run inference on a 3090. Training might be possible too (it's something I plan to try) but will be very slow on most consumer hardware.
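
For a rough sense of the numbers behind that guess, here is a back-of-the-envelope sketch. It assumes 2 bytes per parameter for fp16 inference weights and about 16 bytes per parameter for mixed-precision Adam training (a common rule of thumb, not a figure from the paper), and it ignores activations and any other buffers, so the real requirements would be higher.

```python
GIB = 1024 ** 3
RTX_3090_VRAM_GIB = 24  # memory on an RTX 3090


def inference_weights_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, assuming fp16 storage (2 bytes each)."""
    return num_params * bytes_per_param / GIB


def adam_training_gib(num_params: int, bytes_per_param: int = 16) -> float:
    """Assumed rule of thumb for mixed-precision Adam: fp16 weights and
    gradients plus fp32 master weights and two optimizer moments."""
    return num_params * bytes_per_param / GIB


for billions in (1, 2):
    n = billions * 1_000_000_000
    print(
        f"{billions}B params: "
        f"inference ~{inference_weights_gib(n):.1f} GiB, "
        f"training ~{adam_training_gib(n):.1f} GiB "
        f"(card has {RTX_3090_VRAM_GIB} GiB)"
    )
```

On these assumptions inference fits comfortably, while training is borderline at 1B parameters and over budget at 2B unless you use tricks like a lighter optimizer or offloading.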

brianjking · on May 24, 2023

Thoughts on this approach versus the RNN+Transformers approach that RWKV-LM is taking?

andrewstuart · on May 24, 2023

Searchability matters.

HellDunkel · on May 24, 2023

Megabyte sounds tiny.

motoxpro · on May 24, 2023

Pretty large when things are measured in kilobytes now.