Zero memory inside the model from one input (i.e. one output token) to the next - only the KV cache, which is just an optimization. The only "memory" is what the model outputs and therefore gets to re-consume (and even that is an odd sort of memory, since the model itself didn't exactly choose what to output - that's a random top-N sample).
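To make that concrete, here's a toy sketch of the generation loop (stand-in model function and vocabulary, not real inference code): the only thing carried from one step to the next is the growing token sequence itself, and the output token is picked by random top-k sampling rather than being something the model "chose".

```python
# Toy sketch (not any real library's API): the only state that survives from
# one step to the next is the token sequence. The "model" is a stand-in
# function; a real transformer would compute logits from the whole sequence.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def model_logits(sequence):
    # Stand-in for a forward pass: logits depend only on the input sequence.
    # No hidden state is carried over between calls.
    random.seed(hash(tuple(sequence)) % (2**32))
    return [random.uniform(0, 1) for _ in VOCAB]

def sample_top_k(logits, k=3):
    # Random choice among the k highest-scoring tokens (the "top-N sampling"
    # mentioned above), so the model doesn't strictly pick its own output.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return VOCAB[random.choice(top)]

sequence = ["the", "cat"]
for _ in range(4):
    token = sample_top_k(model_logits(sequence))  # full recompute every step
    sequence.append(token)                        # the only "memory" there is
print(sequence)
```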
There is no real runtime learning - certainly no weight updates. The weights are all derived from pre-training, and so the runtime model just represents a frozen chunk of learning. Maybe you are thinking of "in-context learning", which doesn't update the weights, but is rather the ability of the model to use whatever is in the context, including having that "reinforced" by repetition. This is all a poor substitute for what an animal does - continuously learning from experience and exploration.
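A minimal sketch of that distinction (made-up weights and update rule, purely illustrative): weight updates only happen in the training loop; generation just reads the frozen weights, and anything "learned" in-context lives and dies with the prompt.

```python
# Sketch of the frozen-weights point: during pre-training the weights change;
# at inference time they are read-only, and "in-context learning" only ever
# changes the prompt that gets fed through those frozen weights.

weights = {"w": 0.5}          # stand-in for billions of parameters

def train_step(weights, example, lr=0.01):
    # Pre-training: weights actually move (via a made-up gradient here).
    grad = weights["w"] - example
    weights["w"] -= lr * grad

def generate(weights, context):
    # Inference: weights are only read, never written. Anything the model
    # "learns" in-context lives in `context`, and vanishes with it.
    return f"output conditioned on {context!r} with w={weights['w']:.3f}"

for example in [1.0, 0.8, 1.2]:      # pre-training: learning happens here
    train_step(weights, example)

print(generate(weights, "some prompt"))          # no weight update
print(generate(weights, "some prompt + reply"))  # still no weight update
```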
The "magic dust" in our brains, relative to LLMs, is just a more advanced and structure architecture, and operational dynamics. e.g. We've got the thalamo-cortical loop, massive amounts of top-down feedback for incremental learning from prediction failure, working memory, innate drives such as curiosity (prediction uncertainty) and boredom to drive exploration and learning, etc, etc. No magic, just architecture.
I'm not entirely sure what you're arguing for. Current AI models can still get a lot better, sure. I'm not in the AGI in 3 years camp.
But, people in this thread are making philosophically very poor points about why that is supposedly so.
It's not "just" sequence prediction, because sequence prediction is the very essence of what the human brain does.
Your points on learning and memory are similarly weak word play. Memory means holding some quantity constant over time in the internal state of a model. Learning means being able to update those quantities. LLMs obviously do both.
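One way to cash out those definitions in code (my framing, nothing LLM-specific): memory is state that persists across calls, learning is an operation that updates that state.

```python
# "Memory" = a quantity held in internal state over time.
# "Learning" = the ability to update that quantity.

class StatefulModel:
    def __init__(self):
        self.state = 0.0          # memory: persists between calls

    def step(self, x):
        return self.state + x     # behaviour depends on the stored quantity

    def update(self, delta):
        self.state += delta       # learning: the stored quantity changes

m = StatefulModel()
print(m.step(1.0))   # 1.0
m.update(0.5)
print(m.step(1.0))   # 1.5 -- behaviour changed because the state changed
```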
You're probably going to be thinking of all sorts of obvious ways in which LLMs and humans are different.
But no one's claiming there's an artificial human. What does exist is increasingly powerful data processing software that progressively encroaches on domains previously thought to be that of humans only.
And there may be all sorts of limitations to that, but those (sequences, learning, memory) aren't them.
> It's not "just" sequence prediction, because sequence prediction is the very essence of what the human brain does.
Agree wrt the brain.
Sure, LLMs are also sequence predictors, and this is a large part of why they appear intelligent (intelligence = learning + prediction). The other part is that they are trained to mimic their training data, which came from a system of greater intelligence than their own, so by mimicking a more intelligent system they appear to be punching above their weight.
I'm not sure that "JUST sequence predictors" is so inappropriate though - sure, sequence prediction is a powerful and critical capability (the core of intelligence), but it is ALL that LLMs can do, so "just" is appropriate.
Of course additionally not all sequence predictors are of equal capability, so we can't even say, "well, at least as far as being sequence predictors goes, they are equal to humans", but that's a difficult comparison to make.
> Your points on learning and memory are similarly weak word play. Memory means holding some quantity constant over time in the internal state of a model. Learning means being able to update those quantities. LLMs obviously do both.
Well, no...
1) LLMs do NOT "hold some quantity constant over time in the internal state of the model". It is a pass-thru architecture with zero internal storage. When each token is generated it is appended to the input, the updated sequence is fed back into the model, and everything is calculated from scratch (other than the KV cache optimization). The model appears to have internal memory due to the coherence of the token sequence it outputs, but in reality everything is recalculated from scratch; the coherence comes from the fact that adding one token to the end of a sequence doesn't change its meaning by much, so most of what is recalculated will be the same as before.
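Here's a rough sketch of that point (toy arithmetic standing in for a real forward pass): the cached and uncached paths give identical output, which is why the KV cache is an optimization rather than memory the model writes to.

```python
# Toy illustration: every new token triggers a fresh forward pass over the
# whole sequence; the KV cache just avoids redoing per-token work already done
# for earlier positions. It isn't memory the model chose to write.

def embed(token):
    return float(len(token))            # stand-in for an embedding

def forward_full(tokens):
    # Everything recomputed from scratch for the whole sequence.
    kv = [embed(t) for t in tokens]     # "keys/values" for every position
    return sum(kv) / len(kv)            # stand-in for the next-token logits

def forward_cached(new_token, kv_cache):
    # Same result, but positions already seen are not re-projected.
    kv_cache.append(embed(new_token))
    return sum(kv_cache) / len(kv_cache)

tokens, cache = ["the", "cat"], []
for t in tokens:
    cached_out = forward_cached(t, cache)
full_out = forward_full(tokens)
assert abs(full_out - cached_out) < 1e-9   # identical output, pure optimization
```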
2) If the model had learnt something, then it should remember it from one use to the next, but LLMs don't do this. Once the context is gone and the user starts a new conversation/session, all memory of the prior session is gone - the model has NOT updated itself to remember anything about what happened previously. If this were an employee (an AI coder, perhaps) then it would be perpetual Groundhog Day: every day it came to work it'd be repeating the same mistakes it made the day before, and would have forgotten everything you might have taught it. This is not my definition of learning, and more to the point, the lack of such incremental permanent learning is what'll make LLMs useless for very many jobs. It's not an easy fix, which is why we're stuck with massively expensive infrequent retrainings from scratch rather than incremental learning.
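Sketched out (hypothetical helper names, nothing from a real API): the same frozen weights serve every session, and each session starts with an empty context, so a correction from session 1 never carries into session 2.

```python
# Sketch of the "Groundhog Day" point: two sessions share the same frozen
# weights, and nothing learned in session 1's context survives into session 2.

frozen_weights = {"w": 0.5}    # never written after pre-training

def run_session(weights, user_turns):
    context = []                       # fresh, empty context every session
    for turn in user_turns:
        context.append(turn)
        context.append(f"reply({turn}, w={weights['w']})")
    return context                     # discarded when the session ends

session1 = run_session(frozen_weights, ["please stop making mistake X"])
session2 = run_session(frozen_weights, ["do the task again"])

# Session 2 starts from the same weights and an empty context: the correction
# from session 1 is gone unless the user pastes it back in.
print(session1)
print(session2)
```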