It's because you overgeneralized your simple understanding. There is a lot more nuance to what you are calling overfitting (and underfitting). We do not know why or when it happens in all cases. We do know some cases where it happens and why, but that doesn't mean we understand the rest. A lot of interpretation is still needed. How much was overfit? How much underfit? Can both happen at the same time? (Yes.) Which layers do this, what causes it, and how can we avoid it? Reading the article shows you that this is far from a trivial task. And all of this comes before we even introduce the concept of sudden generalization. Once we do, all these questions start again, but in a completely different context that is even more surprising. We also need to talk about new aspects, like the rate of generalization and the rate of memorization, and what affects them.
tldr: don't oversimplify things: you underfit
P.S. please don't fucking review. Your complaints aren't critiques.