Typically, ThreadLocals are used when something is too costly to initialize per request, so you create one instance per thread and reuse it across requests. The usual rule of thumb: you want to use some heavy, non-thread-safe object without locking, so you stick it into a ThreadLocal.
However, if you suddenly have a million threads (as you easily can with virtual threads), this optimization no longer works. Sure, ThreadLocal itself still behaves correctly, but in practice you end up creating a million of these heavy objects, which is exactly what you wanted to avoid!
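To make the cost concrete, here is a minimal sketch (not from the thread) that counts how many times a ThreadLocal initializer runs. `ExpensiveResource` is a hypothetical stand-in for the heavy object, and ordinary platform threads stand in for the thread-per-task model: on a fixed pool of 4 threads the initializer runs at most 4 times for 1000 tasks, whereas with one fresh thread per task it runs 1000 times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalCost {
    // Hypothetical stand-in for a heavy, non-thread-safe object.
    static final class ExpensiveResource {
        static final AtomicInteger CREATED = new AtomicInteger();
        final byte[] buffer = new byte[1024]; // pretend this is costly to build

        ExpensiveResource() { CREATED.incrementAndGet(); }

        int use(int x) { buffer[0] = (byte) x; return buffer[0]; }
    }

    // One instance per thread, created lazily on the first get() in that thread.
    static final ThreadLocal<ExpensiveResource> RESOURCE =
            ThreadLocal.withInitial(ExpensiveResource::new);

    static void task(int i) { RESOURCE.get().use(i); }

    // Runs `tasks` tasks on a fixed pool; returns how many instances were created.
    static int runPooled(int tasks, int poolSize) throws InterruptedException {
        ExpensiveResource.CREATED.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            final int n = i;
            pool.submit(() -> task(n));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return ExpensiveResource.CREATED.get();
    }

    // Runs every task on its own new thread; returns how many instances were created.
    static int runThreadPerTask(int tasks) throws InterruptedException {
        ExpensiveResource.CREATED.set(0);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int n = i;
            Thread t = new Thread(() -> task(n));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return ExpensiveResource.CREATED.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pooled (4 threads, 1000 tasks): " + runPooled(1000, 4));
        System.out.println("thread-per-task (1000 tasks): " + runThreadPerTask(1000));
    }
}
```

With virtual threads (Java 21+) you would typically use `Executors.newVirtualThreadPerTaskExecutor()`, which also starts a new thread for every task, so the instance count behaves like the second case rather than the pooled one.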
Expensive to initialize doesn’t imply large. Wanting to avoid expensive initialization (a runtime cost) is orthogonal to wanting to avoid a larger memory footprint. So I don’t quite buy your argument.
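Your question was why the advice is to avoid ThreadLocals, and this is the primary reason: an initialization cost you meant to pay once per thread ends up being paid once for each of those million threads. It's not necessarily about avoiding a larger memory footprint.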
For what it's worth, there actually is a separate JEP (I believe it's this one: https://openjdk.org/jeps/429) for a new, scope-based solution that promises much better performance.
What is the problem here, then? Is it just the per-virtual-thread memory consumed by the variable once it is used (which would be expected)?