1. Platform threads place a heavier burden on the GC. It's true that virtual threads are allocated on the heap, but platform threads are GC roots, which is worse. The GC easily deals with a gazillion heap objects; it's rather unhappy with lots of roots. The number of heap objects that virtual threads occupy is roughly the same as the number of heap objects that async code allocates anyway.
2. The Jetty experiment measured the wrong thing, as they misunderstood the origin of the "million thread" scenario. What happens in a real application is that you have some number of threads with deep stacks servicing incoming requests -- say 50K concurrent sessions -- and then each of those fans out to, say, 19 micro services in parallel; each of those outgoing requests runs on a virtual thread with a very small stack, and that's how you get to 1M threads (there's a sketch of this fan-out pattern right after this list). I.e. when you have a high number of threads, only a small minority of them (5% in this example) have a deep stack.
3. I don't think anyone would claim anything is a silver bullet. All virtual threads do is let a server service the same throughput as asynchronous code does, but the code is much simpler and it is observable, i.e. easily debuggable and profilable, something that async code can't offer.
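To make the fan-out concrete, here's a rough sketch, assuming JDK 21's Executors.newVirtualThreadPerTaskExecutor; the 50K/19 numbers and the backend host names are just illustrative placeholders, not anything from the Jetty experiment:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.stream.IntStream;

    public class FanOutDemo {
        static final HttpClient CLIENT = HttpClient.newHttpClient();

        public static void main(String[] args) {
            // One virtual thread per incoming request (illustrative numbers, not a benchmark).
            try (ExecutorService requests = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int session = 0; session < 50_000; session++) {
                    requests.submit(FanOutDemo::handleRequest);
                }
            } // close() waits for the submitted tasks to finish
        }

        // Each request fans out to ~19 backend calls, each on its own
        // short-lived virtual thread with a shallow stack.
        static void handleRequest() {
            try (ExecutorService fanOut = Executors.newVirtualThreadPerTaskExecutor()) {
                List<Future<String>> replies = IntStream.range(0, 19)
                        .mapToObj(i -> fanOut.submit(() -> callBackend(i)))
                        .toList();
                // ... combine the replies into a response ...
            }
        }

        // Placeholder for an outgoing call to one of the micro services.
        static String callBackend(int i) throws Exception {
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create("http://backend-" + i + ".example/api")).build();
            return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }
    }

Only the 50K request-handling threads carry a deep stack; the ~950K fan-out threads each live for one outgoing call and hold almost nothing.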
Regarding #1, wouldn't the stacks of the lightweight threads have to root any object on them? Otherwise the GC would free objects out from under the virtual thread, right?
I could imagine that by having fewer physical threads running, the stop-the-world part of garbage collection could suspend the runtime more quickly. That could reduce the effect of GC pauses.
Virtual thread stacks reference the objects that local variables on the stack reference, but they are not themselves GC roots. GC roots are special objects that the GC starts its scan of the heap from, and they tend to be particularly costly, at least for most of OpenJDK's GCs. Virtual threads are just ordinary heap objects that can reference other objects.
> I could imagine that by having fewer physical threads running, the stop-the-world part of garbage collection could suspend the runtime more quickly. That could reduce the effect of GC pauses.
Precisely. Although it's worth mentioning that while that's true for G1, ZGC does not stop the world when scanning roots, including platform thread stacks (https://openjdk.org/jeps/376).
As long as virtual threads retain an entrypoint of control flow (e.g. return point from an I/O call), they will also be GC roots. They might not be very deep but they are GC roots.
When a call returns, locals and parameters back up the stack will be expected to be live. Since there's no way in general to create a reference to a stack using JVM instructions (unlike .NET), the stack of every live thread must be a GC root.
If you want some more detail, when a virtual thread is in the runnable state, it is reachable from the scheduler (which itself is a Java object, and not a GC root); when it is blocked on a lock or IO, then the lock object or the IO mechanism must retain a reference to it, or there would be no way to unblock it. The thread object has a reference to the stack, which is a heap object (actually, it could be made up of several heap objects).
A thread that is not strongly reachable can provably no longer make progress -- it must be blocked but there's no way to unblock it -- and will be collected even if it has not terminated. It may live forever in our hearts, but not in the heap.
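To put that reachability argument in code -- just an illustration of the idea, not how the JDK actually wires it up internally:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.locks.LockSupport;

    public class ReachabilityDemo {
        public static void main(String[] args) throws Exception {
            // The blocking construct (a latch here) keeps the waiting virtual
            // thread in its wait queue, so the thread stays reachable and can
            // be resumed by whoever eventually counts the latch down.
            CountDownLatch latch = new CountDownLatch(1);
            Thread waiter = Thread.ofVirtual().start(() -> {
                try {
                    latch.await();               // parks the virtual thread
                    System.out.println("resumed");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            latch.countDown();                   // reachable -> can be unblocked
            waiter.join();

            // By contrast, a virtual thread parked with nothing holding a
            // reference to it can provably never be unparked; once unreachable
            // it becomes eligible for collection even though it never
            // terminated (whether and when that happens is up to the GC).
            Thread.ofVirtual().start(() -> LockSupport.park());
        }
    }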
Interesting to read. It's a technical distinction that comes down to an implementation difference I don't yet understand (i.e. I haven't taken the time to read up on it), but from the fragment I did read I infer that there is some semi-magical hoop-jumping going on to make the CPU stack live in a Java heap object that Java code can take a reference to.
Objects are obviously rooted for blocked virtual threads that may resume -- which matches a formal understanding of them as GC roots -- but the implementation appears to work by taking a reference to the heap object containing the stack at the moment of being blocked, presumably by a JVM native method or similar.
> Objects are obviously rooted for blocked virtual threads that may resume
If by "rooted" you mean reachable in the object graph when starting the traversal from the roots, then yes. If a blocked thread isn't reachable, there is no way to call its unpark method that resumes it.
> the heap object containing the stack at the moment of being blocked, presumably by a JVM native method or similar.
Yes, we implemented virtual threads on top of continuations that, in turn, are implemented inside the VM. Their stacks are reified as heap objects.
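If you want to see the continuation primitive directly, it's visible as an internal, unsupported API in current OpenJDK builds. A rough sketch, assuming the jdk.internal.vm.Continuation API as it ships in JDK 21 (it's not exported, so you'd need --add-exports java.base/jdk.internal.vm=ALL-UNNAMED at compile and run time, and it can change in any release):

    import jdk.internal.vm.Continuation;
    import jdk.internal.vm.ContinuationScope;

    public class ContinuationDemo {
        public static void main(String[] args) {
            ContinuationScope scope = new ContinuationScope("demo");
            Continuation cont = new Continuation(scope, () -> {
                System.out.println("step 1");
                Continuation.yield(scope);   // the stack is frozen into heap objects here
                System.out.println("step 2");
            });
            cont.run();   // runs until the yield, then returns to the caller
            // Between the two run() calls the continuation's frames exist only
            // on the heap, reachable through 'cont'.
            cont.run();   // thaws the stack and resumes after the yield
            System.out.println(cont.isDone());  // true
        }
    }

Virtual threads are, roughly, such a continuation plus a scheduler; none of this internal API is needed (or meant) for application code.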
Roots have to be scanned in every collection, and G1 scans them in a stop-the-world pause. Other references may not be scanned at all in most collections (which are partial), and when they are scanned, G1 does it concurrently. Roots are less of a problem with ZGC.