The issue with destructors being slow is actually a well-known problem with C++, particularly on process shutdown when huge object graphs often end up being recursively destructed for no practical benefit whatsoever (since all they do is release OS resources that are going to be released by the OS itself when the process exits).
Comparing stack deallocation vs GC is kinda weird because it's not an either-or - many GC languages will happily let you stack-allocate just the same (e.g. `struct` in C#) for the same performance profile. It's when you can't stack-allocate that the difference between deterministic memory management and tracing GC becomes important.
Also, refcounting is not superior to GC in terms of speed, generally speaking, because GC (esp. compacting ones) can release multiple objects at once in the same manner as cleaning up the stack, with a single pointer op. Refcounting in a multithreaded environment additionally requires atomics, which aren't free, either. What refcounting gives you is predictability of deallocations, not raw speed. Which, to be fair, is often more important for perception of speed, as in e.g. UI where a sudden GC in the middle of a redraw would produce visible stutter.
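To make the atomics point concrete, here's a minimal C++ sketch: every copy of a `std::shared_ptr` does an atomic increment on the shared control block, and every destruction does the matching atomic decrement, which contends across threads.

```cpp
#include <memory>
#include <thread>
#include <vector>

int main() {
    auto p = std::make_shared<int>(42);

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        // The init-capture `q = p` copies the shared_ptr: an atomic increment
        // on the control block. The lambda's destruction does the matching
        // atomic decrement. Under contention those RMW ops fight over the
        // same cache line.
        workers.emplace_back([q = p] { volatile int sink = *q; (void)sink; });
    }
    for (auto& t : workers) t.join();
}
```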
> Also, refcounting is not superior to GC in terms of speed, generally speaking, because GC (esp. compacting ones) can release multiple objects at once in the same manner as cleaning up the stack, with a single pointer op. Refcounting in a multithreaded environment additionally requires atomics, which aren't free, either. What refcounting gives you is predictability of deallocations, not raw speed. Which, to be fair, is often more important for perception of speed, as in e.g. UI where a sudden GC in the middle of a redraw would produce visible stutter.
In practice, tail latencies are much harder to control in GC than in RC implementations, which is what I was trying to communicate. This doesn't just matter for UI applications; it also directly affects how much load your server can service. Ref counting in a multithreaded environment does use atomics, although biased ref counting is considered the state of the art for minimizing that cost (i.e. plain non-atomic counts on the owning thread, atomic counts for references shared across threads).
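Rough idea of biased ref counting in a hypothetical C++ sketch (the names and the simplified release path are mine, not any shipping implementation): the owning thread bumps a plain integer, every other thread pays for an atomic, and a real implementation adds a handoff protocol that merges the two counters before freeing.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical sketch of the biased-refcount idea only (not a complete
// deallocation protocol): the owner thread uses a cheap non-atomic counter,
// everyone else falls back to an atomic one.
struct BiasedRc {
    std::thread::id owner = std::this_thread::get_id();
    uint32_t biased = 1;                 // touched only by the owning thread
    std::atomic<int32_t> shared{0};      // touched by all other threads

    void retain() {
        if (std::this_thread::get_id() == owner)
            ++biased;                                        // cheap path
        else
            shared.fetch_add(1, std::memory_order_relaxed);  // contended path
    }

    void release() {
        if (std::this_thread::get_id() == owner)
            --biased;                                        // cheap path
        else
            shared.fetch_sub(1, std::memory_order_acq_rel);  // contended path
        // A real implementation frees the object once both counters reach
        // zero, which requires a handoff step between owner and non-owner
        // threads that is omitted here.
    }
};
```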
As for releasing multiple objects at once, I've yet to see that bear out as a real advantage in practice. The cost of walking the object graph tends to dominate, whereas with RC you release precisely when something becomes unreferenced. And that's assuming you even use RC - often you only ref count at the outermost layer and everything internal is direct ownership. And if you really do need bulk release, use an arena allocator, which gives you that property without a GC collection pause. There’s a reason there’s no systems language that uses GC.
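To illustrate the arena point: in C++17, `std::pmr::monotonic_buffer_resource` gives you exactly the "release everything at once" property with no tracing and no pause - individual deallocations are no-ops and the memory comes back in one shot when the arena goes away.

```cpp
#include <memory_resource>
#include <string>
#include <vector>

int main() {
    // Bump allocator: deallocate() is a no-op, and all memory is returned at
    // once when the resource is destroyed (or release() is called).
    std::pmr::monotonic_buffer_resource arena;

    std::pmr::vector<std::pmr::string> rows(&arena);
    for (int i = 0; i < 1000; ++i)
        rows.emplace_back("row data that only needs to live for this request");

    // ... use rows ...
}   // arena goes out of scope here: everything above is freed in one shot
```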
> The issue with destructors being slow is actually a well-known problem with C++, particularly on process shutdown when huge object graphs often end up being recursively destructed for no practical benefit whatsoever (since all they do is release OS resources that are going to be released by the OS itself when the process exits).
If you want a fast shutdown, just call _Exit(0) to bypass destructors of objects with static, thread-local, and automatic storage duration. GC languages have a much worse problem: they make it really easy to leak resources over the lifetime of a long-running program. I’ll take a slow shutdown over that anytime, especially since in practice, unless you’ve written really bad code, that “slow shutdown” remains negligible.
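Concretely (a minimal sketch, assuming a plain C++ program where shutdown cost matters):

```cpp
#include <cstdlib>

int main() {
    // ... build up big object graphs, caches, thread-local state, etc. ...

    // Returning normally would run every destructor on the way out.
    // std::_Exit terminates immediately: no destructors, no atexit handlers,
    // and the OS reclaims memory, file descriptors, etc. by itself.
    // Caveat: whether stdio buffers get flushed is implementation-defined,
    // so flush anything you actually need before calling it.
    std::_Exit(0);
}
```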
> There’s a reason there’s no systems language that uses GC.
There are a few systems languages that use GC, like Nim and D - of course with the option to do manual memory management where necessary, and to allocate things on the stack whenever possible. Nim also offers several different types of GCs and memory allocators, each of which can be more performant for different tasks. The maximum GC pause can also be configured, at the cost of temporarily using more memory than you should until the GC manages to catch up.
Of course, you can always manually craft arenas and such to be faster and avoid fragmentation, at the cost of much more effort.
Nim and D both offer multiple GC strategies within the language. Just as with C and Rust, while they can be used for systems programming, they can also be used for other things. If you’re doing systems level programming with them you’re probably not choosing any tracing GC option.
Nim and D are also bad examples, as I’m not aware of any meaningful systems-level programs that have been written in them - they have continuously failed to find a way to become mainstream (Nim is mildly more successful in that it’s managed to break into the 50-100 range of most popular languages, but that’s already well into the tail of languages, to the point where you can’t even tell the difference between 50 and 100).