Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Why Garbage Collection Paranoia is Still (sometimes) Justified (dadgum.com)
42 points by signa11 on Nov 6, 2011 | hide | past | favorite | 18 comments


> What language systems in existence are using a true incremental or concurrent garbage collector? I know of three: Java, Objective C 2.0 (which just shipped with OS X Leopard), and the .net runtime. Not Haskell. Not Erlang. Not Objective Caml. [...], not Smalltalk not Ruby.

Oh, why could that possibly be. Hmm...

- Java: Sun

- Objective C: Apple

- .Net: Microsoft

Big companies with the manpower and determination to make improvements to their platforms. All the other languages have either grown out of one-man prototypes or academic projects. Building a good garbage collector is hard. It's difficult because you can't keep track of much data while you're allocating, you can't walk the graph of allocated objects because different threads are modifying the graph while you're working on it. When you then add a constraint that you can never stop the world for more than 5ms at the time it becomes a Difficult Problem.

Languages like Ruby, Python, Lisp and Scheme don't have good generational garbage collectors because they don't have the manpower to tackle Difficult Problems. (See for instance the global interpreter lock in Python. It's been 10 years now?)

To tackle a Difficult Problem you need (a) a few smart people and (b) get them to work on the problem for many consecutive hours for many consecutive days. Volunteers just can't afford to spend 50+ hours a week for half a year on a problem. A hundred people can chip in weekends and evenings for a year and make no substantial progress at all.

This is completely different from the hypothetical "sufficiently smart compiler" that is supposed to sense the intent of the programmer and optimize accordingly. A sufficiently smart compiler cannot compensate for bad algorithm design, so the performance differences will always be in the margins. The state-of-the-art in terms of garbage collection is decades ahead of what we see in Python, Ruby, Haskell, Scheme and so forth. The GC problem is simple: spread the computational overhead of garbage collection evenly so that the application doesn't stutter. Simple engineering problem with a concrete and clear goal. Just difficult to implement.


Languages like Ruby, Python, Lisp and Scheme don't have good generational garbage collectors because they don't have the manpower to tackle Difficult Problems.

It seems that you confused language with implementation. For example: want a Ruby without GIL and with great garbage collector? Use JRuby (and this assumes we are comparing to the official one: YARV). One language, different implementations.


JRuby just piggybacks on Java. That doesn't invalidate my point, as far as I can tell.


JRuby uses the JVM. Java is the language. Same point: language != implementation. Not always, but nowadays we have many mature vms, so it is more common.


Seems like languages which can't or don't want to target JVM or CLR might do well to at least take a cue from Sun and Microsoft, and pool resources. On both those platforms, there are scads of languages all sharing the same GC.

Which leads me to wonder - WTF ever happened to Parrot? I know the project is still alive, but why doesn't anyone seem to be using it?


Sadly, Parrot sucks. It doesn't have a stable bytecode, its assembly is buggy and too high-level, it only recently got a generational (but still m&s stop-the-world) GC and has no useful object system.

I've tried implementing Python3 on it for GSoC and failed.


PyPy is growing a nice concurrent GC (and a solution to the GIL). I believe Rubinius also has a work-in-progress concurrent GC.


In "Objective Caml. [...]" what you snipped out started with an "Oops, forget what I just said about Objective Caml."


OCaml's garbage collector doesn't deal with concurrency!


You quoted "incremental or concurrent".


Historical note: the Lisp Machines had true incremental collection, supported by microcode and hardware. Words in memory had a tag field, and a certain value of the tag field indicated that the word was "GC-forwarded". When the GC moved an object (it was a copying collector) it would leave behind these GC-forward pointers in each word of the old location of the object, pointing to the corresponding word of the new location. Subsequent references to one of the old words would automatically indirect to the new one -- note that this is an indirection triggered by the contents of memory, not by an explicit instruction. The point was, it was no longer necessary to find all references to the old location and update them just to move one object. (The GC algorithm guaranteed that all references to old objects would be updated eventually, so that the space could be actually reclaimed.)

It's hard to do this without hardware support, and even harder on a multiprocessor. Still, I do recall seeing a paper presenting a fully incremental GC with a very short maximum-pause guarantee. The problem was (IIRC) that it made such heavy use of CAS (compare-and-swap) instructions that the overhead of these would be unacceptable.

I wonder how much effort Intel and AMD have been putting into making CAS faster.


I remember that Intel made it faster in Nehalem.


Note that a heap implementation for manual allocation and manual freeing isn't free of problem cases either. The heap will impose an accumulating maintenance overhead that will need to be cleared out eventually. Similarly to a GC heap you can try to amortize the overhead into some controllable, bounded chunks of computation but that doesn't mean pathological cases wouldn't exist nor happen.


I don't think there's anyone credible out there who will say that there is never a case where garbage collection isn't appropriate. Sure, rocket guidance systems, military software, video games, and software from 1983 aren't a good fit for garbage collection.

However, in 2011 I think that the vast majority of software programmers write would benefit from garbage collection more than they would lose.


I don't think I would call O(n) of the amount of allocated data 'pathological'. Nor would I agree that completely incremental collectors being rare proves anything about what is possible. I would say that it's simply developers preferring a runtime that's a small linear multiple faster but has occasional pauses.


The V8 JavaScript implementation landed an incremental collector a couple months ago, FWIW.


I want to see the next post about the Erlang GC.


This post is from 2008, so the next post is already posted here http://prog21.dadgum.com/16.html?_




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: