ahamez's comments | Hacker News

IIRC, AppleScript was once usable in languages other than English.


Whether it's actually _designed_ with those languages in mind is a different question. VBA (in)famously was also translated for non-English versions of Office, but that only involved replacing the keywords with translated ones. German VBA read _very_ weirdly to me, back then.


It stands for Architectural Decision Records. More info here: https://adr.github.io. We use this at my work to keep a record of our decisions, as the name implies ;-).


Could you elaborate? Why would refcounting be a bad idea? (Genuine question.)


Many think that refcounting isn't a GC algorithm and that it is faster than GC.

Well, refcounting is the basic way of implementing GC and is listed as a GC algorithm in any serious CS book on the subject, such as "The Garbage Collection Handbook".

What many refer to as GC is actually the family of algorithms that fall under the umbrella of tracing GC.

Refcounting is only faster than tracing GC in very constrained scenarios:

- no sharing across threads, otherwise locking is required for refcount updates

- no deep data structures, otherwise the call stack of cascading deletions will produce a pause similar to a tracing GC's

- implementations need to be clever about nested deletions when refcount reaches 0, otherwise a stack overflow might happen

- cyclic data structures need programmer help to break cycles via weak references, or are just not allowed (see the sketch below)
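
For the last point, a minimal C++ sketch of the usual escape hatch (my own illustration, not tied to any particular runtime): the back-reference is a std::weak_ptr, which does not contribute to the count, so the cycle can never keep itself alive.

    #include <memory>

    struct Node {
      std::shared_ptr<Node> child;   // owning edge: contributes to the refcount
      std::weak_ptr<Node> parent;    // non-owning back edge: breaks the cycle
    };

    int main() {
      auto root = std::make_shared<Node>();
      root->child = std::make_shared<Node>();
      root->child->parent = root;  // no refcount bump here
      // If `parent` were a shared_ptr, root and child would keep each other
      // alive forever; with weak_ptr, both nodes are destroyed at scope exit.
    }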

This is only relevant for naive refcount implementations; there are many optimisations, which end up turning a refcounting implementation into a tracing GC in disguise.

Also, just because a language uses a tracing GC algorithm does not prevent the existence of value types, or manual memory allocation for hot code paths, thus allowing for more productive coding while offering the tools for memory optimisation when needed.

This is not yet the case with Java, but even there, fixing it is part of the language's roadmap.


> - no sharing across threads, otherwise locking is required for refcount updates

That is not true. Most atomic refcount implementations are lock-free (you do need synchronization though). This optimization has nothing to do with tracing. Given that you also need synchronization for tracing garbage collection (including frequent stop the world pauses in most cases, albeit brief ones), I don't think this is even really an advantage of tracing at all.
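
To illustrate (a sketch of the standard pattern, roughly what shared_ptr control blocks do, not any specific library's source):

    #include <atomic>

    struct ControlBlock {
      std::atomic<long> refcount{1};

      void acquire() {
        // The increment needs no ordering: the caller already holds a
        // reference, so the object cannot be freed concurrently.
        refcount.fetch_add(1, std::memory_order_relaxed);
      }

      void release() {
        // Atomic decrement, but lock-free: no thread ever blocks here.
        if (refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
          delete this;  // we held the last reference
        }
      }
    };

fetch_add/fetch_sub compile down to single locked instructions on x86: synchronization, but no lock.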

> - no deep data structures, otherwise the call stack of cascading deletions will produce a pause similar to a tracing GC's

If your program has data structures that live through several collections, refcounting can be faster than tracing GC because it only traces objects once (when they die) instead of several times. Additionally, you can defer the refcount updates in ways that avoid the need to do lots of work on deallocation, and optimize for short-lived objects, to get many of the effects of generational GC. This also has little to do with tracing (except inasmuch as a reference counter with this optimization has to "trace" new objects to make them initially live, but in some sense this work to update the reference counter for the first time is just moved from object initialization time to a later point in the program). The reason it's not usually done is that it requires precise liveness information from an ambient collector, which complicates the implementation, but "exploiting liveness information" is not the same as "being a tracing GC."

> - implementations need to be clever about nested deletions when refcount reaches 0, otherwise a stack overflow might happen

This is extremely trivial to avoid (if you need to) and has nothing to do with tracing. It's really more a product of user-defined destructors than anything else, which you don't need to provide in order to implement reference counting. In fact, a lot of the supposedly inevitable slowness of reference counting compared to tracing goes away when you ban nontrivial destructors (and conversely, nontrivial destructors make tracing perform much worse).
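
For instance, a sketch of the usual trick (illustrative, not from any particular implementation): destroy a linked structure iteratively by stealing ownership, so stack depth stays constant however long the list is.

    #include <memory>

    struct ListNode {
      std::unique_ptr<ListNode> next;

      ~ListNode() {
        // Naive destruction of a million-node list recurses a million frames
        // deep. Instead, unlink one node per iteration: every node we drop
        // already has next == nullptr, so its destructor does no extra work.
        std::unique_ptr<ListNode> current = std::move(next);
        while (current) {
          std::unique_ptr<ListNode> rest = std::move(current->next);
          current = std::move(rest);  // previous node destroyed here, O(1) depth
        }
      }
    };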

> This is only relevant for naive refcount implementations; there are many optimisations, which end up turning a refcounting implementation into a tracing GC in disguise.

The most optimized versions of refcounting I'm aware of get their wins from things like precise knowledge about live references and update coalescing--not tracing, except for some young object optimizations as I alluded to above which are more about satisfying the generational hypothesis (they generally have a backup tracer in order to break cycles, but if you're willing to live without that they usually don't need tracing to reclaim all memory). Many optimizations people commonly associate with tracing (e.g. cache friendliness due to compaction) are in practice only relevant for young generations most of the time; for older ones both tracing and reference counted implementations tend to benefit more from a really smart allocator with intelligent memory layout and partial reclamation.

I agree that optimized versions of refcounting are difficult, that the fact that you still need a backup tracer for most code discourages people from using it, and that tracing doesn't negate a lot of systems optimizations. But a lot of the stuff you're saying is pretty misleading: the things that make tracing efficient can mostly be applied to make reference counting efficient without "turning it into tracing," with the cost that optimized tracing and optimized reference counting both need much more invasive knowledge about the user program (and correspondingly restrict the users) compared to the less optimized versions.


>> - no sharing across threads, otherwise locking is required for refcount updates

> That is not true. Most atomic refcount implementations are lock-free (you do need synchronization though). This optimization has nothing to do with tracing. Given that you also need synchronization for tracing garbage collection (including frequent stop the world pauses in most cases, albeit brief ones), I don't think this is even really an advantage of tracing at all.

Not sure I agree with that. Copying references between local variables and reads/writes from the heap all require expensive atomic operations for RC when they can't be optimized away. That's a major performance problem for languages like Swift. I'm not saying that one is better than the other, but this is exactly where tracing GCs shine compared to RC.

In the case of ZGC, these don't require atomic operations, though you do need a read barrier for reading references from the heap. But do not conflate tracing GCs' read & write barriers with atomic barriers.


ZGC is interesting because it has a read barrier instead of a write barrier, yeah. But that's usually a tradeoff you make to reduce pause times, not improve throughput (IIRC it usually reduces throughput and fixing the barrier often requires an atomic write anyway, right?). For deferred / coalesced update RC the atomic overhead is amortized using (essentially) a write barrier (it only triggers at most once per object between reference counting collections, and does a similar amount of work to a traditional write barrier), and loads don't immediately require incrementing a reference count, so you end up in pretty much the same contention ballpark as a typical generational collector.

Again, the optimizations I'm describing are mostly distinct from turning RC into tracing, just applying the same sorts of optimizations we expect from production garbage collectors. The only exception is probably how in RCImmix, heavily referenced objects (4 bits are enough to precisely track something like 99.8% of objects, so "heavily referenced" refers to the other 0.2%) have their reference counts frozen so they don't pay anything until the backup trace starts. But it seems like most of the win from freezing reference counts comes from using fewer bits for the count, not avoiding the updates per se.
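
A sketch of that freezing trick (field layout invented for illustration): keep a tiny saturating count, and once it hits the maximum it is "stuck" and never touched again, leaving that object to the backup trace.

    #include <cstdint>

    struct Header {
      std::uint8_t rc : 4;  // tiny count; per the paper, ~99.8% of objects fit
      // ...the remaining bits of the byte are free for other object metadata...

      static constexpr std::uint8_t kStuck = 15;

      void inc() {
        if (rc != kStuck) ++rc;  // saturates; a stuck count is frozen for good
      }

      bool dec() {
        // Returns true when the object is immediately reclaimable. Frozen
        // objects return false and are left to the backup (cycle) collector.
        if (rc == kStuck) return false;
        return --rc == 0;
      }
    };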


With refcount you need to sync every time you add/remove a reference to an object. For many (most?) workloads that's going to be quite substantially more frequent than tracing collections. Whether one or the other wins out probably depends a lot on your use case.

More generally, I think the parent comment was not aiming to say that reference counting is never appropriate, but that blindly going with reference counting because it's now "ew, GC" is misguided.


> With refcount you need to sync every time you add/remove a reference to an object. For many (most?) workloads that's going to be quite substantially more frequent than tracing collections. Whether one or the other wins out probably depends a lot on your use case.

That's not actually true of deferred reference counting with update coalescing. Instead, it defers the refcount increments and decrements to periodic intervals, much like optimized tracing implementations defer tracing to periodic intervals. This avoids reference count storms on the same objects and (especially for new objects) means much less work overall.
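
A toy sketch of the shape of this (names and the single-threaded framing are mine; real collectors such as RCImmix are far more sophisticated): log increments and decrements into a buffer, coalesce them per object, and apply them in batches.

    #include <unordered_map>

    struct Obj { long refcount = 0; };

    class DeferredRC {
      // Pending deltas, coalesced per object: a storm of inc/dec pairs on a
      // hot object collapses into one entry, often with a net delta of zero.
      std::unordered_map<Obj*, long> pending_;

    public:
      void inc(Obj* o) { ++pending_[o]; }  // no write to the object itself
      void dec(Obj* o) { --pending_[o]; }

      // Run at periodic intervals, much like a tracing collector's safepoint.
      void flush() {
        for (auto& [obj, delta] : pending_) {
          obj->refcount += delta;
          if (obj->refcount == 0) delete obj;
        }
        pending_.clear();
      }
    };

A real implementation must also account for references held in stacks and registers at flush time, so that a count of zero doesn't free a live object; that bookkeeping is the "precise liveness information from an ambient collector" mentioned upthread.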


How many languages use this approach though? This probably also removes one major advantage of RC'ed systems: deterministic destruction (e.g. close a resource as soon as the last reference disappears). And suddenly you need some kind of runtime too.


> How many languages use this approach though?

As I noted, there's at least one high performance GC implementation for Java that uses it, with effectively identical performance to a state of the art tracing GC on all benchmarks considered--it's not a toy idea. I don't think the fact that most production languages don't use it means very much, given how much more effort has been spent tuning tracing compared to reference counting.

> This probably also removes one major advantage of RC'ed systems: deterministic destruction (e.g. close a resource as soon as the last reference disappears). And suddenly you need some kind of runtime too.

Completely deterministic destruction in the sense that you describe is pretty much incompatible with achieving the highest collection throughput. In particular it means you always have to trace dead objects independently (and, if all cores are busy, block someone from doing useful work while you deallocate), so most young object optimizations are a nonstarter. If you can stack allocate (or equivalent, e.g. push and pop from a vector with a trivial destructor) almost all your young objects, great; otherwise, for my money, keeping latency-sensitive things in well-scoped regions is far more deterministic and effective than either RC or tracing. The number of non-memory resources in most programs is usually rather low; you can always use traditional reference counting (or linear types or whatever) just for those.
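
A minimal sketch of the region idea (illustrative only; assumes align is a power of two, and real region allocators add chunk chaining and restrictions to trivially destructible contents):

    #include <cstddef>
    #include <new>

    class Region {
      static constexpr std::size_t kSize = 1 << 20;  // one fixed 1 MiB chunk
      char* buf_ = new char[kSize];
      std::size_t used_ = 0;

    public:
      // Bump allocation: a pointer increment, no per-object bookkeeping.
      void* allocate(std::size_t n, std::size_t align) {
        used_ = (used_ + align - 1) & ~(align - 1);
        if (used_ + n > kSize) throw std::bad_alloc{};
        void* p = buf_ + used_;
        used_ += n;
        return p;
      }

      // Deterministic O(1) release of everything at scope exit; sound only
      // if the objects inside need no destructors run.
      ~Region() { delete[] buf_; }
    };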

> And suddenly you need some kind of runtime too.

Yes, which I mentioned a number of times. However, I don't think it's unreasonable to afford reference counting the same conveniences that tracing has when you want to talk about performance; otherwise you're not really comparing reference counting to tracing, but naive reference counting to highly optimized tracing. The fact that tracing essentially requires a runtime is certainly a point against it in some contexts (though conservative GCs like Boehm perform surprisingly well, they are usually no match for anything with precise liveness information), but I think people are a bit too conditioned to believe that things like stack maps have to be used for tracing--especially when features like exceptions and destructors that people consider acceptable in languages with small runtimes require similar metadata.


> The most optimized versions of refcounting I'm aware of get their wins from things like precise knowledge about live references and update coalescing

I know Python, Objective-C, and Swift all use refcounting, but just for my own reading, what language has the most optimized refcounting right now?


The most optimized version I'm aware of is the RCImmix collector for Java: http://users.cecs.anu.edu.au/~steveb/downloads/pdf/rcix-oops.... It achieved performance parity with the regular Immix collector, which was the fastest (in the sense of throughput) available for the Jikes research JVM at the time. A backup collector is required for two reasons: cycle collection, and resetting stuck counts (they gain significant performance wins from keeping only a few bits for the actual refcount, for a variety of reasons: usually only a small percentage of objects need more, and they tend to be the most heavily accessed objects, so they save a lot of time by avoiding updates to them and also gain from being able to reuse objects' existing metadata fields rather than use extra space for the refcount). It's worth noting that most recent work on tracing GCs has been on reducing pause times, not improving throughput, so it wouldn't shock me if Immix is still the state of the art there.

There may have been further developments in improving RC throughput since 2013, but I'm not familiar with them; outside of Apple, there's very little modern research into optimizing reference counting. I know Swift does many cool optimizations of its own, but I'm not sure how restricted they are by having to remain compatible with Objective-C.


This is a straw man.

Yes, of course, given the same code and allocation pattern, there are many cases where tracing GC will give you higher throughput than reference counting, particularly once you add in cycle detection. But in GC'd languages and non-GC'd languages you write code with completely different allocation patterns. In non-GC'd environments you only need to refcount the small set of allocations that are actually shared. GC'd languages usually have semantics that require tracing for every allocation.


Plenty of GC languages have value types and support for manual memory management.

Mesa/Cedar, Active Oberon, Modula-3, D, System C#, Go, C#


It's worth noting that Go's value types are not actually memory safe, however, since Go allows nontrivial concurrent mutations to shared multiword objects (as described in https://blog.stalkr.net/2015/04/golang-data-races-to-break-m...). Unless that got fixed recently somehow? I believe in most of the other languages you mention, value types are restricted in various ways to avoid that problem (for instance, mutations are only allowed to value types on the stack that don't escape, or multiword value types can only be updated in ways that preserve data structure validity even if updates "tear", or multiword value types are immutable).


With the exception of Go and C#, there are no limitations on value types.

All those languages are GC enabled systems programming languages.

Regarding Go, memory safety in multithreaded code is orthogonal to having support for value types or not.


It's not orthogonal at all. Nontrivial mutations to multiword values are inherently racy operations that can lead to UB if you don't have restrictions on how value types can be used. D isn't memory safe; Active Oberon employs mandatory object locking on writes (which is a rather heavy runtime solution); Cedar values are either immutable, or always valid with respect to word-sized mutations (as far as I can tell, unions, for example, can only exist behind a pointer indirection). Modula-3 objects are not, as far as I can tell, value types (i.e. they must be heap-allocated and mutation creates a new record), and those are the only types it has where a "tearing" mutation could invalidate the runtime representation.

So, I disagree: all of the languages you mentioned either are memory unsafe when you use multiword value types concurrently, have one of the restrictions I mentioned, or (in the case of Active Oberon) conservatively lock on all writes. These value types are not "unrestricted" compared to primitives, all of which can be mutated in-place without allocating or locking, or to heap objects, which can be updated atomically even for interesting values at the cost of an allocation. Moreover, the issue is intimately tied to GC, or more accurately to the programmer not having to explicitly keep track of whether objects are uniquely owned or not (which is generally the function of a GC). As a result, if garbage collected objects can hold references to nontrivial value types, there is almost no way to avoid this problem.


I don't get why you are jumping into memory safety in multi-threaded environments.

We are discussing GC enabled systems programming languages with support for unsafe operations, not Rust's idea of thread safety.

D is memory safe in the context of @safe code; of course, @system code is unsafe.

Active Oberon and Modula-3 also have record types, strings, and arrays that can be stack allocated, declared in the global data segment, or manually allocated in unsafe packages.

In both cases I can also declare untraced pointers to memory blocks in unsafe packages, which the GC will gladly ignore.


Lack of thread safety leads to undefined behavior when you have nontrivial mutations to value types with tearing. That has nothing to do with some abstract notion of thread safety, nor does it have anything to do with Rust. It's a very straightforward consequence of the failure of (for example) things like bounds checking, or interpreting an integer as a pointer.

Record types with fields that don't depend on one another, strings and (fixed-size) arrays only support "uninteresting" mutations that remain valid even in the presence of atomic tearing. They are restricted compared to the general value types that you can have in a language like C++ (for example, sum types, unions, "fat objects" like Go's interfaces, or vectors with dynamic lengths), because for the latter having a write tear can lead to undefined behavior (for instance, overwriting a value vector can lead to the pointed-to vector part temporarily having a different length than the length field would indicate, which can easily lead to a buffer overflow).
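
To make the hazard concrete, a C++ sketch of the shape of the problem (deliberately racy, hence undefined behaviour by construction; illustration only):

    #include <cstddef>

    // A two-word "fat" value, like a slice: pointer plus length.
    struct Slice {
      int* data;
      std::size_t len;
    };

    int small_buf[4];
    int big_buf[1024];
    Slice shared{small_buf, 4};  // mutated concurrently, unsynchronized

    void writer_a() { shared = Slice{small_buf, 4}; }   // two plain word writes
    void writer_b() { shared = Slice{big_buf, 1024}; }

    int racy_read() {
      Slice s = shared;          // may see data from one write, len from the other
      return s.data[s.len - 1];  // torn {small_buf, 1024}: out-of-bounds access
    }

Go's slices and interface values have the same multiword shape, which is essentially the break described in the blog post linked above.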

The memory safe fragment of D has similar restrictions on its value types to the above. For example, its value arrays must have a size known at compile time.

Of course you can support such features in the unsafe fragment of a language. However, this is a pretty significant deterrent to actually using the feature and idiomatic code will avoid it most of the time.


If the manually allocated memory may contain references to managed objects, then it still needs to be traced by the GC.

That, or you start pinning managed objects, or pull everything related into the manual world. Both kinda suck.


Manually allocated memory in some of those languages is only allowed in the context of unsafe packages.

In any case, the point is that you should only do that for the code paths that actually matter, after profiling the application, not everywhere.

Be productive, make use of the GC, naturally taking into consideration the algorithms + data structures being used.

If that is still not enough to meet the application's SLAs, then manual allocation comes into play.


Sure, even Java lets you manually allocate memory via JNI, but negligible amounts of code written in those languages actually use manual management. Reference counting is uncompetitive in those languages because idiomatic Java/C#/Go make lots of allocations and pass around lots of pointers with no lifetime semantics.


I can also code in C with malloc()/free() everywhere.

The features are available; it is up to programmers to use them.


But (a) that's not idiomatic, and (b) malloc/free aren't refcounted.


I don't see a straw man in the GP comment. He was clearly comparing tracing GC vs refcounting, while you're conflating memory management with value semantics. Specifically, you're assigning the benefits of value semantics to the absence of a GC and vice versa. While GC'd languages more frequently omit value semantics, they're orthogonal and there are languages (e.g., Go) that profit from both a GC and value semantics.


> GC'd languages and non-GC'd languages you write code with completely different allocation patterns.

[citation needed]


> - no sharing across threads, otherwise locking is required for refcount updates

If you are sharing across threads you will need a synchronization mechanism, whether you have reference counting or not.

So it's not exactly fair to assign this cost to reference counting.

General reference counting does make adding or removing a reference a read/write operation, though of course non-naive implementations don't do this at every turn. Not to mention tracing GC has a reference cost as well.


Tracing GCs don't need to synchronize at each access point.

One can optimize refcounting with deferred reference counting algorithms, but then that is already the skeleton implementation of a tracing GC.


> Refcounting is only faster than tracing GC in very constrained scenarios:

Throughput is not the only concern when designing a memory management strategy though.


> Many think that refcounting isn't a GC algorithm and that it is faster than GC.

As usual, people are confused by improper use of terminology. It's like async in Python: three quarters of the complexity is in abuse of terminology.


From hacker folklore (AI koans, available at http://catb.org/jargon/html/koans.html):

Moon instructs a student

One day a student came to Moon and said: “I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons.”

Moon patiently told the student the following story:

“One day a student came to Moon and said: ‘I understand how to make a better garbage collector...


In addition to what others have said, the choice of memory management schemes has big knock-on effects. It influences how much runtime metadata you need to retain (e.g. stack maps), it has lots of implications for the FFI, and may constrain various tricks like unions, tagged pointers, NaN boxing, etc. So narrowly comparing schemes in terms of just allocation throughput won't paint the whole picture.
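
As a concrete example of such a constraint, a sketch of low-bit pointer tagging (my own illustration, assuming C++20 two's-complement shifts): a precise collector must be taught that values with bit 0 set are not pointers, or it will chase garbage.

    #include <cstdint>

    // Heap pointers are at least 2-byte aligned, so bit 0 is free to
    // distinguish a small integer from a real pointer.
    struct Value {
      std::uintptr_t bits;

      static Value from_int(std::intptr_t i) {
        return {(static_cast<std::uintptr_t>(i) << 1) | 1u};
      }
      static Value from_ptr(void* p) {
        return {reinterpret_cast<std::uintptr_t>(p)};  // bit 0 already clear
      }

      bool is_int() const { return (bits & 1u) != 0; }
      std::intptr_t as_int() const {
        return static_cast<std::intptr_t>(bits) >> 1;  // arithmetic shift
      }
      void* as_ptr() const { return reinterpret_cast<void*>(bits); }
    };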


Load-increment-store is slow. Load-decrement-branch-if-zero is slow. And you still need a tracing collector for cycles. So it's slow and still nondeterministic.


Okay, what's better, from a generalist standpoint? I have a decent but very high-level sketched-out understanding of refcounting, but I know nothing else about what's out there, and would be curious to find out.

Particularly what would work well on x86 and ARM.

An odd and probably completely wrongheaded idea just came to mind: add collectible objects to a lock-free queue in one thread, collect them in another. I wonder if you could do the mark-and-sweep stage across threads too...?
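
That idea isn't wrongheaded at all; handing the last reference to a collector thread is a known way to keep destruction off the hot path. A minimal sketch (mutex-guarded queue for brevity, where you'd want the lock-free queue you describe):

    #include <condition_variable>
    #include <deque>
    #include <memory>
    #include <mutex>
    #include <thread>

    class Reclaimer {
      std::mutex m_;
      std::condition_variable cv_;
      std::deque<std::shared_ptr<void>> dead_;
      bool done_ = false;
      std::thread worker_{[this] { run(); }};  // started last, members ready

      void run() {
        std::unique_lock lk(m_);
        while (!done_ || !dead_.empty()) {
          cv_.wait(lk, [&] { return done_ || !dead_.empty(); });
          auto batch = std::move(dead_);
          dead_.clear();
          lk.unlock();
          batch.clear();  // destructors run here, off the mutator threads
          lk.lock();
        }
      }

    public:
      // Mutators hand over what may be the last reference instead of
      // destroying inline; deallocation cost moves to the collector thread.
      void retire(std::shared_ptr<void> p) {
        { std::lock_guard lk(m_); dead_.push_back(std::move(p)); }
        cv_.notify_one();
      }

      ~Reclaimer() {
        { std::lock_guard lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
      }
    };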


"Better" depends on tradeoffs. Refcounting misses a key feature that ZGC, C4, Shenandoah, and other Java GCs have (but not e.g. Go's), and that is compacting. Secondly, when accessing refcounted objects concurrently, the refcounting needs to be thread safe, i.e. using atomics, and this does have significant performance overheads.

Pragmatically, in high-throughput situations modern GCs perform better than simple GCs. The new thing in Java land is that the newest GCs have vastly reduced stop-the-world times (trending towards OS jitter times), while preserving compaction and without too bad a throughput hit (even with the misfeatures of finalizers).

My personal experience with Shenandoah shows that this changes the way we think about heap settings for Java: just set -Xmx to 60% of machine RAM and let idle collections make sure we never use more than 4% on normal days.


For me the biggest letdown of RC is dirtying an otherwise constant, pristine page of memory with unneeded updates. Every little use of a variable changes it, because it has to update the refcount. You are also forced to use two-word values instead of just one (many implementations use even more), with cache thrashing and complicated atomics. That's why a GC, especially a compacting one, is usually so much faster than refcounting.


Start by reaching for "The Garbage Collection Handbook", which I mention in another thread.


EasyMile | Elixir/Erlang Engineer | Toulouse, France | https://easymile.com

EasyMile designs and integrates software solutions for autonomous vehicles. We have built our own vehicle: the EZ10 shuttle. To develop our fleet management system in Elixir, we are looking for developers with very good experience of OTP (Elixir or Erlang). Developers coming from functional languages such as Scala with Akka are also welcome to apply!

If interested, feel free to send me an email with your resume at alexandre.hamez at easymile.com.

