Hacker News

Note that using this will likely violate strict aliasing due to using the void pointer returned as one type, freeing it, and then getting that same chunk again and using it as another type. You'll probably want to compile with -fno-strict-aliasing to be safe.

A good reference on strict aliasing: https://gist.github.com/shafik/848ae25ee209f698763cffee272a5...



There's no reason an allocator should ever trip over strict aliasing rules.

malloc() returns an address. The user code writes a value, and then reads a value of the same type. No problem.

Later, malloc() returns the same address. The user code writes a value of another type then reads a value of that same type. Again, no problem. There's no aliasing going on here.

The act of writing a value of a different type tells the compiler that the lifetime of the previous object has ended. There's no special magic required.

Strict aliasing is about situations where we write a value of one type, and then attempt to read a value of a different type from the same location. If you want to do that, then you have to be extremely cautious. But memory allocators don't do that, so it's not an issue.

(Obviously just talking about C here, or POD in C++ terms. If you are dealing with C++ objects with destructors, then you have an extra layer of complexity to deal with, but again that's nothing to do with aliasing.)
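To make the distinction concrete, here is a sketch of the pattern that causes trouble (store one type, then load a different type from the same bytes) next to a conforming way to get at the bits. `float_bits` is an illustrative name. It assumes `float` and `uint32_t` have the same size, which holds on mainstream platforms but is not guaranteed by the standard, so the code checks it at compile time.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative helper: return the object representation of a float.
 * Assumes sizeof(float) == sizeof(uint32_t); enforced below. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

uint32_t float_bits(float f)
{
    /* Strict aliasing violation (don't do this):
     *     return *(uint32_t *)&f;
     * That reads a float object through a uint32_t lvalue. */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* copy the bytes; no object is read as the wrong type */
    return bits;
}
```

On an IEEE 754 platform, `float_bits(1.0f)` returns `0x3F800000`.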


>The act of writing a value of a different type tells the compiler that the lifetime of the previous object has ended.

afaik only memcpy has that magic property, so I think parent is almost correct.

  void *p = malloc(n);
  *(int *)p = 42; // ok, *p is now an int.
  //*(float *)p = 3.14f; // I think this is not allowed, p points to an int object, regular stores do not change effective type
  float x = 3.14f;
  memcpy(p, &x, sizeof(float)); // but this is fine, *p now has effective type float
So in pool_new:

  pool->chunk_arr[i].next = &pool->chunk_arr[i + 1];
This sets the effective type of the chunk block to 'Chunk'.

Later in pool_alloc:

  Chunk* result    = pool->free_chunk;
  ...
  return result;
result has effective type 'Chunk'

In user code:

  int *x = pool_alloc();
  *x = 42; // aliasing violation, *x has effective type 'Chunk' but it is accessed as an int
User code would need to look like this:

  int *x = pool_alloc();
  memcpy(x, &(int){0}, sizeof(int)); // establish new effective type as 'int'
  // now we can do
  *x = 42;
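For readers without the article open, here is a hypothetical reconstruction of the kind of pool being discussed. Only the names `pool_new`, `pool_alloc`, `pool_free`, `Chunk`, `chunk_arr` and `free_chunk` come from this thread; the union layout, `CHUNK_SIZE`, the `Pool` struct, `pool_close`, and getting the backing storage from `calloc` are assumptions, and the article's real code may differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reconstruction; the article's definitions may differ. */
#define CHUNK_SIZE 64

typedef union Chunk Chunk;
union Chunk {
    Chunk *next;            /* link, used while the chunk is free */
    char   arr[CHUNK_SIZE]; /* user payload, used while allocated */
};

typedef struct {
    Chunk *free_chunk; /* head of the free list */
    Chunk *chunk_arr;  /* backing storage (allocated, no declared type) */
} Pool;

Pool *pool_new(size_t pool_size)
{
    if (pool_size == 0)
        return NULL;
    Pool *pool = malloc(sizeof *pool);
    if (pool == NULL)
        return NULL;
    pool->chunk_arr = calloc(pool_size, sizeof(Chunk));
    if (pool->chunk_arr == NULL) {
        free(pool);
        return NULL;
    }
    /* Link every chunk into the free list. These are the stores the comment
     * above says give the chunk memory its effective type. */
    for (size_t i = 0; i + 1 < pool_size; i++)
        pool->chunk_arr[i].next = &pool->chunk_arr[i + 1];
    pool->chunk_arr[pool_size - 1].next = NULL; /* calloc's zero bits are not guaranteed to be NULL */
    pool->free_chunk = pool->chunk_arr;
    return pool;
}

void *pool_alloc(Pool *pool)
{
    if (pool == NULL || pool->free_chunk == NULL)
        return NULL;
    Chunk *result = pool->free_chunk;
    pool->free_chunk = result->next;
    return result;
}

void pool_free(Pool *pool, void *ptr)
{
    if (pool == NULL || ptr == NULL)
        return;
    Chunk *freed = ptr;
    freed->next = pool->free_chunk; /* writes a Chunk * into memory the user just used */
    pool->free_chunk = freed;
}

void pool_close(Pool *pool)
{
    if (pool == NULL)
        return;
    free(pool->chunk_arr);
    free(pool);
}
```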


And this is why type-based alias analysis (TBAA) is insane, and why projects like Linux compile with -fno-strict-aliasing.

C should issue a defect report and get rid of that nonsense from the standard.


C doesn't have "alias analysis" in the standard. It has an (informally specified) memory model which has "memory objects" which have a single type, which means treating them as a different type is undefined behavior.

This enables security analysis like valgrind/ASan and secure hardware like MTE/CHERI so it's very important and you can't get rid of it.

However, it's not possible to implement malloc() in C because malloc() is defined as returning new "memory objects" and there is no C operation which creates "memory objects" except malloc() itself. So it only works as long as you can't see into the implementation, or if the compiler gives you special forgiveness somehow.

C++ has such an operation called "placement new", so you want something like that.


You can definitely implement malloc in C. In its most basic form it does nothing special except cough up void pointers into its own arena.

It gets complicated when you have virtual memory and an OS involved but even then you can override the system malloc with a simple implementation that allocates from a large static array.
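The simple static-array approach can be sketched as a bump allocator. `arena_alloc`, `ARENA_SIZE` and `arena_used` are illustrative names, and there is no free. A real malloc replacement would also need `free`, `realloc`, `calloc` and thread safety. Note the caveat raised elsewhere in the thread: handing out pieces of a declared array like this is formally UB in C even though it works in practice.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical bump allocator over a static array; memory is never freed. */
#define ARENA_SIZE ((size_t)1 << 20)

static alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used; /* bytes handed out so far */

void *arena_alloc(size_t n)
{
    const size_t align = alignof(max_align_t);
    if (n == 0)
        n = 1;       /* give every call a distinct address */
    if (n > ARENA_SIZE)
        return NULL; /* also keeps the rounding below from overflowing */
    /* Round up to a multiple of the strictest fundamental alignment so the
     * next block is suitably aligned for any type, like malloc's blocks. */
    size_t rounded = (n + align - 1) / align * align;
    if (rounded > ARENA_SIZE - arena_used)
        return NULL; /* arena exhausted */
    void *p = &arena[arena_used];
    arena_used += rounded;
    return p;
}
```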


No, returning parts of an array does not implement malloc as described in the standard. That's not a new memory object, it's a part of an existing one.


The standard is written to accommodate obsolete tagged memory architectures that require special support. They aren't relevant today and data pointers are fungible regardless of where they originate.


> data pointers are fungible regardless of where they originate.

This was never true because of something called provenance: https://www.ralfj.de/blog/2020/12/14/provenance.html. Though it usually doesn't matter and I think it annoys anyone who finds out about it.

But in practice it's not always true on Apple A12 or later, because they support PAC (pointer authentication), so pointers of different types to the same address can differ bitwise. It's even less true on the very latest Android, because it supports the really big gun: MTE (Memory Tagging Extension). And MTE is great; you don't want to miss out on it. No explainer here because there's no Wikipedia article for it(!).

Also becomes not true on any system if you use -fbounds-safety or some of the sanitizers.


Morello (Arm's CHERI prototype) is.


There are other issues besides changing the memory type. For instance, C has those rules about out-of-bounds pointers being undefined, but you can't implement that: if you return part of the pool and someone calculates an out-of-bounds address, they get a valid address to the rest of the pool. That's why you can't implement malloc() in C.

(The difference here is that system malloc() works with valgrind, -fbounds-safety, theoretical secure hardware with bounds checking etc., and this one doesn't.)
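The out-of-bounds point can be shown with a hypothetical fixed-block pool (`block_alloc`, `pool_mem`, `BLOCK` and `NBLOCKS` are illustrative names). Both blocks are elements of the same array, so writing one byte past the end of the first block is, as far as C is concerned, an ordinary in-bounds store into `pool_mem`. It lands on the first byte of the second block and nothing can trap, whereas the equivalent write past a block from the system malloc would be reported by ASan or valgrind.

```c
#include <stddef.h>

/* Hypothetical fixed-block pool carved out of one array. */
#define BLOCK   16
#define NBLOCKS 4

static unsigned char pool_mem[BLOCK * NBLOCKS];
static size_t next_block;

static unsigned char *block_alloc(void)
{
    if (next_block == NBLOCKS)
        return NULL;
    return &pool_mem[BLOCK * next_block++];
}

/* Returns 1 if an off-by-one write past block a lands in block b. */
int overflow_lands_in_neighbor(void)
{
    unsigned char *a = block_alloc();
    unsigned char *b = block_alloc();
    if (a == NULL || b == NULL)
        return 0;
    b[0] = 0;
    /* To the user this is a[BLOCK], one past the end of a BLOCK-byte
     * allocation. To C it's just another element of pool_mem. */
    a[BLOCK] = 0xAA;
    return b[0] == 0xAA;
}
```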


Undefined behavior is behavior you can't avoid implementing, because no matter what your compiler and runtime do, it complies with the spec. In particular getting valid addresses to other objects from out-of-bounds address arithmetic is not just conformant with the C standard but by far the most common conforming behavior.


Meant to say you can't implement it as an invalid/trap state. This is possible in some implementations but they have to cooperate with you to do it.

> In particular getting valid addresses to other objects from out-of-bounds address arithmetic is not just conformant with the C standard but by far the most common conforming behavior.

One reason calculating out of bounds addresses might not work out is the calculation might cause the pointer to overflow, and then surprising things might happen like comparisons failing or tag bits in the high bytes getting corrupted.


Oh, then I agree. My apologies for interpreting you as saying something so obviously incorrect. Yes, in particular CHERI has a mechanism to shrink the bounds of a pointer, but just returning a pointer into an array won't do it.


> aliasing violation, *x has effective type 'Chunk'

This doesn't make any sense. How do you know its effective type if you don't have access to the definition of `pool_alloc()`?


If you can guarantee it's always compiled in a separate TU and never inlined, sure, that might be a practical way to 'solve' this issue, but if you then do some LTO (or a unity build or something), the compiler might suddenly break your code. Another way is to add an inline asm block with a "memory" clobber to escape the pointer so the optimizer can't destroy your code.

It's really quite ridiculous that compiler implementers have managed to overzealously nitpick the C standard so that you can't implement a memory allocator in C.
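The inline-asm escape mentioned above looks roughly like this. It is a GCC/Clang extension, not ISO C, and `escape_ptr` is an illustrative name. The empty asm statement takes the pointer as an input/output operand and clobbers "memory", so the optimizer must assume the pointer value may have changed and that any memory may have been read or written. In practice that keeps stores from being moved across the call, which is why it is applied where blocks enter and leave the allocator; the C standard itself promises nothing here.

```c
/* GCC/Clang extended asm; not standard C. escape_ptr is an illustrative name. */
static inline void *escape_ptr(void *p)
{
    /* "+r"(p): the asm may read p and hand back a different value.
     * "memory": the asm may read or write any memory, so pending stores
     * must be emitted before it and loads redone after it. */
    __asm__ volatile("" : "+r"(p) : : "memory");
    return p;
}

/* Hypothetical use at the allocator boundary, e.g. in pool_alloc:
 *     return escape_ptr(result);
 */
```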


> It's really quite ridiculous that compiler implementers have managed to overzealously nitpick the C standard so that you can't implement a memory allocator in C.

This is good because it's also what gives you valgrind and CHERI. Take one away and you can't have the other.

(Because if you define all undefined behavior, then programs will rely on it and you can't assume it's an error anymore.)


Any type-changing store to allocated storage has this property in C.
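A minimal sketch of what that means (`reuse_as_float` is an illustrative name). Memory from malloc has no declared type, and under C11 6.5p6 a store through a non-character lvalue makes the lvalue's type the effective type of that storage for the store and for later accesses that don't modify the value. So the plain float store below, with no memcpy, is a valid type-changing store, and the float read after it is fine.

```c
#include <stdlib.h>

float reuse_as_float(void)
{
    size_t n = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *p = malloc(n);
    if (p == NULL)
        return -1.0f;
    *(int *)p = 42;             /* effective type: int */
    *(float *)p = 3.14f;        /* type-changing store: effective type is now float */
    float result = *(float *)p; /* read with the same type as the last store */
    free(p);
    return result;
}
```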


> The act of writing a value of a different type tells the compiler that the lifetime of the previous object has ended. There's no special magic required.

I think strictly speaking in C this is only true for anonymous memory, i.e. memory you got from some allocator-like function. So if your custom allocator gets its memory from malloc (or sbrk, or mmap), everything is fine, but if, for example, you are allocating from a static char array, it is formally UB.

In C++ you can use placement new to change the type of anything.


Yes, type-changing stores are guaranteed by the C standard to work only for allocated storage. In practice no compiler makes a difference between declared and allocated storage. A bigger problem is that Clang does not implement the C rules as specified.


That's what allocators do. If C's object model doesn't allow users of the language to write their own allocators then that object model is broken.

C++ has relatively recent fixes that allow allocators to work; they require calls to std::launder.


I understand the C standard hides such actions behind a wall of "this is the allocator", and expects the compiler authors to also be the allocator authors, allowing them to know when/how they can break such rules (in the context of their own compiler).


No, allocators are not magic (in that regard) in C. There is nothing out of the ordinary going on with an allocator, the parent comment is simply mistaken (as pointed out by another answer).


Ah, I see that it's because the char type never violates Strict Aliasing. I was wondering how you could define a type such as Chunk, yet hand out a pointer to it to a user who will cast it to some other type.
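The exception being referred to is C11 6.5p7: any object may be accessed through an lvalue of character type. Here is a sketch of a hand-rolled memcpy that relies on it (`copy_bytes` is an illustrative name). Per 6.5p6, copying into storage with no declared type "as an array of character type" also carries the source's effective type over, just as memcpy does.

```c
#include <stddef.h>

/* Byte-wise copy through unsigned char, which may alias any object type. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}
```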


Well, the pool code doesn't use the char* type to write inside the block of memory.

If you look at the 'pool_free' code, you can see that it receives a used block from the user as 'ptr', casts it to a 'Chunk pointer', and writes a value of type 'Chunk pointer' into its Chunk member. I had to change that line to a memcpy when I did this about 15 years ago in C++ with GCC 4.something. In short, if you are writing your own allocator, you either need to disable strict aliasing like Linux does or do the 'memcpy' dance whenever you access memory as internal allocator structures right after it was used as a user type. When it happened to me, the writes were reordered: I observed the code that writes into Chunk* next executing first and the write of zeros into the user type executing second.
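The "memcpy dance" described above can be sketched in C (the original fix was in C++) as an intrusive free list whose link is read and written with memcpy instead of through a `Chunk *`. `FreeList`, `freelist_push` and `freelist_pop` are hypothetical names, and each block is assumed to be at least `sizeof(void *)` bytes. memcpy has no alignment requirement, so the link itself needs no particular alignment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical intrusive free list: the link lives in the first bytes of
 * each free block and is only ever accessed with memcpy. */
typedef struct {
    void *head; /* first free block, or NULL */
} FreeList;

void freelist_push(FreeList *fl, void *block)
{
    memcpy(block, &fl->head, sizeof fl->head); /* store old head inside the block */
    fl->head = block;
}

void *freelist_pop(FreeList *fl)
{
    void *block = fl->head;
    if (block != NULL)
        memcpy(&fl->head, block, sizeof fl->head); /* load the link stored in the block */
    return block;
}
```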


Note that C++ has different rules than C. C has type changing stores that do not require memcpy for allocated storage (i.e. no declared type). Older compilers have plenty of bugs related to TBAA though. Newer versions of GCC should be ok.


Which is why pointer provenance is an issue as well.



