This way you would trade in a null-byte-terminated variable length string for es...

selfhoster11 · on March 5, 2021

Unicode's UTF-8 encoding uses variable-length sequences too, so I'm not a visionary or anything. It would be safer, if for no other reason than that such a pattern could only occur at the start of the string and needs zero special handling, while a null could occur anywhere in a zero-terminated string.

pertymcpert · on March 4, 2021

This is just the LEB128 format, which is commonly used, and I don't think there are any serious problems with it.
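
For anyone who hasn't met it, here is a minimal sketch of unsigned LEB128 used as a string length prefix. The function names (encode_uleb128, decode_uleb128, pack_string) are made up for illustration and are not taken from any particular library: each byte carries 7 bits of the value, low bits first, and the high bit says whether another byte follows.

```python
def encode_uleb128(n):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if n < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_uleb128(data, offset=0):
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]      # raises IndexError on truncated input
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def pack_string(payload):
    """Hypothetical helper: prefix a bytes payload with its LEB128 length."""
    return encode_uleb128(len(payload)) + payload


def unpack_string(data, offset=0):
    """Hypothetical helper: read one length-prefixed payload."""
    length, start = decode_uleb128(data, offset)
    return data[start:start + length], start + length
```

Note that the payload may contain null bytes freely, since the length prefix is the only thing the reader interprets.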

selfhoster11 · on March 5, 2021

Interesting. Thank you for sharing this!

ascar · on March 4, 2021

At least you don't have (obvious) performance problems with it, because you will effectively never need more than 9 (usually 2 or 3) of these bytes.

But sure, on modern 64-bit systems just using a 64-bit integer makes much more sense. On a small embedded 8-bit or 16-bit microcontroller it might make sense.

selfhoster11 · on March 5, 2021

You are correct. I was trying to show that such a scheme was practical even in the early 1980s, when zero-termination was beginning to dominate. This could well be used on 64-bit systems (just with a larger word size than a byte), though the utility of such a thing is questionable.

konjin · on March 5, 2021

In a toy language I once wrote I got around that by encoding binary values as quaternary values and using a ternary system on top of that with a termination character: 11 = 1; 01 = 0; 00 = end; 10 was unused.

Having truly unbounded integers was rather fun. Of course performance was abysmal.
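
A rough sketch of how a pair code like that could work, using strings of '0' and '1' characters for readability. The names encode_pairs and decode_pairs are hypothetical, and the most-significant-bit-first order is an assumption; only the mapping 11 = 1, 01 = 0, 00 = end comes from the description above.

```python
PAIR_FOR_BIT = {"1": "11", "0": "01"}
END_PAIR = "00"


def encode_pairs(n):
    """Encode a non-negative integer as 2-bit pairs plus a terminator."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    bits = format(n, "b")  # binary digits, most significant first (assumed order)
    return "".join(PAIR_FOR_BIT[b] for b in bits) + END_PAIR


def decode_pairs(stream, offset=0):
    """Decode one integer; return (value, next_offset)."""
    value = 0
    for i in range(offset, len(stream), 2):
        pair = stream[i:i + 2]
        if pair == END_PAIR:
            return value, i + 2
        if pair == "11":
            value = value * 2 + 1
        elif pair == "01":
            value = value * 2
        else:
            # "10" is unused in the scheme; a lone trailing bit also lands here
            raise ValueError("invalid pair: " + pair)
    raise ValueError("missing terminator")
```

Every value bit costs two bits on the wire, which is part of why a scheme like this is slow and bulky, but nothing in it caps the size of the integer.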