
This way you would trade in a null-byte-terminated variable length string for essentially a null-bit-terminated variable length number (plus the remaining string). I am not convinced that this actually would be much safer.


Unicode (in UTF-8) does variable-length bit strings too, so I'm not a visionary or anything. It would be safer for no other reason than that such a pattern could only occur at the start of the string, requiring zero special handling, while a null could occur anywhere in a zero-terminated string.


This is just the LEB128 format, which is commonly used, and I don't think there are any serious problems with it.
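LEB128 stores 7 payload bits per byte and uses the high bit as a "more bytes follow" flag. The length prefix therefore ends at the first byte whose top bit is clear, which is the "null-bit-terminated number" described earlier in the thread. Below is a minimal Python sketch of unsigned LEB128 used as a string length prefix. `pack_string` and `unpack_string` are hypothetical helpers for illustration only, not part of any standard library or API.

```python
from __future__ import annotations


def encode_uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("unsigned LEB128 requires n >= 0")
    out = bytearray()
    while True:
        byte = n & 0x7F           # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: this is the last byte
            return bytes(out)


def decode_uleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode unsigned LEB128 at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:       # terminating byte reached
            return result, pos
        shift += 7


def pack_string(s: str) -> bytes:
    """Hypothetical helper: UTF-8 payload prefixed with its LEB128 length."""
    payload = s.encode("utf-8")
    return encode_uleb128(len(payload)) + payload


def unpack_string(data: bytes, pos: int = 0) -> tuple[str, int]:
    """Hypothetical helper: read one length-prefixed string at data[pos]."""
    length, pos = decode_uleb128(data, pos)
    end = pos + length
    if end > len(data):
        raise ValueError("string runs past end of buffer")
    return data[pos:end].decode("utf-8"), end
```

For example, `pack_string("hello")` yields `b'\x05hello'`: a one-byte prefix, with no terminator needed and no restriction on zero bytes inside the payload. This sketch does not cap the number of prefix bytes, so a real parser would likely want to reject absurdly long prefixes.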


Interesting. Thank you for sharing this!


At least you don't have (obvious) performance problems with it, because you will effectively never need more than 9 (usually 2 or 3) of these bytes.
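The byte counts follow directly from the 7-bits-per-byte layout: a value with `b` significant bits needs `ceil(b / 7)` bytes (at least one). A small Python sketch, assuming unsigned LEB128, makes the numbers concrete:

```python
def uleb128_size(n: int) -> int:
    """Bytes unsigned LEB128 needs for n: ceil(bit_length / 7), minimum 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(1, -(-n.bit_length() // 7))  # ceiling division by 7


for n in (127, 128, 16_383, 16_384, 2**21 - 1, 2**63 - 1, 2**64 - 1):
    print(f"{n:>20} -> {uleb128_size(n)} byte(s)")
```

This prints 1, 2, 2, 3, 3, 9 and 10 bytes respectively. Any length below 2^63 fits in 9 bytes, and only a full 64-bit value would need a tenth.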

But sure, on modern 64-bit systems just using a 64-bit integer makes much more sense. On a small embedded 8-bit or 16-bit microcontroller it might make sense.


You are correct, I was trying to show that such a scheme was practical even in the early 1980s when zero-termination was beginning to dominate. This could well be used on 64-bit systems (just with a larger word size than a byte), though the utility of such a thing is questionable.


In a toy language I once wrote, I got around that by encoding binary values as quaternary values and using a ternary system on top of that, with a termination character: 11 = 1; 01 = 0; 00 = end; 10 was unused.

Having truly unbounded integers was rather fun. Of course performance was abysmal.
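The comment doesn't show the toy language's code, so here is a hypothetical Python reconstruction of the scheme as described: each binary digit becomes a 2-bit symbol (11 for 1, 01 for 0), and 00 ends the number. Storing the bits as a `str` of `'0'`/`'1'` characters is an assumption made for readability. A real implementation would pack them into bytes.

```python
from __future__ import annotations

# Hypothetical reconstruction: one 2-bit symbol per binary digit, 00 = end.
ONE, ZERO, END = "11", "01", "00"  # "10" is deliberately left unused


def encode_unbounded(n: int) -> str:
    """Encode n >= 0 as a bit string, most significant digit first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = bin(n)[2:]  # e.g. 6 -> "110"
    return "".join(ONE if d == "1" else ZERO for d in digits) + END


def decode_unbounded(bits: str, pos: int = 0) -> tuple[int, int]:
    """Read 2-bit symbols from bits[pos:] until END; return (value, next_pos)."""
    value = 0
    while True:
        symbol = bits[pos:pos + 2]
        pos += 2
        if symbol == END:
            return value, pos
        if symbol == ONE:
            value = value * 2 + 1
        elif symbol == ZERO:
            value = value * 2
        else:  # the unused "10" symbol, or a truncated stream
            raise ValueError(f"invalid symbol {symbol!r} at bit {pos - 2}")
```

For example, `encode_unbounded(6)` gives `"11110100"`. Because every digit costs two bits and decoding walks one symbol at a time, the size roughly doubles and nothing can be read in word-sized chunks. That is consistent with the poor performance mentioned above, though there is no limit on magnitude.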



