
Yes. The problem is that most memory errors (out of bounds, use after free, etc.) result in a vulnerability. Only a minority of logic errors do.

For operating system kernels, browsers, etc., vulnerabilities have a much, much bigger impact than logic errors: vulnerabilities need to be fixed immediately, and the fix released immediately. Most logic errors don't need to be fixed immediately (sure, it depends on the issue and on the type of software).

I would probably say: "for memory-unsafe languages, 80% of the _impact_ is due to memory vulnerabilities".


Yes. His work is really inspiring, e.g. the text editor in less than 1000 lines, the tiny language in less than 1000 lines, etc. When I read this on HN, I converted his text editor to my hobby programming language [1] (330 lines), and then wrote a chess engine with a terminal UI [2] (400 lines) and a Tetris clone [3] in 140 lines. I also have a QR code generator and parser [4] (700 lines, still in Java only), a PDF generator (200 lines), and now also a tiny programming language [5].

[1] https://github.com/thomasmueller/bau-lang/blob/main/src/test... [2] https://github.com/thomasmueller/bau-lang/blob/main/src/test... [3] https://github.com/thomasmueller/bau-lang/blob/main/src/test... [4] https://github.com/thomasmueller/bau-lang/blob/main/src/test... [5] https://thomasmueller.github.io/bau-lang/at.html


Cool! I just recently implemented a chess engine in ~400 (readable) lines, with all rules, first in Java and then ported to my own programming language "Bau" [1]. This includes a terminal UI. I'll measure the Elo at some point, but I was never able to beat it :-) The castling moves are especially tricky to implement, I think. I enjoyed the challenge as well.

[1] https://github.com/thomasmueller/bau-lang/blob/main/src/test...


How come there are no unsigned numeric types in Bau?

I tried to describe this in [1]: "Unsigned integers are intentionally not supported to simplify learning and using the language, to avoid surprising behavior and edge cases, and to reduce security issues and error-handling pitfalls. When needed, unsigned behavior is available through explicit operations. This design does not affect performance or memory usage."

I understand this may not sound very convincing yet... it is hard to describe... basically, unsigned types sound simple, but they are not.

[1] https://thomasmueller.github.io/bau-lang/features.html
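As a concrete illustration of "unsigned behavior through explicit operations" (shown here in Java, which also has only signed integer types; this is not Bau code):

    public class UnsignedOps {
        public static void main(String[] args) {
            int a = 0xFFFFFFFE;                                    // as unsigned: 4294967294
            System.out.println(a);                                 // -2 (signed interpretation)
            System.out.println(Integer.toUnsignedString(a));       // "4294967294"
            System.out.println(a & 0xFFFFFFFFL);                   // 4294967294 (widened to long)
            System.out.println(Integer.compareUnsigned(a, 3) > 0); // true
            System.out.println(Integer.divideUnsigned(a, 3));      // 1431655764
        }
    }

The storage stays a plain signed type; only the few operations where signedness matters (compare, divide, widen, print) are spelled out explicitly.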


Hmm, I don't quite follow. For uses like counters or version numbers, it seems like allowing negatives makes things more complicated rather than less.

Like, what if you're getting a u32 over the wire? I feel like using an i32 to represent that data makes it more error-prone. Numbers will unexpectedly appear to be negative?


LLMs are good at predicting the next token. Basically, you use them to predict the probabilities of the next token being a, b, or c, and then use arithmetic coding to store which one actually occurred. So the LLM is used during both compression and decompression.
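A toy sketch of the idea (my own illustration: a fixed three-token "model" stands in for the LLM, and exact BigDecimal arithmetic stands in for a real streaming arithmetic coder). The encoder narrows an interval according to the model's probabilities; the decoder replays the same model to recover the tokens.

    import java.math.BigDecimal;
    import java.math.MathContext;
    import java.util.Map;

    public class ArithmeticCodingSketch {
        // Hypothetical fixed model: cumulative start and probability per token.
        // A real compressor would ask the LLM for these at every step.
        static final Map<Character, double[]> MODEL = Map.of(
            'a', new double[] {0.0, 0.7},
            'b', new double[] {0.7, 0.2},
            'c', new double[] {0.9, 0.1});

        // Compression: narrow the interval [low, high) once per token.
        static BigDecimal encode(String tokens) {
            BigDecimal low = BigDecimal.ZERO, high = BigDecimal.ONE;
            for (char t : tokens.toCharArray()) {
                BigDecimal range = high.subtract(low);
                double[] p = MODEL.get(t);
                high = low.add(range.multiply(BigDecimal.valueOf(p[0] + p[1])));
                low = low.add(range.multiply(BigDecimal.valueOf(p[0])));
            }
            // any number inside the final interval identifies the whole sequence
            return low.add(high).divide(BigDecimal.valueOf(2), MathContext.DECIMAL128);
        }

        // Decompression: replay the model and see which token's interval contains the code.
        static String decode(BigDecimal code, int count) {
            StringBuilder out = new StringBuilder();
            BigDecimal low = BigDecimal.ZERO, high = BigDecimal.ONE;
            for (int i = 0; i < count; i++) {
                BigDecimal range = high.subtract(low);
                for (Map.Entry<Character, double[]> e : MODEL.entrySet()) {
                    double[] p = e.getValue();
                    BigDecimal lo = low.add(range.multiply(BigDecimal.valueOf(p[0])));
                    BigDecimal hi = low.add(range.multiply(BigDecimal.valueOf(p[0] + p[1])));
                    if (code.compareTo(lo) >= 0 && code.compareTo(hi) < 0) {
                        out.append(e.getKey());
                        low = lo;
                        high = hi;
                        break;
                    }
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            BigDecimal code = encode("abacab");
            System.out.println(code + " -> " + decode(code, 6)); // ... -> abacab
        }
    }

The better the model predicts the next token, the larger the chosen sub-interval, and the fewer digits (bits) are needed to pinpoint it - that is where the compression comes from.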

China: for Taiwan, they are in the planning phase. (Vietnam, Hong Kong, Tibet, Aksai Chin, Korea, and Scarborough Shoal do not count in your view, of course.) Not saying they are worse than the US.


What China did to the Han Chinese makes them worse than ANY other modern country. The great leap forward and the cultural revolution have no comparison. Add in the Chinese invasion of Tibet in 1959 and the 1979 invasion of Vietnam, and they are butchers and imperialists.


the Han? are you sure you didn't mean a different group?


> The great leap forward

You need far better propaganda materials for your "great leap forward" blame in 2026. There were bad policies, but the intention was good: it was all about moving the country forward. It failed horribly, with huge consequences; that is just a reminder that full-scale industrialisation for over 1 billion people is not something that can be achieved easily.

Like it or not, the "Exceeding the UK, catching the USA" (超英赶美) goal of the great leap forward has been overfulfilled under the leadership of the CCP with the help of brutal state capitalism. Everything else is just cheap talk.

Having full-scale industrialization larger than the G7 combined is not something handed to China on a silver platter - those very sad deaths caused by the failed attempts during the great leap forward were part of the cost.

> and the cultural revolution have no comparison.

The cultural revolution was brutal; nothing should be used to defend it. It was just so wrong. That being said, the west is going through the exact same cultural revolution:

* an extremely polarised society where everything is politicised
* populism taking control
* suicidal policies destroying the civilizational foundations

The difference is that 99% of Han Chinese consider the cultural revolution extremely bad, while the west is enjoying its own ongoing cultural revolution.

If you add the recent woke cancer, the western version of the ongoing cultural revolution is far more brutal.


Intention doesn't matter


So you agree that 50 million Han Chinese dead makes the CCP brutal, correct? Every brutal regime thinks they are justified. Ask the US if they think they are justified. Woke is dead. China is much smaller than the west. The G7 is maybe 1/4 of the west. The west includes the EU, the US, the first island chain (Japan, S. Korea, the Philippines, etc.), and all of the western hemisphere. China has no allies; it probably doesn't need them. However, China doesn't control its vital sea lanes. It has less water per person than Saudi Arabia. China has more old people than the west, and Confucianism prohibits doing a "great leap forward" on them. China is not escaping the middle-income trap.


I don't know what you are smoking but that thing must be strong. Have fun.


Schubfach, Ryū, Dragonbox, etc. support round-tripping and shortest-width output, which (it sounds like) is not important for you. The idea of round-tripping is that if you convert a double to a string and then parse that string, you get the exact same value. Shortest-width means generating the correctly rounded, shortest possible text. I tried to implement a version that does _just_ round-tripping but is not shortest-width; it is around 290 lines for both parsing and toString [1].

[1] https://github.com/thomasmueller/bau-lang/blob/main/src/test...
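A quick illustration of the round-trip property, in plain Java (not the Bau code linked above):

    public class RoundTrip {
        public static void main(String[] args) {
            double d = 0.1 + 0.2;                     // not exactly 0.3
            String s = Double.toString(d);            // "0.30000000000000004"
            double parsed = Double.parseDouble(s);
            // round-tripping: the parsed value is bit-for-bit the original
            System.out.println(Double.doubleToLongBits(parsed)
                    == Double.doubleToLongBits(d));   // true
        }
    }

Shortest-width is the extra guarantee on top of that: a round-tripping but non-shortest printer might output 0.1 as "0.10000000000000001" (17 significant digits always round-trip), while the shortest form is just "0.1".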


There are many variants. It really depends on what features you need. Cuckoo filters were mentioned. If you want to add and remove entries and regular counting Bloom filters are not good enough, I would look at the "succinct counting blocked Bloom filters" [1]: they only need about twice the space of regular Bloom filters. Sure, cuckoo filters need less memory, but insertions can fail basically at any time, while these cannot fail.

Tiny filters: Some time ago I worked on tiny statistics [2]. This includes some 64-bit HyperLogLog implementations; some use linear counting, which is basically a 64-bit Bloom filter, until some limit, and only then switch to HyperLogLog. This is great for distinct counts of columns in databases (cardinality estimation). This project also includes 64-bit approximate counts and histograms.

[1] https://github.com/FastFilter/fastfilter_java/blob/master/fa... [2] https://github.com/thomasmueller/tinyStats
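As a rough sketch of the linear counting part (my own illustration, not the tinyStats code): with a single 64-bit bitmap, the distinct count is estimated from the fraction of bits that are still zero.

    public class LinearCounting64 {
        private long bitmap; // 64 buckets, one bit each

        void add(long hash) {
            bitmap |= 1L << (hash & 63); // map each hashed element to one of 64 buckets
        }

        double estimate() {
            int zeroBits = 64 - Long.bitCount(bitmap);
            if (zeroBits == 0) {
                return 64 * Math.log(64); // saturated; switch to HyperLogLog before this point
            }
            // linear counting estimator: -m * ln(V), where V is the fraction of empty buckets
            return -64.0 * Math.log(zeroBits / 64.0);
        }
    }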


FWIW, I found https://github.com/FastFilter/fastfilter_java/issues/28 a pretty good explanation of what's going on in the succinct counting blocked Bloom filters. (I'm not sure if the blocking is needed for the normal Bloom filter part, though, i.e., all bits don't necessarily need to fall within the same block, no? But the counts are stored separately for each block.)


Yes, non-blocked is also possible. This would need a bit less space, and would be a bit slower. The counts > 1 (per bit that is set) are stored separately, yes.


> This would need a bit less space, and would be a bit slower.

I guess that is because the count storage update is really slow, right, so it's better to have one operation than two (or however many bits are set)? At least the linked code seems to process it bit by bit when updating, and without some sort of “rank of n-th set bit” operation (which would accelerate the “select” version fairly well), I'm not sure it could be made much faster than that either.

Edit: I see https://github.com/FastFilter/fastfilter_java/blob/master/fa... tries to make that operation in O(1), but to be honest, with this many operations, the cure almost looks worse than the disease. :-) Haven't benchmarked, though.
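For reference, the simple loop-based alternative (a sketch, not the linked code) processes one bit position per iteration:

    // Position of the n-th (0-based) set bit in a 64-bit word, the naive way.
    static int selectNaive(long word, int n) {
        for (int i = 0; i < 64; i++) {
            if ((word & (1L << i)) != 0 && n-- == 0) {
                return i;
            }
        }
        return -1; // fewer than n + 1 bits are set
    }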


Yes, the count storage update is a bit slow. It could be implemented as a background thread, so that it is not blocking. It depends on the use case whether this is a problem. The 'blocked' variant might be easier to optimize for speed, I assume, if this is needed.


> It could be implemented as a background thread, so that it is not blocking.

How would you realistically do this without creating more overhead in the thread communication than what you're saving? I've never heard of offloading a 30–40 cycle operation to a thread before. Typically sending an atomic across CPUs is what, 200 cycles? (That's assuming you have the thread just sitting there in some kind of pool; firing up a new one is much, much more expensive. It also assumes you never need to hear back from it, so wouldn't work well for a decrease where you need to know the count.)


Right, it would be wasteful if it's done for each entry separately. But that is not needed. (I should probably write an article about this and a better implementation).

The "succinct counting (blocked) Bloom filters" have two components: the regular Bloom filter for querying, and the count storage. The count storage is not "strictly" needed for querying: you will never get a false _negative_ is increments / decrements in the count storage are deferred. So updating the count storage can be done asynchronously. Both increments and decrements, if you want. So these can be buffered, eg 100 increments / decrements at a time. Sure, the false-positive rate is slightly higher than needed if the decrements are deferred, but with a large filter this effect is small.

So that means you can buffer increments / decrements to the count storage. You still want to do the "or" operations in the Bloom filter part synchronously, so that you don't get false negatives. And then it is no longer just 30-40 cycles per batch, but 3000-4000 cycles, or 30'000-40'000 cycles, or so. I understand this would not be trivial to implement, but it is also not very complex. I never really had a real-world use case so far, so I didn't work on a full implementation yet.
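A minimal sketch of that idea (my own simplification, not the fastfilter_java code): bits are set in the Bloom filter synchronously, so queries never see a false negative, while the count updates are queued and applied later in one batch.

    import java.util.concurrent.ConcurrentLinkedQueue;

    public class BufferedCountingBloomSketch {
        private final long[] bits = new long[1024];       // 65536 filter bits
        // plain int counts for illustration; the real count storage is succinct
        private final int[] counts = new int[1024 * 64];
        private final ConcurrentLinkedQueue<Integer> pending = new ConcurrentLinkedQueue<>();

        // Synchronous: set the bit immediately (called once per hash function of the key).
        public void setBit(int bitIndex) {
            bits[bitIndex >>> 6] |= 1L << (bitIndex & 63);
            pending.add(bitIndex); // defer the count increment
        }

        public boolean getBit(int bitIndex) {
            return (bits[bitIndex >>> 6] & (1L << (bitIndex & 63))) != 0;
        }

        // Asynchronous: e.g. a background thread applies the buffered increments in batches.
        // Decrements would be buffered the same way; a bit is only cleared once its
        // count is known to have reached zero.
        public void flushCounts() {
            Integer i;
            while ((i = pending.poll()) != null) {
                counts[i]++;
            }
        }
    }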


Sure, you can batch. But that means your decrements must be _entirely_ buffered, right? Because you cannot remove anything from the main filter until you know its count is zero, and you don't know its count until you've processed all increments. And you need to do that under a lock, or you'll have a race where you could have a false negative in the main filter for a short while.


Yes exactly!


I'm one of the authors. Feel free to ask anything.


Now, would this be useful in high-scaling games, specifically: determining if you might be a winner of a game with a grid of 10^n x 10^n (where n > 5)? A very large "cow bingo" game, where the insertions are randomly spawned on a grid? It seems one of the other filters would be more appropriate, as they support dynamic insertion.

Still very neat, nice work!


Yes, you would need a dynamic filter (e.g. a regular Bloom filter or a cuckoo filter) for this, due to the insertions.

Static filters are good for e.g. leaked password lists, LSM trees (LevelDB, RocksDB), and so on.


The PDF standard uses base-85 encoding (Ascii85).
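For illustration, a simplified sketch (full 4-byte groups only; it ignores the trailing partial group and the 'z' shortcut for all-zero groups): each group of 4 bytes is read as a 32-bit value and written as 5 base-85 digits, offset from '!'.

    // Simplified Ascii85: 4 input bytes become 5 output characters in '!'..'u'.
    static String ascii85(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 4 <= data.length; i += 4) {
            long v = ((data[i] & 0xFFL) << 24) | ((data[i + 1] & 0xFFL) << 16)
                    | ((data[i + 2] & 0xFFL) << 8) | (data[i + 3] & 0xFFL);
            char[] group = new char[5];
            for (int j = 4; j >= 0; j--) { // extract base-85 digits, least significant first
                group[j] = (char) ('!' + (v % 85));
                v /= 85;
            }
            out.append(group);
        }
        return out.toString();
    }

For example, ascii85("Man ".getBytes()) returns "9jqo^": 5 printable characters for 4 bytes, a 25% size overhead.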


I wonder how a programming language can have the productivity of a high-level language ("write like a high-level language") if it has manual memory management. This just doesn't add up in my view.

I'm writing my own programming language that aims for "Write like a high-level language, run like C", but it does not have manual memory management. It has reference counting with lightweight borrowing for performance-sensitive parts: https://github.com/thomasmueller/bau-lang


C is literally a high-level language.


Seriously, in the discussion happening in this thread, C is clearly not a high-level language.

I get your statement and even agree with it in certain contexts. But in a discussion where high-level languages are presumed (in context) to have no manual memory management, looping constructs defined over a semantically inferred range of some given types, overloading of functions (maybe even operators), algebraic datatypes, and other functional-language mixins: C most certainly IS NOT a high-level language.

This is pedantic to the point of being derailing, and in some ways it seems geared to end the discussion by sticking a bar in the conversation's spokes.


Glad you bring up context in this note. I find C high-level too, but you are right: in a comparison you can still say it's really low-level.

C was originally called high-level because the alternatives were things like assembler - a term rooted in comparison more than anything.


Thanks, my parent’s comment is almost a thought-terminating cliché in this kind of discussion. However, Chisnall’s now-classic ‘C Is Not a Low-Level Language’ article is one of my favorite papers on language theory and potential hardware design. A discussion about the shortcomings of viewing C as a low-level language could be profitable, deep, and interesting; but context is king.


It has autofree and drop traits.

