I agree it isn't the main property we care about; what we actually care about is reliability.
But at least in its theoretical construction the LLM should be deterministic: it outputs a fixed probability distribution over tokens with no RNG involved.
We then either sample from that fixed distribution non-deterministically for better performance, or we use greedy decoding and accept slightly worse performance in exchange for full determinism.
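Roughly what I mean, as a toy sketch (NumPy only, with made-up logits for a four-token vocabulary, not tied to any real model or library): the distribution the model outputs is fixed, argmax over it is fully deterministic, and the randomness only enters when you sample from it.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])  # hypothetical scores for a 4-token vocab

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

probs = softmax(logits)  # the fixed distribution the model outputs for this context

greedy_token = int(np.argmax(probs))                   # same answer every run: full determinism
sampled_token = int(rng.choice(len(probs), p=probs))   # varies run to run (unless the seed is fixed)

print(probs, greedy_token, sampled_token)
```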
Happy to be corrected if I am wrong about something.
Ah, I realize that I had misunderstood your earlier comment, my apologies and thanks for clarifying!
We're leaving my area of confidence, so take everything I write with a pinch of salt.
As far as I understand, indeed, each layer transforms a set of inputs into a probability distribution. However, if you wanted to compute entirely with probability distributions, you'd need the ability to compose those distributions across layers. Mathematically, that doesn't feel particularly complicated, but computationally it feels like it adds several orders of magnitude in both space and time.
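To make the cost concrete, here's a toy sketch under made-up assumptions (a hypothetical next_dist stand-in for the model, vocab size 50, depth 3, nothing to do with any real architecture): composing the full distributions exactly means marginalising over every intermediate sequence, so the number of weighted paths grows as V**depth, while ordinary sampling only ever follows one path per step.

```python
import numpy as np

V, depth = 50, 3
rng = np.random.default_rng(0)

def next_dist(context):
    # Stand-in for a real model: any deterministic map from a context to a
    # distribution over the vocabulary will do for this illustration.
    seed = sum((i + 1) * t for i, t in enumerate(context)) + len(context)
    p = np.random.default_rng(seed).random(V)
    return p / p.sum()

# Exact composition: keep a weight for every intermediate sequence.
# Here that is 1 + 50 + 2,500 = 2,551 model evaluations to track 50**3 = 125,000 paths.
dist = {(): 1.0}
for _ in range(depth):
    new = {}
    for ctx, p_ctx in dist.items():
        p_next = next_dist(ctx)
        for tok in range(V):
            new[ctx + (tok,)] = p_ctx * p_next[tok]
    dist = new
print("paths tracked:", len(dist))

# Sampling: one model evaluation per step, regardless of V.
ctx = ()
for _ in range(depth):
    tok = int(rng.choice(V, p=next_dist(ctx)))
    ctx += (tok,)
print("sampled path:", ctx)
```

With a realistic vocabulary (tens of thousands of tokens) and more than a couple of steps, the exact version is hopeless, which is why I suspect you end up sampling rather than carrying the distributions forward.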