what if you sha it?

nine_k · on Nov 22, 2022

Adding a crypto hash lets you check that the hashed value was not changed, because finding another value with the same hash is hard, by definition of a crypto hash.

But here the problem is not forging an ID, it's guessing an ID, and hashing does not widen the search space, does not increase randomness.

dspillett · on Nov 22, 2022

> Adding a crypto hash

I think the poster you replied to meant using the hash output as the token, not that you would keep the original token plus a salted hash for verification.

If they are thinking SHA(GenerateUUID()) would have better entropy then they are incorrect, even though all SHA variants output more than the 128 bits in the source UUID. I assume the misunderstanding comes from the fact that some PRNGs are based upon repeated application of cryptographic hash functions to the seed data.

Using some irreversible transform would solve the issue of potentially leaking information in the UUIDs, but if that is an issue then instead use a UUID variant based on purely random data (v4?), as that would be more efficient and would not result in a value that is longer but contains no extra entropy.
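A minimal sketch of that comparison in Python, assuming the standard-library `uuid` and `hashlib` modules (the variable names are only for illustration). A version 4 UUID already draws its 122 non-fixed bits from the OS random source, and hashing it just produces a longer string derived deterministically from the same 128 bits:

```python
import hashlib
import uuid

# Version 4 UUID: 128 bits total, 6 fixed (version + variant), 122 random.
token = uuid.uuid4()

# SHA-256 of the UUID: a longer value, but a pure function of the input,
# so an attacker who can guess the UUID can also compute this.
hashed = hashlib.sha256(token.bytes).hexdigest()

print(token, len(token.bytes) * 8, "bits")
print(hashed, len(hashed) * 4, "bits")
```

The second value is twice as long to store and transmit, yet anyone who knows the scheme still only has to search the space of UUIDs.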

WirelessGigabit · on Nov 22, 2022

That actually reduces the usefulness as you're hashing the data into a smaller length.

unlikelymordant · on Nov 22, 2022

It seems UUIDs are 128 bits, while SHA-1 is 160 bits. There are also SHA-256 and SHA-512 for longer hashes. So there shouldn't be any worries about the hash being shorter.

jchw · on Nov 22, 2022

Rereading, I am guessing you're merely pointing out that the comment regarding shortening the length is untrue. If you already understand the entropy issue here, please treat my "you"s as royal you's.

You have a 128 bit value. That's 128 binary digits. Each digit can be zero or one. That means you have 2^128 possible distinct values. (Ignoring the fixed bits in UUIDs since it's not important for sake of this argument.)

Now you use a one-way cryptographic hash on top, like sha256. This will return a specific hash for any given input. It is always the same for a specific given input, and it is nearly always distinct. The hash output may have more bits, but the number of distinct values can't increase; it can only ever decrease. That's because you could only ever give it 2^128 different values. How could it ever return more outputs if each input corresponds to one output?
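To see this at a scale you can actually enumerate, here is a small sketch that shrinks the input to 16 bits (a stand-in I picked; 2^128 obviously can't be enumerated) and counts the distinct SHA-256 outputs. The function name `distinct_hashes` is made up for this example:

```python
import hashlib

BITS = 16  # toy stand-in for the 128 bits of a UUID

def distinct_hashes(bits: int) -> int:
    """Hash every possible `bits`-bit input and count distinct digests."""
    width = (bits + 7) // 8
    outputs = set()
    for value in range(2 ** bits):
        outputs.add(hashlib.sha256(value.to_bytes(width, "big")).digest())
    return len(outputs)

count = distinct_hashes(BITS)
# Each digest is 256 bits wide, but there can never be more digests
# than there were inputs.
print(count, "distinct digests from", 2 ** BITS, "inputs")
```

The digests live in a 2^256 space, but only as many of them are ever reachable as there are inputs.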

To make it clearer, let's say you have a database where you want to store a customer's zip code so you can use it as some kind of validation later on to ensure it matches, but you don't want to store it in plaintext, so you hash it. The hash is 160 bits. Secure, right? Wrong. There are fewer than 50,000 zip codes. It would be trivial to calculate the hash of every single one and use the results as a simple hashmap from hashed value to plaintext.
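A rough sketch of that lookup table, assuming the 160-bit hash is unsalted SHA-1 and that zip codes are stored as 5-digit ASCII strings (both are my assumptions, as is the helper name `sha1_hex`). It covers every 5-digit string, which is a superset of the real zip codes:

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Unsalted SHA-1 of an ASCII string, as hex (160 bits)."""
    return hashlib.sha1(text.encode("ascii")).hexdigest()

# Precompute hash -> plaintext for all 100,000 possible 5-digit strings.
table = {sha1_hex(f"{n:05d}"): f"{n:05d}" for n in range(100_000)}

# What the database would hold for one customer.
stored = sha1_hex("90210")

# "Reversing" the hash is a single dictionary lookup.
print(table[stored])  # 90210
```

Building the table takes well under a second on ordinary hardware, so the width of the hash buys nothing when the input domain is this small.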

You may be thinking this is impractical for an input domain as large as 2^128, but realistically it only adds a slight roadblock. Knowing the only valid values will be hashed UUIDs, instead of picking 160 random bits, you'd be much better off picking a random UUID, hashing it, and trying that for each attempt.

markatto · on Nov 22, 2022

Yes, some hashes might not meaningfully hurt it, but they won’t add any entropy, which is the real problem.