
Why is this comment being downvoted?

OpenAI can internally keep a "hash" or a "signature" of every output it ever generated.

Given a piece of text, they should then be able to trace it back to a specific session (or a set of sessions) in which that text was generated.

Depending on the hit rate and the hashing method used, they may be able to estimate the likelihood that a piece of text was generated by AI.



Why would they want to is my question. A single character change would break it.

Then you have database costs of storing all that data forever.

More to the point, it only works for OpenAI. I don't think it will be long before other GPT-4-level models are around whose operators won't give two shits about catering to the AI identification police.


> A single character change would break it.

That depends on how they hash the data, right? They can use various types of Perceptual Hashing [1] techniques which wouldn't be susceptible to a single-character change.

[1] https://en.wikipedia.org/wiki/Perceptual_hashing
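To make the point concrete, here is a minimal SimHash-style sketch (one common family of perceptual hashing for text). This is purely illustrative, not anything OpenAI is known to do; it just shows why a single-character edit need not break the fingerprint, since the hash is built from many overlapping token contributions rather than the raw bytes:

```python
# Illustrative SimHash-style perceptual hash for text.
# A one-character edit changes only one token's contribution,
# so the resulting fingerprints stay close in Hamming distance.
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    counts = [0] * bits
    for token in text.lower().split():
        # Deterministic 64-bit hash of each token (md5 is just a convenient source of bits)
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Each fingerprint bit is the sign of the accumulated vote
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

a = simhash("the quick brown fox jumps over the lazy dog")
b = simhash("the quick brown fox jumps over the lazy dog!")  # single-character edit
# hamming(a, b) stays small, while unrelated texts land roughly 32 bits apart
```

A lookup service would then flag any submitted text whose fingerprint lies within some small Hamming radius of a stored hash, rather than requiring an exact match.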

> Then you have database costs of storing all that data forever.

A database of all textual content generated by people? That sounds like a gold mine, not a liability. But as I've mentioned earlier, they don't need to keep the raw data (a perceptual hash is enough).
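The storage cost is also smaller than it might sound if only fingerprints are kept. A back-of-the-envelope estimate, with the volume figure entirely made up for illustration:

```python
# Rough storage estimate for hash-only retention (assumed numbers, not real figures)
responses_per_day = 1_000_000_000   # hypothetical daily output volume
bytes_per_record = 8 + 8            # 64-bit perceptual hash + 64-bit session id
days = 365

total_bytes = responses_per_day * bytes_per_record * days
print(total_bytes / 1e12, "TB per year")  # ~5.84 TB/year
```

A few terabytes a year is trivial next to the infrastructure needed to serve the models themselves.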

> won't give two shits about catering to the AI identification police

I'm sure there will be customers willing to pay for access to these checks, even if they're only limited to OpenAI's product (universities and schools - for plagiarism detection, government agencies, intelligence agencies, police, etc).



