
Why is this comment being downvoted?

OpenAI can internally keep a "hash" or a "signature" of every output it ever generated.

Given a piece of text, they should then be able to trace it back to a specific session (or a set of sessions) in which that text was generated.

Depending on the hit rate and the hashing method used, they may be able to estimate the likelihood that a piece of text was generated by AI.



Why would they want to is my question. A single character change would break it.

Then you have database costs of storing all that data forever.

More to the point, it only works for OpenAI. I don't think it will be long before other GPT-4-level models are around whose operators won't give two shits about catering to the AI identification police.


> A single character change would break it.

That depends on how they hash the data, right? They can use various types of Perceptual Hashing [1] techniques which wouldn't be susceptible to a single-character change.

[1] https://en.wikipedia.org/wiki/Perceptual_hashing
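To make the point concrete, here is a minimal SimHash-style sketch (one common family of perceptual hashing for text). This is purely illustrative, not anything OpenAI is known to do; it just shows why a single-character edit need not break the fingerprint, since the hash is built from many overlapping token contributions rather than the raw bytes:

```python
# Illustrative SimHash-style perceptual hash for text.
# A one-character edit changes only one token's contribution,
# so the resulting fingerprints stay close in Hamming distance.
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    counts = [0] * bits
    for token in text.lower().split():
        # Deterministic 64-bit hash of each token (md5 is just a convenient source of bits)
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Each fingerprint bit is the sign of the accumulated vote
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

a = simhash("the quick brown fox jumps over the lazy dog")
b = simhash("the quick brown fox jumps over the lazy dog!")  # single-character edit
# hamming(a, b) stays small, while unrelated texts land roughly 32 bits apart
```

A lookup service would then flag any submitted text whose fingerprint lies within some small Hamming radius of a stored hash, rather than requiring an exact match.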

> Then you have database costs of storing all that data forever.

A database of all textual content generated by people? That sounds like a gold mine, not a liability. But as I've mentioned earlier, they don't need to keep the raw data (a perceptual hash is enough).
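The storage cost is also smaller than it might sound if only fingerprints are kept. A back-of-the-envelope estimate, with the volume figure entirely made up for illustration:

```python
# Rough storage estimate for hash-only retention (assumed numbers, not real figures)
responses_per_day = 1_000_000_000   # hypothetical daily output volume
bytes_per_record = 8 + 8            # 64-bit perceptual hash + 64-bit session id
days = 365

total_bytes = responses_per_day * bytes_per_record * days
print(total_bytes / 1e12, "TB per year")  # ~5.84 TB/year
```

A few terabytes a year is trivial next to the infrastructure needed to serve the models themselves.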

> won't give two shits about catering to the AI identification police

I'm sure there will be customers willing to pay for access to these checks, even if they're only limited to OpenAI's product (universities and schools - for plagiarism detection, government agencies, intelligence agencies, police, etc).



