Copying a comment I posted a while ago: I listened to a podcast with Scott Aaron...

constantcrying · on July 25, 2023

I think the chance of this working reliably is precisely zero. There are multiple trivial attacks against this and it can not work if the user has any kind of access to token level data (where he could trivially write his own truly random choice). And if there is a non-water marking neural network with enough capacity to do simple rewriting you can easily remove any watermark or the user does the minor rewrite himself.

teaearlgraycold · on July 26, 2023

It’ll be the equivalent to a shutterstock watermark.

air7 · on July 25, 2023

I heard of this (very neat) idea and gave it some thought. I think it can work very well in the short term. Perhaps OpenAI has already implemented this and can secretly detect long enough text created by GPT with high levels of accuracy.

However, as soon a detection tool becomes publicly available (or even just the knowledge that watermarking has been implemented internally), a simple enough garbling LLM would pop up that would only need to be smart enough to change words and phrasing here and there.

Of course these garbling LLMs could have a watermark of their own... So it might turn out to be a kind of cat-and-mouse game but with strong bias towards the mouse, as FOSS versions of garblers would be created or people would actually do some work manually, and make the changes by hand.

constantcrying · on July 25, 2023

There are already quite complex language models which can run on a CPU. Outside of the government banning personal LLMs, the chance of there not existing a working fully FOSS and open data rewrite model, if it becomes known that ChatGPT output is marked, seems very low.

The water marking techniques also can not work after some level of sophisticated rewriting. There simply will be no data encoded in the probabilities of the words.

motti · on July 25, 2023

If it's sophisticatedly rewritten then it's no longer AI generated

flangola7 · on July 26, 2023

That is not a reliable indicator even today. GPT-4 (not the ChatGPT RLHF one) is not distinguishable from human writing. You could ask it about modern events, but that's not a long term plan, and it could just make the excuse they don't follow the news.

concurrentsquar · on July 25, 2023

This, or cryptographic signing (like what the C2PA suggests) of all real digital media on the Earth are the only ways to maintain consensus reality (https://en.wikipedia.org/wiki/Consensus_reality) in a post-AI world.

I personally would want to live in Aaronson's world, and not the world where a centralized authority controls the definition of reality.

greiskul · on July 25, 2023

How can we maintain consensus reality, when it has never existed? There are a couple of bubbles of humanity where honesty and skepticism and valued. Everywhere else, at all moments of history, truth has been manipulated to subjugate people. Be it newspaper owned by polical families, priests, etc.

ummonk · on July 25, 2023

This would be trivially broken once sufficiently good open source pretrained LLMs become available, as bad actors would simply use unwatermarked models.

vunderba · on July 25, 2023

Even if you could force the bad actors to use this watermarked large language model, there's no guarantee that they couldn't immediately feed that through Langchain into a different large language model that would render all the original watermarks useless.