Even the idea of it is bad, ChatGPT is *supposed to* write indistinguishably fro...

bko · on July 25, 2023

Copying a comment I posted a while ago:

I listened to a podcast with Scott Aaronson that I'd highly recommend [0]. He's a theoretical computer scientist but he was recruited by OpenAI to work on AI safety. He has a very practical view on the matter and is focusing his efforts on leveraging the probabilistic nature of LLMs to provide a digital undetectable watermark. So it nudges certain words to be paired together slightly more than random and you can mathematically derive with some level of certainty whether an output or even a section of an output was generated by the LLM. It's really clever and apparently he has a working prototype in development.

Some work arounds he hasn't figured out yet is asking for an output in language X and then translating it into language Y. But those may still be eventually figured out.

I think watermarking would be a big step forward to practical AI safety and ideally this method would be adopted by all major LLMs.

That part starts around 1 hour 25 min in.

> Scott Aaronson: Exactly. In fact, we have a pseudorandom function that maps the N-gram to, let’s say, a real number from zero to one. Let’s say we call that real number ri for each possible choice i of the next token. And then let’s say that GPT has told us that the ith token should be chosen with probability pi.

https://axrp.net/episode/2023/04/11/episode-20-reform-ai-ali...

constantcrying · on July 25, 2023

I think the chance of this working reliably is precisely zero. There are multiple trivial attacks against this and it can not work if the user has any kind of access to token level data (where he could trivially write his own truly random choice). And if there is a non-water marking neural network with enough capacity to do simple rewriting you can easily remove any watermark or the user does the minor rewrite himself.

teaearlgraycold · on July 26, 2023

It’ll be the equivalent to a shutterstock watermark.

air7 · on July 25, 2023

I heard of this (very neat) idea and gave it some thought. I think it can work very well in the short term. Perhaps OpenAI has already implemented this and can secretly detect long enough text created by GPT with high levels of accuracy.

However, as soon a detection tool becomes publicly available (or even just the knowledge that watermarking has been implemented internally), a simple enough garbling LLM would pop up that would only need to be smart enough to change words and phrasing here and there.

Of course these garbling LLMs could have a watermark of their own... So it might turn out to be a kind of cat-and-mouse game but with strong bias towards the mouse, as FOSS versions of garblers would be created or people would actually do some work manually, and make the changes by hand.

constantcrying · on July 25, 2023

There are already quite complex language models which can run on a CPU. Outside of the government banning personal LLMs, the chance of there not existing a working fully FOSS and open data rewrite model, if it becomes known that ChatGPT output is marked, seems very low.

The water marking techniques also can not work after some level of sophisticated rewriting. There simply will be no data encoded in the probabilities of the words.

motti · on July 25, 2023

If it's sophisticatedly rewritten then it's no longer AI generated

flangola7 · on July 26, 2023

That is not a reliable indicator even today. GPT-4 (not the ChatGPT RLHF one) is not distinguishable from human writing. You could ask it about modern events, but that's not a long term plan, and it could just make the excuse they don't follow the news.

concurrentsquar · on July 25, 2023

This, or cryptographic signing (like what the C2PA suggests) of all real digital media on the Earth are the only ways to maintain consensus reality (https://en.wikipedia.org/wiki/Consensus_reality) in a post-AI world.

I personally would want to live in Aaronson's world, and not the world where a centralized authority controls the definition of reality.

greiskul · on July 25, 2023

How can we maintain consensus reality, when it has never existed? There are a couple of bubbles of humanity where honesty and skepticism and valued. Everywhere else, at all moments of history, truth has been manipulated to subjugate people. Be it newspaper owned by polical families, priests, etc.

ummonk · on July 25, 2023

This would be trivially broken once sufficiently good open source pretrained LLMs become available, as bad actors would simply use unwatermarked models.

vunderba · on July 25, 2023

Even if you could force the bad actors to use this watermarked large language model, there's no guarantee that they couldn't immediately feed that through Langchain into a different large language model that would render all the original watermarks useless.

siglesias · on July 25, 2023

I'd challenge this assumption. ChatGPT is supposed to convey information and answer questions in a manner that is intelligible to humans. It doesn't mean it should write indistinguishably from humans. It has a certain manner of prose that (to me) is distinctive and, for lack of a better descriptor, silkier, more anodyne, than most human writing. It should only attempt a distinct style if prompted to.

constantcrying · on July 25, 2023

ChatGPT is explicitly trained on human writing it's training goal is explicitly to emulate human writing.

>It should only attempt a distinct style if prompted to.

There is no such thing as an indistinct style. Any particular style it could have would be made distinct by it being the style ChatGPT chooses to answer in.

The answers that ChatGPT gives are usually written in a style combining somewhat dry academic prose and the type of writing you might find in a Public Relations statement. ChatGPT sounds very confident in the responses it generates to the queries of users, even if the actual content of the information is quite doubtful. With some attention to detail I believe that it is quite possible for humans to emulate that style, further I believe that the style was designed by the creators of ChatGPT to make the output of the machine learning algorithm seem more trustworthy.

JustBreath · on July 25, 2023

That's true, you could even purposely inject fingerprinting into its writing style and it could still accomplish the goal of conveying information to people.

insanitybit · on July 25, 2023

All I would have to do is run the same tool over the text, see it gets flagged, and then modify the text until it no longer gets flagged. That's assuming I can't just prompt inject my way out of the scenario.

JustBreath · on Aug 8, 2023

That's true of virtually any detection tool, no?

"All I have to do is modify my virus until the anti-virus doesn't detect it."

littlestymaar · on July 25, 2023

But then that wouldn't be “detecting AI”, but merely recognizing an intentionally added fingerprint, which sounds far less attractive…

insanitybit · on July 25, 2023

I tried an experiment when GPT4 allowed for browsing. I sent it my website and asked it to read my blog posts, then to write a new blog post in my writing style. It did an ok job. Not spectacular but it did pick up on a few things (I use a lot of -'s when I write).

The point being that it's already possible to change ChatGPT's tone significantly. Think of how many people have done "Write a poem but as if <blah famous person> wrote it". The idea that ChatGPT could be reliably detected is kind of silly. It's an interesting problem but not one I'd feel comfortable publishing a tool to solve.

toss1 · on July 25, 2023

Yup.

Moreover, the way to deal with AI in this context is not like the way to deal with plagiarism; do not try to detect AI and punish its use.

Instead, assign it's use, and have the students critique the output and find the errors. This both builds skills in using a new technology, and more critically, builds the essential skills of vigilance for errors, and deeper understanding of the material — really helping students strengthen their BS detectors, a critical life skill.

npsomaratna · on July 26, 2023

Yes. Whether we like it or not, AI is with us to stay. A skill that AI can easily supplant is a skill that will become outdated very quickly. We're better off teaching students how to use AI effectively. Hopefully this will "future proof" them somewhat.

Teever · on July 25, 2023

Nitpick: ChatGPR is supposed to write in a way that is indistinguishable from a human, to another human.

That doesn't mean that it can't be distguishable by some other means.

CookieCrisp · on July 25, 2023

I think for small amounts of text there's no way around it being indistinguishable to a machine and not distinguishable to a human. There just aren't that many combinations of words that still flow well. Furthermore as more and more people use it I think we'll find some humans changing their speech patterns subconsciously more to mimic whatever it does. I imagine with longer text there will be things they'll be able to find, but, I think it will end up being trivial for others to detect what those changes are and then modifying the result enough to be undetectable.

jerf · on July 25, 2023

I think for this sort of problem it is more productive to think in terms of the amount of text necessary for detection, and how reliable such a detection would be, than a binary can/can't. I think similarly for how "photorealistic" a particular graphics tech is; many techs have already long passed the point where I can tell at 320x200 but they're not necessarily all there yet at 4K.

LLMs clearly pass the single sentence test. If you generate far more text than their window, I'm pretty sure they'd clearly fail as they start getting repetitive or losing track of what they've written. In between, it varies depending on how much text you get to look at. A single paragraph is pretty darned hard. A full essay starts becoming something I'm more confident in my assessment.

It's also worth reminding people that LLMs are more than just "ChatGPT in its standard form". As a human trying to do bot detection sometimes, I've noticed some tells in ChatGPT's "standard voice" which almost everyone is still using, but once people graduate from "Write a blog post about $TOPIC related to $LANGUAGE" to "Write a blog post about $TOPIC related to $LANGUAGE in the style of Ernest Hemmingway" in their prompts it's going to become very difficult to tell by style alone.

bob-09 · on July 25, 2023

If a human can't verify whether distinguished text is actually AI or not, detection will be full of false positives and ultimately unreliable.

leonardtang · on July 25, 2023

Precisely -- watermarks are an obvious example of this. To me, this is THE path forward for AI content detection.

WillPostForFood · on July 25, 2023

Watermarking text can't work 100% and will have false negatives and false positives. It is worse than nothing in many situations. It is nice when the stakes are low, but when you really need it you can't rely on it.

WiSaGaN · on July 26, 2023

The default style people cites about ChatGPT is also nothing intrinsic about AI, it is just this paticular AI is trained and prompted to output information in this way. The output style can change drastically with just a little prompt change even on the user side.

RandomLensman · on July 25, 2023

Why even care if it is written by a machine or not? I am not sure it matters as much as people think.

JohnFen · on July 25, 2023

There are a number of reasons people may care. For instance, the thing about art that appeals to me is that it's human communication. If it's machine generated, then I want to know so that I can properly contextualize it (and be able to know whether or not I'm supporting a real person by paying for it).

A world where I can't tell if something is made by human or by machine is a world that has been drained of something important to me. It would reduce the appeal of all art for me and render the world a bit less meaningful.

RandomLensman · on July 25, 2023

Fair, but I think that will shake out easier than expected: if there is a market (i.e. it is being valued) for certain things human generated people will work on being able to authenticate their output. Yes, there will likeky be fraud etc., but if there is a reasonable market it has a good chance of working because it serves all participants.

aleph_minus_one · on July 25, 2023

> Why even care if it is written by a machine or not? I am not sure it matters as much as people think.

You don't see the writing on the wall? OK, here is a big hint: it might make a huge difference from a legal perspective whether some "photo" showing child sexual abuse (CSA) was generated using a camera and a real, physical child, or by some AI image generator.

RandomLensman · on July 25, 2023

I don't think all jurisdictions make that distinction to start with and even if they did and societies really wanted to go there: not sure why a licensing regime on generators with associated cryptographic information in the images could not work. We don't have to be broadly permissive, if at all.

claudiawerner · on July 26, 2023

I agree with you, but in some jurisdictions the distance between stuff generated with AI and actual photographs of child abuse are treated rather closely; either way, possessing either could result in what the England & Wales calls a "sexual harm prevention order" (SHPO). To me the idea that someone could be served such an order without ever possessing real CSEM (or "child porn"), never mind actually never being near a child is rather worrying.

jdshaffer · on July 26, 2023

Well, I teach English as a second language in a non-English speaking country. I often used short essays and diary-writing for homework. The students have had lots of English input over the years, but not much experience with output. So, writing assignments work out very well for them. Alas, with ChatGPT on the rise here, they no longer have to write it themselves.

The upshot of which is, the useful writing assignments I used to give as homework will either have to be done in class (wasting valuable class time) or given up altogether (wasting valuable learning experiences).

aleph_minus_one · on July 26, 2023

> Well, I teach English as a second language in a non-English speaking country. I often used short essays and diary-writing for homework. The students have had lots of English input over the years, but not much experience with output. So, writing assignments work out very well for them. Alas, with ChatGPT on the rise here, they no longer have to write it themselves.

If your students want to betray themselves of the possible learning opportunities of attempting to formulate the sentences by themselves in English, it is their problem.

The same holds in mathematics (degree course): of course, in the first semesters, you can use a computer algebra system like Maple or Mathematica for computing the integrals on your exercise sheets, but you will betray yourself of the practice of computing integrals that these exercise sheets are supposed to teach you.

deadbeeves · on July 26, 2023

Is your objective that your students learn, or to police them? In the latter case, yes, you have those two options you mentioned. In the former case, you can just continue as you were. Some will cheat and some will not. The ones who do are only cheating themselves.

jdshaffer · on July 26, 2023

My goal is for them to learn. And yes, I can just carry on, and some will cheat. I've caught cheaters in the past, of course, but they were far and few between. With ChatGPT and even improved translators like DeepL, it's hard to get them to do the practice they need to learn.

And as a teacher who really WANTS them to learn and to get that feeling, "Hey, I can actually do this!", it's depressing to think of the one who do cheat themselves. Oh well...

deadbeeves · on July 26, 2023

Just tell them, "the point of this exercise is for you to practice writing in English, not for me to grade you. If you use ChatGPT to do it for you I won't be able to notice, but it will be pointless. It's better if you don't do it at all than if you do that."

lopolo333 · on July 26, 2023

Teens are very propense to succumb under peer pressure. Some would be honest and do the work themselves, but if a significant subgroup is using GPT there's a bigger chance that this group influences some of them to do so.

NoZebra120vClip · on July 26, 2023

What if they accompany the writings with a recording of them reading it aloud?

I think you could pick up right quick on who understood what they wrote, and who didn't.

jdshaffer · on July 26, 2023

That's actually a pretty good idea. I might have to try it selectively, though it'd be too much for 200 students submitting diaries and summaries every week. laugh

They once said I should join Line then we can all talk, then I asked if it's possible to talk in groups of 200+ and their eyes got really big.