
It's quite absurd to assert that ELIZA from 1966 outperformed GPT-3.5.

Sure, it deceived 27% of the participants at the time, but that was largely because those participants were unaware that such a program could exist.

I would bet good money that if GPT-3.5 could have magically interacted with those 1966 participants, it would have fooled most of them, as it would have been inconceivable for a computer to exhibit such capabilities then.

This raises questions about the relevance of the Turing Test, since simply being aware of a system's capabilities can shift the expectations of what participants anticipate in an AI system.

In 2023, GPT-3.5 fools nobody during a Turing Test, yet it would have passed with flying colors in 1966. If ELIZA had fooled more people and passed the test in 1966, but no longer in 1967, would we have learned anything?

I don't think the Turing Test teaches us anything about AI system capabilities. On the contrary, it tells us something about the expectations and perceptions of the human subjects.

It's like the saying that AI is whatever we haven't figured out how to do yet. Once it's well understood, it's no longer considered AI.



The results are from 2023, not 1966. The fact that ELIZA (indeed surprisingly) did better than GPT-3.5 at fooling people into thinking it's human is discussed in Section 4.4, "The ELIZA effect".

If you look at the examples in Appendix C, it seems that it's because ELIZA didn't match participants' current expectations of AI's behavior, and thus they thought it must be human:

    Verdict: Human | Confidence: 50
    Reason: hard to believe anyone would
    purposefully make an AI this bad

    Verdict: Human | Confidence: 70
    Reason: doesn't respond to
    adversarial attacks

    Verdict: Human | Confidence: 72
    Reason: Super erratic
Interestingly, this means that if you want to fool humans today, it might be more important to make an AI that's different from the ones in common use, rather than strictly better.

So yes, I agree the Turing Test tells us as much about human expectations as about AIs, and the researchers also acknowledge this.


The Turing test only makes sense if the program is compared against an actual human; otherwise everyone taking the test could just say "it's an AI" no matter what, and no AI could ever pass.

The key is having people guess the AI is an AI at the same rate people guess a human is an AI.
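
For what it's worth, that criterion is easy to write down: the program passes only when the rate at which judges flag it as an AI is statistically indistinguishable from the rate at which they flag the real humans as an AI. A minimal sketch with made-up counts (the function name and numbers are hypothetical; nothing here is from the paper):

    from math import sqrt

    def detection_gap(ai_flagged, ai_trials, human_flagged, human_trials):
        # Two-proportion z-score: is the AI called "an AI" more often than humans are?
        p_ai = ai_flagged / ai_trials                 # rate the AI is judged to be an AI
        p_h = human_flagged / human_trials            # rate humans are judged to be an AI
        p = (ai_flagged + human_flagged) / (ai_trials + human_trials)  # pooled rate
        se = sqrt(p * (1 - p) * (1 / ai_trials + 1 / human_trials))
        return (p_ai - p_h) / se                      # near 0 => indistinguishable

    # Hypothetical counts: AI flagged 60/100 times, humans flagged 35/100 times.
    print(detection_gap(60, 100, 35, 100))            # ~3.5 => judges can tell them apart

Passing, in this framing, just means that gap stays near zero even as the judges gain experience with the system.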


Easy to detect an AI: ask it "what is your favourite football team?"


I don't have personal preferences, but I'm knowledgeable about various football teams! Do you have a favorite team?


It was trained to respond like that so as not to alienate groups of people. But fine-tuned and with a different pre-prompt, it would give a completely different answer.


That's right! My aim is to be impartial and respectful to everyone's preferences. If you'd like, I can discuss or provide information about any specific football team you're interested in!


I'd probably fail that. I really don't have one.


"American, or soccer?" usually gets a chuckle and we move on...


The one involving a foot and a ball. Yeah OK, American football kicks the ball too, and soccer uses knees, thighs, chins, head, chest, ass, etc. The one predominantly using a foot to impart energy onto a sphere.


Well, if you were my audience I might say, "Football, or hand egg?" :)


You sound pretty human. Assistants don't talk like that.


Passing the Turing test is a moving goalpost. It won't matter if a large group of people is initially fooled by a program; the passing program should continue to convince them it's human even after they're told it's not. The cracks should not be discernible long after the program has passed the Turing test, to ensure no gimmick was used to aid it.


The Turing Test in the past: can it look smart enough to be a human. The Turing Test soonish: can it look dumb enough to be a human.


ChatGPT is still a long way from human-level responses across a full conversation.

You just need to understand their limitations. Playing 20 questions is much harder for an LLM than summarizing a technical article. For people it's the reverse: young kids can play 20 questions easily, but summarizing a technical paper would be challenging.


It's a giant GAN setup. The generator is improved by feedback from humans. Humans, studying the generators, improve too: a game from a decade ago always looks low resolution now.
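
A minimal sketch of the adversarial loop the analogy refers to (a standard toy GAN on 1-D data; the shapes and hyperparameters are arbitrary, and nothing here is from the paper):

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))               # generator
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid()) # discriminator
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(1000):
        real = torch.randn(64, 1) * 0.5 + 3.0    # "real" data the generator tries to imitate
        fake = G(torch.randn(64, 1))             # generator's current attempt

        # The discriminator (the "humans" in the analogy) studies the generator's
        # output and gets better at telling real from generated.
        opt_d.zero_grad()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        loss_d.backward()
        opt_d.step()

        # The generator improves from the discriminator's feedback, trying to be judged "real".
        opt_g.zero_grad()
        loss_g = bce(D(fake), torch.ones(64, 1))
        loss_g.backward()
        opt_g.step()

Each side only improves because the other one does, which is why what counts as "passing" keeps moving.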


> It's quite absurd to assert that ELIZA from 1966 outperformed GPT-3.5.

They don't. The researchers ran a modern ELIZA implementation alongside the GPT models. They're not citing ELIZA results from 1966.


The point of the Turing Test is widely misunderstood, and not to pick on you, but your point is a perfect example. The Test is very explicitly NOT about what is on the other side of the curtain, but rather _how do we know_ what is on the other side. Turing's point being, of course, that if you remove all traces of "human" things from an interaction and reduce it to only the barest minimum of things that we need to do mathematics in our universe (e.g. ZFC + choice), then not only can we not distinguish whether something is an AI, we can't distinguish whether it's human! Or even conscious. And furthermore, it is not clear we could tell while also retaining mathematical consistency. Which is a much deeper issue about us.

What is on the other side of the curtain is actually irrelevant to the point he was making, and his argument is not dependent on it. It could be a baboon. Or a rock. Or a black hole. It doesn't matter. What matters is _can we tell if it is not human_. It turns out this happens to be not all that different from asking whether an AI is human, but for reasons unrelated to the fact that it is an AI.

I find that if you explain to people that the Turing test, an uncountability argument, and the Chinese room problem are all equivalent statements of the same thing, it is much easier to grasp the point Turing was making.

Turing's point only depends on us, human beings. So long as we are around and still human, the Test will remain highly relevant.


What color is this curtain?


I completely agree.

I remember a similar discussion about special effects in movies. The year was 1993 and I was telling my uncle that the newly released Jurassic Park had special effects that looked completely real. My uncle was an artist (a painter) and told me that he agreed they looked real to us now, but that they probably wouldn't in the future. That concept seemed crazy to me... He explained that when he saw the original Star Wars in theaters the special effects were mind blowing to the audience and looked completely believable and real. Of course, to me, at the time, Star Wars special effects looked crude and fake--I had a hard time believing him. But, if I watch Jurassic Park today, sure enough, he was right.


> I don't think the Turing Test is teaching us anything about AI system capabilities.

Sure, it is.

A system only really passes the Turing test (you might call this the "focused Turing test") to the degree it passes the regular Turing test when taken by people whose experience of AI systems matches the system being evaluated.

That is, when someone who has experience with humans and that kind of AI system, and who knows specifically that they are looking to distinguish humans from that kind of AI system, still cannot do so better than chance.

Anything else and the system can be distinguished by humans from human interactions, even if it gets by because human expectations for the particular test are primed in a way that has them looking the wrong way.


> I would bet good money that if GPT-3.5 could have magically interacted with those 1966 participants, it would have fooled most of them, as it would have been inconceivable for a computer to exhibit such capabilities then.

You cannot just "fool" someone in the Turing test; the interrogator knows one of the two partners is a computer. To pass, you need to perform better than your human companion.

Whether the interrogator knows of the existence of advanced auto-complete systems is not very important in this setup. He knows of the existence of fellow humans and needs to identify one when he meets one.


My other gripe with the Turing Test is that it doesn't speak to understanding, intelligence, or sentience. It's more of a milestone than something that actually measures an AI's capabilities.


Tell that to the whackos who believe that ChatGPT is self-aware just because it has been fed lots of training data that describe it as an AI and its purpose in detail.


I believe ChatGPT is "self-aware" in the sense that it can distinguish itself in a conversation. I don't believe it to be aware in a conscious sense. How strict are the definitions?


Did you read the linked article or abstract?

> Participants' demographics, including education and familiarity with LLMs, did not predict detection rate, suggesting that even those who understand systems deeply and interact with them frequently may be susceptible to deception.


Everyone's grandma has heard about ChatGPT by now. My hairdresser told me she uses it. You can bet that no participant in the original study had heard of computer software capable of simulating a conversation, let alone of ELIZA itself, which had just been invented.

What I take from this is that the zeitgeist is sufficient to change such a study's results: the expectation of what AI can do is already there, regardless of your familiarity with LLMs or your education level.



