
It's quite absurd to assert that ELIZA from 1966 outperformed GPT-3.5.

Sure, it deceived 27% of the participants at the time, but that was largely because those participants were unaware that such a program could exist.

I would bet good money that if GPT-3.5 could have magically interacted with those 1966 participants, it would have fooled most of them, as it would have been inconceivable for a computer to exhibit such capabilities then.

This raises questions about the relevance of the Turing Test, since simply being aware of a system's capabilities can shift the expectations of what participants anticipate in an AI system.

In 2023, GPT-3.5 fools nobody during a Turing Test, yet it would have passed with flying colors in 1966. If ELIZA had fooled more people and passed the test in 1966, but no longer in 1967, would we have learned anything?

I don't think the Turing Test teaches us anything about AI system capabilities. On the contrary, it tells us something about the expectations and perceptions of the human subjects.

It's like the saying that AI is whatever we haven't figured out how to do yet. Once it's well understood, it's no longer considered AI.



The results are from 2023, not 1966. The fact that ELIZA (indeed surprisingly) did better than GPT-3.5 at fooling people into thinking it's human is discussed in Section 4.4, "The ELIZA effect".

If you look at the examples in Appendix C, it seems that it's because ELIZA didn't match participants' current expectations of AI's behavior, and thus they thought it must be human:

    Verdict: Human | Confidence: 50
    Reason: hard to believe anyone would
    purposefully make an AI this bad

    Verdict: Human | Confidence: 70
    Reason: doesn't respond to
    adversarial attacks

    Verdict: Human | Confidence: 72
    Reason: Super erratic
Interestingly, this means that if you want to fool humans today, it might be more important to make an AI that's different from the ones in common use, rather than strictly better.

So yes, I agree the Turing Test tells us as much about human expectations as about AIs, and the researchers also acknowledge this.


The Turing test only makes sense if the program is compared against an actual human; otherwise everyone taking the test could just say "it's an AI" no matter what, and no AI could ever pass.

The key is having people guess the AI is an AI at the same rate people guess a human is an AI.
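
For what it's worth, that criterion is easy to write down: the program passes only when the rate at which judges flag it as an AI is statistically indistinguishable from the rate at which they flag the real humans as an AI. A minimal sketch with made-up counts (the function name and numbers are hypothetical; nothing here is from the paper):

    from math import sqrt

    def detection_gap(ai_flagged, ai_trials, human_flagged, human_trials):
        # Two-proportion z-score: is the AI called "an AI" more often than humans are?
        p_ai = ai_flagged / ai_trials                 # rate the AI is judged to be an AI
        p_h = human_flagged / human_trials            # rate humans are judged to be an AI
        p = (ai_flagged + human_flagged) / (ai_trials + human_trials)  # pooled rate
        se = sqrt(p * (1 - p) * (1 / ai_trials + 1 / human_trials))
        return (p_ai - p_h) / se                      # near 0 => indistinguishable

    # Hypothetical counts: AI flagged 60/100 times, humans flagged 35/100 times.
    print(detection_gap(60, 100, 35, 100))            # ~3.5 => judges can tell them apart

Passing, in this framing, just means that gap stays near zero even as the judges gain experience with the system.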


Easy to detect an AI: ask it "what is your favourite football team?"


I don't have personal preferences, but I'm knowledgeable about various football teams! Do you have a favorite team?


It was trained to respond like that so as not to alienate groups of people. But fine-tuned and with a different pre-prompt, it would give a completely different answer.


That's right! My aim is to be impartial and respectful to everyone's preferences. If you'd like, I can discuss or provide information about any specific football team you're interested in!


I'd probably fail that. I really don't have one.


"American, or soccer?" usually gets a chuckle and we move on...


The one involving a foot and a ball. Yeah OK, American football kicks the ball too, and soccer uses knees, thighs, chins, head, chest, ass, etc. The one predominantly using a foot to impart energy onto a sphere.


Well, if you were my audience I might say, "Football, or hand egg?" :)


You sound pretty human. Assistants don't talk like that.


Passing the Turing test is a moving goalpost. It won't matter if a large group of people is initially fooled by a program; the passing program should continue to convince them it's human even after they're told it's not. The cracks should not be discernible long after the program has passed the Turing test, to ensure no gimmick was used to aid it.


The Turing Test in the past: can it look smart enough to be a human. The Turing Test soonish: can it look dumb enough to be a human.


ChatGPT is still a long way from human-level responses across a full conversation.

You just need to understand their limitations. Playing 20 questions is much harder for an LLM than summarizing a technical article. For people it's the reverse: young kids can play 20 questions easily, but summarizing a technical paper would be challenging.


It's a giant GAN setup. The generator is improved by feedback from humans. Humans, studying the generators, improve too: a game from a decade ago always looks low resolution now.
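
A minimal sketch of the adversarial loop the analogy refers to (a standard toy GAN on 1-D data; the shapes and hyperparameters are arbitrary, and nothing here is from the paper):

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))               # generator
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid()) # discriminator
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(1000):
        real = torch.randn(64, 1) * 0.5 + 3.0    # "real" data the generator tries to imitate
        fake = G(torch.randn(64, 1))             # generator's current attempt

        # The discriminator (the "humans" in the analogy) studies the generator's
        # output and gets better at telling real from generated.
        opt_d.zero_grad()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        loss_d.backward()
        opt_d.step()

        # The generator improves from the discriminator's feedback, trying to be judged "real".
        opt_g.zero_grad()
        loss_g = bce(D(fake), torch.ones(64, 1))
        loss_g.backward()
        opt_g.step()

Each side only improves because the other one does, which is why what counts as "passing" keeps moving.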


> It's quite absurd to assert that ELIZA from 1966 outperformed GPT-3.5.

They don't. The researchers ran a modern ELIZA implementation alongside the GPT models. They're not citing ELIZA results from 1966.


The point of the Turing Test is widely misunderstood, and not to pick on you, but your point is a perfect example. The Test is very explicitly NOT about what is on the other side of the curtain, but rather _how do we know_ what is on the other side. Turing's point being, of course, that if you remove all traces of "human" things from an interaction and reduce it to only the barest minimum of things that we need to do mathematics in our universe (e.g. ZFC + choice), then not only can we not distinguish whether something is an AI, we can't distinguish whether it's human! Or even conscious. And furthermore, it is not clear we could tell while also retaining mathematical consistency. Which is a much deeper issue about us.

What is on the other side of the curtain is actually irrelevant to the point he was making, and his argument is not dependent on it. It could be a baboon. Or a rock. Or a black hole. It doesn't matter. What matters is _can we tell if it is not human_. It turns out this happens to be not all that different from asking whether an AI is human, but for reasons unrelated to the fact that it is an AI.

I find that if you explain to people that the Turing test, an uncountability argument, and the Chinese room problem are all equivalent statements of the same thing, it is much easier to grasp the point Turing was making.

Turing's point only depends on us, human beings. So long as we are around and still human, the Test will remain highly relevant.


What color is this curtain?


I completely agree.

I remember a similar discussion about special effects in movies. The year was 1993 and I was telling my uncle that the newly released Jurassic Park had special effects that looked completely real. My uncle was an artist (a painter) and told me that he agreed they looked real to us now, but that they probably wouldn't in the future. That concept seemed crazy to me... He explained that when he saw the original Star Wars in theaters the special effects were mind blowing to the audience and looked completely believable and real. Of course, to me, at the time, Star Wars special effects looked crude and fake--I had a hard time believing him. But, if I watch Jurassic Park today, sure enough, he was right.


> I don't think the Turing Test is teaching us anything about AI system capabilities.

Sure, it is.

A system only really passes the Turing test (you might call this the "focused Turing test") to the degree it passes the regular Turing test when taken by people whose experience of AI systems matches the system being evaluated.

That is, when someone who has experience with humans and that kind of AI system, and who knows specifically that they are looking to distinguish humans from that kind of AI system, still cannot do so better than chance.

Anything else and the system can be distinguished by humans from human interactions, even if it gets by because human expectations for the particular test are primed in a way that has them looking the wrong way.


> I would bet good money that if GPT-3.5 could have magically interacted with those 1966 participants, it would have fooled most of them, as it would have been inconceivable for a computer to exhibit such capabilities then.

You cannot just "fool" someone in the Turing test; the interrogator knows one of the two partners is a computer. To pass, you need to perform better than your human companion.

Whether the interrogator knows of the existence of advanced auto-complete systems is not very important in this setup. He knows of the existence of fellow humans and needs to identify one when he meets one.


My other gripe with the Turing Test is that it doesn't speak to understanding, intelligence, or sentience. It's more of a milestone than something that actually measures an AI's capabilities.


Tell that to the whackos who believe that ChatGPT is self-aware just because it has been fed lots of training data that describe it as an AI and its purpose in detail.


I believe ChatGPT is "self-aware" in the sense that it can distinguish itself in a conversation. I don't believe it to be aware in a conscious sense. How strict are the definitions?


Did you read the linked article or abstract?

> Participants' demographics, including education and familiarity with LLMs, did not predict detection rate, suggesting that even those who understand systems deeply and interact with them frequently may be susceptible to deception.


Everyone's grandma has heard about ChatGPT by now. My hairdresser told me she uses it. You can bet that no participant in the original study had heard of computer software capable of simulating a conversation, let alone of ELIZA itself, which had just been invented.

What I take from this is that the zeitgeist is sufficient to change such a study's results: the expectation of what AI can do is already there, regardless of your familiarity with LLMs or your education level.



