The real Turing test is not out of date. Two important considerations:
- The interrogator faces two players, a bot and a human, and can ask questions of both. Both try to convince the interrogator that they are the human, and in the end the interrogator must say which one is which.
- The test is to be done by experts; both the human player and the interrogator are expected to play to win. The goal is not to have some chit-chat and guess afterwards; the goal is to actively try to find the bot, and to come in well prepared.
For now, there is absolutely no way GPT-4 can pass a real Turing test. Remember, both the human player and the interrogator are trained in bot detection, and they collaborate. The only thing that is not allowed is for the two humans to know each other beforehand, so that they cannot use shared secrets against the bot. But using commonly known anti-bot techniques together is fair game.
The thing described in the paper is a weak variant of the Turing test that only tests the ability of AI designers to trick unsuspecting humans.
I see this comment often and am confused why we wouldn't want to "move the goalposts".
I'd rather call it "passing a milestone" or simply "progress". More specifically, it just means that most criteria, especially the Turing test, are poorly defined.
> I see this comment often and am confused why we wouldn't want to "move the goalposts". I'd rather call it "passing a milestone" or simply "progress".
You're literally moving the "moving the goalposts" goalpost to make it more palatable.
> More specifically it just means most criteria, especially the Turing test, are poorly defined.
Nonsense. The Turing test was widely accepted for the specific reason that it contained a concrete method of testing for Artificial Intelligence. Go look for a better-defined goalpost for intelligence. You'll find pile after steaming pile of meaningless philosophical hair-splitting, each so completely divorced from reality as to make it useless as a real-world measuring stick.
Turing's test was and is the best measure of 'intelligence'. LLMs have passed the test. They are intelligent. Narrowly so, and utterly without agency, but they're clearly intelligent. There's no need to move the goalposts so that we can feel better about our place in the universe.
I actually agree that Turing's 1950 definition is pretty vague in places and there are a few different interpretations out there. As we discuss in the paper, it's also unclear what should constitute a pass in terms of statistical analysis.
No it really isn't. The Turing test is just not an adequate methodology to determine intelligent behavior, and never was. This was already known way before generative ML models emerged.
Just to pick my personal favorite, which is mentioned at the end of the article: this function right here can, technically, pass the Turing test:
    def generate_answer(question: str) -> str:
        # Meet every query with silence.
        return ""
How? Simple: a human can just choose not to respond to any question. So, if a program does exactly that, that is, meets every query with silence, how do you differentiate it from a human who does the same?
What does the Turing Test determine anyway? According to the paper, it is supposed to measure a machine's intelligence, or, more prosaically, to answer the question "Can machines think?"
But that isn't what the test measures.
It measures how well a machine can trick a human into believing it is a person. So instead of measuring how well the machine does, the test measures how well the human does. That is the greatest flaw of the Turing Test, and the little "answer with silence" thought experiment showcases exactly that flaw.
"Can a machine trick a human" and "Can a machine think" are 2 very different questions. Humans can, and have shown to, be tricked by ELIZA and even simpler chatterbots, engines that don't even use any kind of ML, just large bodies of prewritten text and a number of static rules.
> by % of people who believe they are talking to a person.
And what does that denote? Say I get two groups of people. One is tricked by ELIZA 80% of the time, the other is tricked by ELIZA 40% of the time. Does that show that ELIZA passed or failed? Neither. It shows that the outcome of the test depends as much on the ability of the interrogator as it does on the quality of the machine's responses, if not more.
Imagine a litmus test (a chemical test to roughly determine the acidity of a solution) where the result depends on who performs it as much as, or more than, it does on the quality of the litmus paper. No lab would use that test, for obvious reasons.
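To make that concrete, here is a toy illustration with invented numbers: the same bot, judged by two different groups of interrogators, gets two different "pass rates", so the number measures the judges as much as the machine.

    # Toy illustration with invented numbers: the same bot judged by two
    # different groups of interrogators yields two different "pass rates".
    def pass_rate(judgments: list[bool]) -> float:
        """Fraction of interrogators who believed they were talking to a human."""
        return sum(judgments) / len(judgments)

    group_a = [True] * 8 + [False] * 2   # credulous judges: fooled 80% of the time
    group_b = [True] * 4 + [False] * 6   # skeptical judges: fooled 40% of the time

    print(pass_rate(group_a))  # 0.8
    print(pass_rate(group_b))  # 0.4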
Okay, so if it's a Turing test where there is an actual conversation (I doubt that in actual implementations, like the one linked by OP, the bot or the human can send empty messages), then the Turing test is an adequate methodology?
That’s an absurd and overly strict interpretation of what Turing described. Stipulating cooperation between participants is precisely in the spirit of Turing’s original work.
> That’s an absurd and overly strict interpretation of what Turing described.
No, it isn't.
The Turing test doesn't evaluate the correctness of any answers, their sophistication, or even whether there is an answer. All it evaluates is the ability of the interrogator to distinguish between the computer and the human.
And therein lies the greatest flaw of the test: It doesn't test the ability of the computer, it tests the ability of the interrogator.
> In practice, the test's results can easily be dominated not by the computer's intelligence, but by the attitudes, skill, or naïveté of the questioner. Numerous experts in the field, including cognitive scientist Gary Marcus, insist that the Turing test only shows how easy it is to fool humans and is not an indication of machine intelligence.
And another quote:
> Chatterbot programs such as ELIZA have repeatedly fooled unsuspecting people into believing that they are communicating with human beings. In these cases, the "interrogators" are not even aware of the possibility that they are interacting with computers. To successfully appear human, there is no need for the machine to have any intelligence whatsoever and only a superficial resemblance to human behaviour is required.
So the "silence program" may be an extreme case, but it showcases exactly this. If the computer simply says nothing, then what can the human do to determine it's a computer who is silent behind the curtain? And the answer is: Nothing. He can only guess. And since a person can just as easily be silent as a computer can, he might even mistake the human performer for a computer.
Yes, it's an objectively wrong interpretation of Turing's Imitation Game outlined in his paper, "Computing Machinery and Intelligence", published in Mind in 1950 [0]. It's literally on the first page:
> Now suppose X is actually A, then A must answer. It is A's object in the game to try and cause C to make the wrong identification.
Here's the justification your paper uses to suggest that Turing meant to allow for the possibility of silence as a response:
> In one interpretation of Turing’s test the female is expected to tell the truth, but we are not far off that time when silence was preferred to the “jabbering” of women, because “speech was the monopoly of man” and that “sounds made by birds were part of a conversation at least as intelligible and intelligent as the confusion of tongues arising at a fashionable lady’s reception”.
Additionally, your cited paper there even admits this is a theoretical extension of The Imitation Game:
> In its standard form, Turing’s imitation game is described as an experiment that can be practicalized in two different ways (see Figure 1) (Shah, 2011):
> In both cases the machine must provide “satisfactory” and “sustained” answers to any questions put to it by the human interrogator (Turing, 1950: p.447). However, what about in the theoretical case when the machine takes the 5th amendment: “No person shall be held to answer”? Would we grant “fair play to the machines”?
To repeat in case you missed it when you clearly and definitely read your own citation: "In both cases the machine must provide “satisfactory” and “sustained” answers to any questions put to it by the human interrogator (Turing, 1950: p.447)."
I didn't miss anything. Your entire criticism so far hinges on my usage of silence as the answer.
Alright. I'll modify the function only slightly.
return "I don't want to talk about this."
Replace that with a list of different answers and `random.choice(answers)` if you like. Now you've got a machine that gives "satisfactory and sustained" answers, only it always says no.
In other words, the exact same situation as with complete silence, only now we've dotted the i's and crossed the t's.
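For concreteness, a minimal sketch of what that modified responder might look like (the particular canned deflections are just illustrative):

    import random

    # Canned deflections; any list of non-committal answers would do.
    answers = [
        "I don't want to talk about this.",
        "I'd rather not say.",
        "No comment.",
    ]

    def generate_answer(question: str) -> str:
        # Always deflect, regardless of what is asked.
        return random.choice(answers)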
And since the human is able to refuse to give any answers as well, it makes the entire test pointless, as again the interrogator cannot base his decision on anything but guesswork.
The point of the "silence-thought-experiment" isn't to satisfy Turings paper to the letter. The point is to showcase a flaw in the methodology it presents.
The “null response” isn’t a “satisfactory” answer as it doesn’t address the question. “Must answer” means the person under question must provide an answer to the question being asked. As I already said, your own citation proposes non-response as an extension of the Imitation Game, not a standard possible answer. Non-answers are not at all addressed by Turing in his work, because they are not a possible outcome of the specific test he outlined.
It’s a weak thought experiment, and one does not derive meaningful results from it, as it does not fit the original game’s intent (and no one other than you proposes that it does). There are many other and better criticisms of the Turing Test.
Besides, you blindly cited a paper you yourself didn’t even read after repeated declarations of your own correctness at the expense of everyone else; I cannot think of a clearer example of “bad faith engagement.”
> “Must answer” means the person under question must provide an answer to the question being asked.
Yes, but it doesn't say what the answer has to be, it doesn't say it has to be correct, it doesn't say it has to have to do with the question.
> As I already said, your own citation proposes non-response
And I have shown you why that doesn't matter in the slightest, because a very trivial modification to the methodology could achieve the exact same thing while following the original paper's requirements to the letter.
> It’s a weak thought experiment
Wrong. It's a perfect demonstration of one of the many reasons why AI research is all but ignoring the Turing Test; the fact that the test is more about the interrogator than it is about the machine.
> “bad faith engagement.”
I don't agree with your statements and have presented arguments why; that's not arguing in bad faith.
Your bad faith comes from how you disagreed with my statements; you did not do the necessary due diligence to demonstrate I should continue putting forward additional effort in both understanding your point and respecting your ideas.
For example, you reply "Wrong." to a subjective evaluation I've made. It literally cannot be wrong (though you can disagree), yet you declare it so with confidence! That's bad faith, and it means I will not engage further.
> you did not do the necessary due diligence to demonstrate
I did all the necessary due diligence. I was perfectly aware that the paper used a variation on the imitation game. I also read Turing's paper long before this discussion started (I think I was in high school when I first stumbled upon it).
That's how I knew that it is easy to come up with basically the same thought experiment, without even changing any of the games rules.
> For example, you reply "Wrong." to a subjective evaluation I've made.
Because in my subjective evaluation it isn't a weak thought experiment, so I am fully within my rights to disagree with your evaluation.
I always saw the Turing test as kind of comparable to the Bechdel test. Updating it misses the entire point: there's a super simple and easily applicable threshold that, at least at the time it was established, a layman could use to soundly demolish most any example they came across.
It was never supposed to be an end-all-be-all, it was supposed to be a quick and dirty way of eliminating possibilities.