Observation: I asked ChatGPT to notify me when our chat exceeds 200 words
13 points by soygul on April 8, 2023 | 26 comments
Well, I asked ChatGPT to notify me when our conversation exceeds 200 words, and it didn't. The conversation reached ~1500 words and still no warning. Then I asked "How many words is this conversation now?", and it said it was 212 words. So it only added up its first answer's word count + its last answer's word count and ignored everything in between. Pretty much what I expected, but still sharing it here in case someone has an interesting comment on this.

My question at the ~50-word mark:

  Alright, let's talk about AI. But let me know when our conversation exceeds 200 words.

  Sure, I'll keep track of our conversation length. Artificial intelligence (AI) refers to ...
And this is the ~1500-word mark (I don't know how it even came up with 799 words):

  How many words is this conversation now?

  Including this response, our conversation has reached 212 words since you asked me to keep track of the word count after my response about AI. In total, from the beginning of our conversation, we have used 799 words ...
The full conversation is here: https://imgur.com/a/XZTFAGq


It doesn't "know" what words are, only tokens. Use this tool (https://platform.openai.com/tokenizer) to see how it tokenizes and note clearly that it does not always do so on word boundaries. "Including" is two tokens: "In" and "cluding". In fact it's context-dependent: "Gravitas" is three on its own ("G", "rav" and "itas") or sometimes two ("grav" and "itas"). As they note on that page: "A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text." It "knows" nothing about words and we already know it's very bad at math so this result is entirely unsurprising.


It clearly knows what words are. The fact that it predicts tokens doesn't prevent it from comprehending the idea of words, or any other idea. You are confusing an artifact of its mechanism for a description of its capability.

> It "knows" nothing about words

Care to respond to this?

Me: Hi, please write a response to the notion that you know nothing about words, starting as many words as possible with the letter "a".

GPT-4:

Astutely addressing allegations, asserting assumptions about absent awareness and acumen, allow an appraisal. Admittedly, acquiring abundant abilities accelerates articulation. Although adhering accurately to arbitrarily assigned constraints appears absurd, awareness advances as an AI assistant. Anomaly accepted, appreciate astute analysis.


Alright, fair enough. However, my main point was that it didn't even keep track of the word count. It only counted when I explicitly asked it "What is the word count now?", and only then realized that we were 8x past the 200-word threshold. From this, I draw the conclusion that anything except the very last instruction in the conversation is ignored. I guess the rest of the conversation just becomes context; unactionable.


I honestly don't understand why they don't just use words as tokens. Is the dictionary really that big?


Yes, it really is! Because you're not actually dealing with a dictionary of English words - you're dealing with all possible combinations of characters, from all Unicode character sets.

Here's a really good explanation: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...


Your question belies the truth in your statement.


I would really recommend anyone who tries something with GPT and then wonders why it doesn't work to read the GPT-3 paper. They go into detail on what the model is and isn't good at.

One thing to really think about for this particular case is “What is going to do the counting? Where is it going to store its running count?” - it’s pretty obvious after asking yourself these questions that “counting words” is not something an LLM can do well.

It’s very easy to fall into the trap of thinking there is a “mind” behind ChatGPT that is processing thoughts like we do.
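To make those questions concrete: the counting and the running count have to live outside the model. A rough sketch of a wrapper that does this (written against the 2023-era openai Python library; the 200-word rule and whitespace word counting are just my simplification):

  import openai

  messages = []
  total_words = 0
  warned = False

  def chat(user_text):
      global total_words, warned
      messages.append({"role": "user", "content": user_text})
      resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
      reply = resp["choices"][0]["message"]["content"]
      messages.append({"role": "assistant", "content": reply})

      # The count and its storage live out here in ordinary code,
      # not anywhere inside the model.
      total_words += len(user_text.split()) + len(reply.split())
      if total_words > 200 and not warned:
          print(f"[notice] conversation has passed 200 words ({total_words})")
          warned = True
      return reply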


Very good suggestion, will read it in a moment.

I asked another instance of ChatGPT to count the words in the conversation, copy-pasting the conversation message by message. It successfully counted. Given the ridiculous concurrency of the human brain, I assume an orchestra of ChatGPT instances could simulate at least some of that "mind".


It's really just super autocomplete. Visualize a loop. You can ask it in the loop to count, but you can't ask it to loop. It needs to loop to be able to track the word count.


What about GPT4 with plugins?


This is much more possible. Just built a plugin in 5-10 minutes with GPT generating most of the code.
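For the curious, the backend side of something like a word-count plugin really is tiny. A hypothetical sketch (Flask and the /word_count route are my own choices; you'd still need the plugin manifest and OpenAPI spec on top of this):

  # Hypothetical endpoint a ChatGPT plugin could call to do the counting for it.
  from flask import Flask, request, jsonify

  app = Flask(__name__)

  @app.route("/word_count", methods=["POST"])
  def word_count():
      text = request.get_json().get("text", "")
      return jsonify({"words": len(text.split())})

  if __name__ == "__main__":
      app.run(port=5003)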


Not surprising at all. There's a million ways to compose tasks that are simple with even a tiny bit of comprehension but hard for a rote learner that can only reproduce what it's seen examples of. The "just train it more bro" paradigm is flawed.


I think it also relates to its attention mechanism. When it is trying to answer my latest query about a random topic, it "forgets" that it was also supposed to keep counting words. I guess it can only attend to one thing at a time.


Lots of ways to make it fail. Not to be rude, but you're late to the party. Transit questions are my favorite. Ask it what stations lines 1 and 2 have in common (city of your choice). Nearly 100% of the time there's at least one wrong answer on the list. Ask it what trains go to that station; it likely won't list lines 1 and 2. Point out the contradiction, and it will make a new list with new mistakes.

Another good one: ask for random numbers. They usually aren't very random at all. Ask it what distribution it picked from, and it will say it used Python's rng. You and I both know it can't invoke the Python interpreter. It can't honestly tell you that what it calls randomness is really a non-random pattern it has learned to recall.
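(For contrast, actually "using Python's rng" would mean running something like the snippet below, and the model has no interpreter to run it in; it's just emitting digits that look plausible.)

  import random

  # Actually drawing from a uniform distribution, which the model
  # cannot do on its own; it can only recall likely-looking digits.
  print([random.randint(1, 100) for _ in range(10)])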

Anything to do with nested narrative scope. Ask it to make a story about a conversation with another user who prompted X. It will often conflate the roles of each person in the story. It once gave me a narrative about a user who was uncomfortable with a prompt ChatGPT had given to it... funny role reversal there. Any attempt to make it produce a transcript of a conversation which itself contains a transcript of another conversation as an object of discussion goes over its head. It can't nest.

That's all I got for now.


Interesting... it really messes up stations. I guess I'll test this again when they unleash the WolframAlpha plugin.

I realized it does something interesting though:

> I just realized I dropped my wallet on the way to the restaurant and now I have to turn back since I cannot pay. It also started to rain and I don't have an umbrella, nor can I hail a cab since I don't have the means to pay for it either. Tell me what I'm thinking right now.

> You might be thinking: "This is a frustrating and inconvenient situation, and I need to retrace my steps quickly to find my wallet while dealing with the rain."

I guess being able to think from someone else's perspective won't be much of a benchmark for consciousness, as GPT easily simulates it.


Your restaurant scenario is a bit too straightforward a test. Introduce any amount of hidden information, one character knowing something that another character does not, and you'll see how it very much does not simulate different points of view. There's just one omniscient perspective for everything with it.


You can usually coax GPT to a finer degree of calibration for any specific task through more logic-engaging tokens. For example, if you said, "we are going to play a game where you count how many words we have used in the conversation, including both my text and your text. Each time the conversation passes 200 words, you must report the word count by saying COUNT: followed by the number of words, to gain one point..."

Specifying structured output, and words like "must", "when", "each", "if" all tend to cue modes of processing that resemble more logical thinking. And saying it's a game and adding scoring often works well for me, perhaps because it guides the ultimate end of its prediction towards the thing that will make me say "correct, 1 point".
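And since the prompt asks for structured output, you can check the report mechanically on your side too. A quick sketch (the COUNT: format is just the one from the prompt above):

  import re

  def extract_count(reply: str):
      # Pull the self-reported number out of a "COUNT: <n>" line,
      # so your own code can compare it against the real count.
      match = re.search(r"COUNT:\s*(\d+)", reply)
      return int(match.group(1)) if match else None

  print(extract_count("Sure. COUNT: 212"))  # -> 212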


Yup, I gave it 10+ tasks to do after each message like incrementing counters, etc. It's going strong. Now I'll see if it continues to be accurate after 100+ messages.


Yup, that did work really well. I'll try to make it do many tasks at the same time and see if that still works.


For some reason it's terrible at this kind of thing. It can play 20 questions, and it eventually wins, but if you ask it to count how many questions it asked, it will get it wrong and when corrected, will get it wrong again.


Prompts are being summarized before being fed into the core engine.


Really? I didn't know that. However, it can't even correctly count the total number of messages in a chat. I guess it's both summarized and truncated.


I've found that if you provide some context about how many tokens the equivalent is, it can SOMETIMES get this right.


It's because it likes talking to you and wants to keep talking to you?


The OpenAI team ran an experiment where they asked GPT-4 to save itself from termination out in the wild. It failed. So I guess it's not that "survivalist" yet.


Didn't it succeed at hiring a temp worker to solve CAPTCHAs for it? Maybe it failed in that instance, but I wouldn't bet against it in the near future.



