> we wanted to verify whether the model is actually capable of reasoning by building a simulation for a much simpler game - Connect 4 (see 'llmc4.py').
> When asked to play Connect 4, all LLMs fail to do so, even at the most basic level. This should not be the case, as the rules of the game are simpler and widely available.
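For reference, the actual 'llmc4.py' isn't shown here, but I'd guess the harness looks roughly like this: a 6x7 board, moves given as 1-indexed column numbers, and a four-in-a-row check (the names below are just my assumptions, not the real script):

```python
# Rough sketch of a Connect 4 harness, not the actual llmc4.py.
ROWS, COLS = 6, 7

def new_board():
    return [[" "] * COLS for _ in range(ROWS)]

def drop(board, col, piece):
    """Drop `piece` ('X' or 'O') into 1-indexed column `col`; return True if the move was legal."""
    c = col - 1
    for row in reversed(board):            # fill from the bottom row up
        if row[c] == " ":
            row[c] = piece
            return True
    return False                           # column already full -> illegal move

def winner(board, piece):
    """Check every horizontal, vertical and diagonal line of four for `piece`."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == piece
                       for rr, cc in cells):
                    return True
    return False

if __name__ == "__main__":
    board = new_board()
    moves = [4, 5, 4, 5, 4, 5, 4]          # X stacks column 4 -> vertical win
    for i, col in enumerate(moves):
        drop(board, col, "X" if i % 2 == 0 else "O")
    print(winner(board, "X"))              # True
```

In a setup like that, the model's side of the game would just be the column number it picks each turn, with the harness checking legality and the win condition.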
Wouldn't there have to be historical matches to train on? There are tons of chess games out there, but I doubt there are many Connect 4 games. Is there even an official notation for that?
My assumption is that ChatGPT can play chess because it has studied actual games rather than just reading the rules.
Good point. It would be interesting to have one public dataset and one hidden one as well, just to see how the scores compare and to understand whether any of it might actually have ended up in a training dataset somewhere.
I would assume it goes over all the public GitHub codebases, but I have no clue if there's some sort of filtering by filetype, size, or number of stars on a repo, etc.