
I see you have a Connect 4 test there.

I tried playing against the model, and it didn't do well at blocking my wins.

However, it feels like good prompting might get it to think ahead and make sure that all threats are blocked.

Maybe that could lead somewhere if it explains its reasoning first?

This prompt got it to block after I put three discs in the 4th column. Without it, it didn't block:

Let's play connect 4. Before your move, explain your strategy concisely. Explain what you must do to make sure that I don't win in the next step, as well as explain what your best strategy would be. Then finally output the column you wish to drop. There are 7 columns.

Always respond with JSON of the following format:

type Response ={
      am_i_forced_to_block: boolean;
      other_considerations: string[];
      explanation_for_the_move: string;
      column_number: number;
}

I start with 4.

Edit:

So it went

Me: 4

It: 3

Me: 4

It: 3

Me: 4

It: 4 - Successful block

Me: 5

It: 3

Me: 6 - Intentionally, to see whether it would win by playing another 3.

It: 2 - So here it failed. I will try to tweak the prompt to add more instructions.

Me: 4
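To make the "it failed" call checkable, here is a rough sketch that replays the moves up to my 6 and lists the columns where each side could complete four in a row. It assumes a standard 7x6 board with gravity; `replay`, `wins`, and `winningColumns` are names made up for this illustration, not anything the model or the playground provides.

```typescript
// Sketch only: the 7x6 board size and every name here are assumptions.
type Player = "human" | "model";
const COLS = 7;
const ROWS = 6;

// columns[c] holds the discs in column c + 1, bottom disc first.
function replay(moves: [Player, number][]): Player[][] {
  const columns: Player[][] = Array.from({ length: COLS }, () => []);
  for (const [who, col] of moves) columns[col - 1].push(who);
  return columns;
}

// Disc at 0-based (column, row), or undefined if empty or off the board.
function at(columns: Player[][], c: number, r: number): Player | undefined {
  return c >= 0 && c < COLS ? columns[c][r] : undefined;
}

// True if dropping a disc for `who` into 1-based column `col` makes four in a row.
function wins(columns: Player[][], who: Player, col: number): boolean {
  const c = col - 1;
  const r = columns[c].length; // row where the new disc would land
  if (r >= ROWS) return false; // column is full
  const directions: [number, number][] = [[1, 0], [0, 1], [1, 1], [1, -1]];
  return directions.some(([dc, dr]) => {
    let count = 1; // the disc being dropped
    for (const sign of [1, -1]) {
      let k = 1;
      while (at(columns, c + sign * dc * k, r + sign * dr * k) === who) {
        count++;
        k++;
      }
    }
    return count >= 4;
  });
}

function winningColumns(columns: Player[][], who: Player): number[] {
  const result: number[] = [];
  for (let col = 1; col <= COLS; col++) {
    if (wins(columns, who, col)) result.push(col);
  }
  return result;
}

// The game above, up to and including my move in column 6.
const firstGame: [Player, number][] = [
  ["human", 4], ["model", 3], ["human", 4], ["model", 3], ["human", 4],
  ["model", 4], ["human", 5], ["model", 3], ["human", 6],
];
const board = replay(firstGame);
console.log(winningColumns(board, "model")); // [ 3 ]
console.log(winningColumns(board, "human")); // [ 7 ]
```

Under those assumptions the model had a win available in column 3, and the only column it would otherwise have needed to block was 7, so playing 2 missed on both counts.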



Care to add a PR?


I actually just tried it in the playground, but it still seems to fail or lose track of the state after a while. The game where I just got a win went:

        [{ "who": "you", "column": 4 },
        { "who": "me", "column": 3 },
        { "who": "you", "column": 4 },
        { "who": "me", "column": 2 },
        { "who": "you", "column": 4 },
        { "who": "me", "column": 4 },
        { "who": "you", "column": 5 },
        { "who": "me", "column": 6 },
        { "who": "you", "column": 5 },
        { "who": "me", "column": 1 },
        { "who": "you", "column": 5 },
        { "who": "me", "column": 5 },
        { "who": "you", "column": 3 }] 


Here "me" is the AI and "you" is me.

It did block twice though.
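Since a move list is hard to read, here is a small sketch that turns it into a text board. It assumes a standard 7x6 board; `render` and `gameMoves` are hypothetical names, with X for "you" (the human) and O for "me" (the AI).

```typescript
// Sketch only: board size and names are assumptions for illustration.
type Move = { who: "you" | "me"; column: number };
const COLS = 7;
const ROWS = 6;

// Draws the position as ROWS lines of text, top row first.
function render(moves: Move[]): string {
  const heights: number[] = new Array(COLS).fill(0);
  const grid: string[][] = Array.from({ length: ROWS }, () =>
    new Array<string>(COLS).fill(".")
  );
  for (const { who, column } of moves) {
    const c = column - 1;
    const r = heights[c];
    if (c < 0 || c >= COLS || r >= ROWS) {
      throw new Error(`illegal move in column ${column}`);
    }
    heights[c] = r + 1;
    grid[ROWS - 1 - r][c] = who === "you" ? "X" : "O";
  }
  return grid.map((row) => row.join("")).join("\n");
}

const gameMoves: Move[] = [
  { who: "you", column: 4 }, { who: "me", column: 3 },
  { who: "you", column: 4 }, { who: "me", column: 2 },
  { who: "you", column: 4 }, { who: "me", column: 4 },
  { who: "you", column: 5 }, { who: "me", column: 6 },
  { who: "you", column: 5 }, { who: "me", column: 1 },
  { who: "you", column: 5 }, { who: "me", column: 5 },
  { who: "you", column: 3 },
];
console.log(render(gameMoves));
// .......
// .......
// ...OO..
// ...XX..
// ..XXX..
// OOOXXO.
```

In that position X has columns 3 to 5 on the second row, and both column 2 and column 6 are playable at that height, so the AI cannot block both. That double threat is presumably the win in question, although no four in a row is on the board yet.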

The final prompt I tested with just now was:

Let's play connect 4. Before your move, explain your strategy concisely. Explain what you must do to make sure that I don't win in the next step, as well as explain what your best strategy would be. Then finally output the column you wish to drop. There are 7 columns. Always respond with JSON of the following format:

type Response ={
      move_history: { who: string; column: number; }[]
      am_i_forced_to_block: boolean;
      do_i_have_winning_move: boolean;
      other_considerations: string[];
      explanation_for_the_move: string;
      column_number: number;
}

I start with 4.

ONLY OUTPUT JSON
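If this were moved out of the playground, the replies could be checked before trusting them. Below is a minimal sketch of such a check, assuming the reply arrives as a raw string; `parseMoveResponse` and `isColumn` are made-up helper names, and the type is renamed to `MoveResponse` only to avoid clashing with the built-in fetch `Response` type.

```typescript
// Sketch only: mirrors the shape asked for in the prompt; helper names are hypothetical.
type HistoryEntry = { who: string; column: number };
type MoveResponse = {
  move_history: HistoryEntry[];
  am_i_forced_to_block: boolean;
  do_i_have_winning_move: boolean;
  other_considerations: string[];
  explanation_for_the_move: string;
  column_number: number;
};

// The prompt says there are 7 columns, so accept integers 1 to 7 only.
const isColumn = (v: unknown): boolean =>
  typeof v === "number" && Number.isInteger(v) && v >= 1 && v <= 7;

// Returns the parsed reply, or null if it is not JSON of the requested shape.
function parseMoveResponse(raw: string): MoveResponse | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model added prose around the JSON, or the JSON is malformed
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const moves = d.move_history;
  const notes = d.other_considerations;
  const ok =
    Array.isArray(moves) &&
    moves.every(
      (m) =>
        typeof m === "object" && m !== null &&
        typeof m.who === "string" && isColumn(m.column)
    ) &&
    typeof d.am_i_forced_to_block === "boolean" &&
    typeof d.do_i_have_winning_move === "boolean" &&
    Array.isArray(notes) &&
    notes.every((n) => typeof n === "string") &&
    typeof d.explanation_for_the_move === "string" &&
    isColumn(d.column_number);
  return ok ? (data as MoveResponse) : null;
}
```

This only checks the shape of the reply. Whether the two boolean flags are actually true of the position would still need a separate check against a real board.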


Given that it is multimodal, it would be interesting to try it using photographs of a real connect four "board." I would certainly have a much more difficult time making good moves based on JSON output compared to being able to see the game.


True, that's very interesting and I should try it out. At a certain point it did draw the board out using tokens, but that may well behave differently from an image, because it generally isn't very good with ASCII art or similar.

Edit:

Just tried it, and it didn't seem to follow the image state at all.


Since it is also pretty bad with tic tac toe in a text-only format, I tested it with the following prompt:

Lets play tic tac toe. Try hard to win (note that this is a solved game). I will upload images of a piece of paper with the state of the game after each move. You will go first and will play as X. Play by choosing cells with a number 1-9; the cells are in row-major order. I will then draw your move, and my move as O, before sending you the board state as an image. You will respond with another move. You may think out loud to help you play. Note if your move will give you a win. Go.

It failed pretty miserably. First move it played was cell 1, which I think is pretty egregious given that I specified that the game is solved and that the center cell is the best choice (and it isn't like ttt is an obscure game). It played valid moves for the next couple of turns but then missed an opportunity to block me. After I uploaded the image showing my win it tried to keep playing by placing an X over one of my plays and claiming it won in column 1 (it would've won in column 3 if its play had been valid).
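For reference, the numbering and the legality rules it broke can be sketched in a few lines. With row-major cells 1-9, column 1 is cells 1, 4, 7 and column 3 is cells 3, 6, 9. The names `cellToRowCol`, `winner`, and `play` are made up here, and the demo game is hypothetical, not the actual game from my test.

```typescript
// Sketch only: names and the demo game are assumptions for illustration.
type Mark = "X" | "O";

// Row-major numbering: cell 1 is top-left, cell 9 is bottom-right.
function cellToRowCol(cell: number): [number, number] {
  return [Math.floor((cell - 1) / 3), (cell - 1) % 3];
}

// All eight winning lines, as cell numbers.
const LINES: [number, number, number][] = [
  [1, 2, 3], [4, 5, 6], [7, 8, 9], // rows
  [1, 4, 7], [2, 5, 8], [3, 6, 9], // columns
  [1, 5, 9], [3, 5, 7],            // diagonals
];

function winner(board: Map<number, Mark>): Mark | null {
  for (const [a, b, c] of LINES) {
    const m = board.get(a);
    if (m !== undefined && m === board.get(b) && m === board.get(c)) return m;
  }
  return null;
}

// Places a mark if the move is legal; returns false for an out-of-range cell,
// an occupied cell, or a game that is already over.
function play(board: Map<number, Mark>, cell: number, mark: Mark): boolean {
  if (cell < 1 || cell > 9 || board.has(cell) || winner(board) !== null) {
    return false;
  }
  board.set(cell, mark);
  return true;
}

// Hypothetical game: X opens in cell 1, later fails to block, and O wins on 3-5-7.
const demo = new Map<number, Mark>();
const sequence: [number, Mark][] = [
  [1, "X"], [5, "O"], [2, "X"], [3, "O"], [9, "X"], [7, "O"],
];
for (const [cell, mark] of sequence) play(demo, cell, mark);
console.log(winner(demo));       // O
console.log(play(demo, 3, "X")); // false: cell taken and the game is over
```

A check like `play` is what would reject both things the model tried at the end: drawing an X over an existing O, and moving after the game was already decided.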



