> Can a game state be encoded as a set of weights?
It's not in the weights, because the weights don't change as the game is played.
> These can be in the MLP part after the LLM. Sure
I'm not even sure what this means. The MLPs are not used by the network at all.
> I don't find it completely surprising that you could get some probabilistic map of how tokens interact (game pieces) and what the next likely token is just from training it as an LLM.
You might not, but the idea that they are just outputting based on sequences, without having an internal model of the world, is a common one. This experiment was a test to get more information on that question.
> After all tokens are just placeholders and the relationships between them are encoded in the text.
Sorry, by weights I really meant the pattern of activations... I should have made that clearer. But the weights are trained by the game transcripts to produce activation patterns that could represent the board state. Or it could be local position patterns learnt during training: a positional representation (attention) of the N-1 tokens in the autoregressive task. Did they look at the attention patterns? Anyway, there is a recent PhD thesis from Stanford that looked at CNNs on SAT in a similar way and presented some evidence that the activation patterns can be decoded to determine the satisfying solution.
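For what it's worth, pulling the attention patterns out is not hard if you have the checkpoint. A rough sketch of what I mean, assuming a standard decoder-only architecture; the module names (`othello_gpt.blocks`, `.attn`) and the idea that the attention module returns its weights are guesses on my part, not their actual code:

```python
import torch

attn_patterns = {}

def save_attention(layer_idx):
    def hook(module, inputs, output):
        # Assumes the attention module returns (output, attn_weights),
        # with attn_weights shaped (batch, heads, seq_len, seq_len).
        attn_patterns[layer_idx] = output[1].detach()
    return hook

# Attach a standard PyTorch forward hook to each block's attention module.
handles = [
    block.attn.register_forward_hook(save_attention(i))
    for i, block in enumerate(othello_gpt.blocks)
]

with torch.no_grad():
    othello_gpt(move_tokens)  # move_tokens: a tokenized game prefix

for h in handles:
    h.remove()

# attn_patterns[layer] now holds, per head, how much each of the N-1
# previous move tokens is attended to when predicting the next move.
```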
> But the weights are trained by the game transcripts to produce activation patterns that could represent the board state
A slight phrasing thing here, just to be clear: the model is not explicitly trained to produce a representation of the board state. It is never given [moves] = [board state], and it is not trained to predict the board state by being passed something like [state] + move. The only things trained on that are the probes, and probe training happens after OthelloGPT's training and does not affect what the model does.
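To make the separation concrete, probe training looks roughly like this. This is only a sketch: `othello_gpt`, `get_residual_stream` and `probe_dataloader` are made-up placeholders, not the paper's actual code.

```python
import torch
import torch.nn as nn

# Placeholder setup: othello_gpt is the frozen, already-trained model;
# probe_dataloader yields (game_moves, board_labels), where board_labels
# come from simply replaying the moves under the rules of Othello.
# get_residual_stream() stands in for whatever hook returns the
# activations at a chosen layer.

NUM_SQUARES = 64   # Othello board squares
NUM_STATES = 3     # empty / mine / theirs, per square
D_MODEL = 512      # residual stream width (assumed)

# The probe is a separate map trained on top of frozen activations.
probe = nn.Linear(D_MODEL, NUM_SQUARES * NUM_STATES)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for game_moves, board_labels in probe_dataloader:
    with torch.no_grad():  # OthelloGPT itself never receives gradients
        acts = get_residual_stream(othello_gpt, game_moves, layer=6)  # (batch, d_model)

    logits = probe(acts).view(-1, NUM_SQUARES, NUM_STATES)
    loss = loss_fn(logits.permute(0, 2, 1), board_labels)  # board_labels: (batch, 64)

    optimizer.zero_grad()
    loss.backward()        # only the probe's weights change
    optimizer.step()
```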
Their argument is that the state is represented in the activation patterns and that this is then used to determine the next move. Are you countering that to suggest it may instead be "local position patterns learnt during training: a positional representation (attention) of the N-1 tokens in the autoregressive task"?
If the pattern of activations did not correspond to the current board state, modifying those activations to represent a different board wouldn't cause the model to start predicting moves that are legal for that different board, which is what the intervention experiments show. I also don't follow how, under your explanation, the activations would end up mirroring the expected board state.
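The intervention they describe is roughly the following kind of thing. Again just a sketch: `run_with_edit`, `tokenize_game` and the layer/strength numbers are made up, and I'm assuming the probe's weight rows can be read as per-square directions.

```python
import torch

# Sketch of an intervention, assuming:
#  - run_with_edit() reruns the model but overwrites one layer's
#    activations mid-forward (a stand-in for a forward hook),
#  - probe is the trained board-state probe from the sketch above,
#  - probe.weight rows can be read as per-(square, state) directions.

def flip_square(acts, probe, square, new_state, strength=5.0):
    """Push the activations toward the probe direction for (square, new_state)."""
    direction = probe.weight.view(64, 3, -1)[square, new_state]
    return acts + strength * direction / direction.norm()

moves = tokenize_game(game_prefix)  # tokenize_game / game_prefix are hypothetical

with torch.no_grad():
    original_logits = othello_gpt(moves)   # next-move distribution as-is
    edited_logits = run_with_edit(
        othello_gpt, moves, layer=6,
        edit_fn=lambda acts: flip_square(acts, probe, square=27, new_state=2),
    )

# If the activations really encode the board, the edited run should now
# favour moves that are only legal on the *edited* board, not the real one.
print(original_logits.topk(3).indices, edited_logits.topk(3).indices)
```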
What I am trying to say is that the game state is encoded as patterns in the attention matrices over the N-1 tokens. So yes, it's not explicitly trained to represent the game state, but that game state is encoded in the tokens and their positions.
The tokens and their positions by themselves don't tell you the state of the board.