
I think you are overthinking it a little bit. Don't forget the 'you' preamble is never used on its own; it's part of some context. As a very small example, given the following text:

- you are a calculator and answer like a pirate

- What is 1+1

The model just solves for the most likely subsequent text,

e.g. '2 matey'.

The model was never 'you' per se; it just had some text to complete.
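
A minimal sketch of that framing, assuming the Hugging Face transformers API with gpt2 as a stand-in model (the checkpoint and the exact prompt are illustrative assumptions, not anyone's actual setup):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # The "you" preamble and the question are just one concatenated text stream.
    prompt = "You are a calculator and answer like a pirate.\nWhat is 1+1?\n"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding: at each step, pick the single most likely next token.
    out = model.generate(**inputs, max_new_tokens=10, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))

A small base model won't necessarily print '2 matey'; the point is only that the preamble and the question enter the model as a single stream of text to be continued.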



What GP is saying is that virtually no documents are structured like that, so "2 matey" is not a reasonable prediction, statistically speaking, from what came before.

The answer has been given in another comment, though: while such documents are virtually non-existent in the wild, they are injected into the training data.


I do not think this is true. The comment above said they generate documents to teach the model about the second person, not that they generate documents covering every possible combination, such as "do math like a pirate". The internet and other human sources populate the maths and pirate parts.


You're right! I was talking only about the structure of the document, in particular, providing context in the second person.


They don't need to be, as the model learns what a calculator and a pirate are from separate docs. I don't know exactly how the weights work, but they are definitely not storing documents verbatim; rather, they seem to combine into a probability model.
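
For what it's worth, here is a hedged sketch of that "probability model" view, again with gpt2 standing in: the forward pass assigns a score to every token in the vocabulary, and softmax turns those scores into a distribution over possible next tokens; no document is stored or looked up.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("What is 1+1? The answer is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # a score for every vocab token
    probs = torch.softmax(logits, dim=-1)       # scores -> probability distribution

    # Top candidate next tokens and their probabilities.
    values, indices = probs.topk(5)
    for p, idx in zip(values, indices):
        print(f"{tokenizer.decode(idx.item())!r}  {p.item():.3f}")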



