Not the parent, but it's kind of obvious even in degenerate logic cases:
# user
what's 78904 + 34567282902
# Gemini 2.5 Pro
Identify the core task: The user wants to add two numbers: 78904 and 34567282902.
Recognize the operation: The operation is addition (+).
Perform the calculation:
Align the numbers vertically by place value:
  34567282902
+       78904
-------------
Add column by column, starting from the rightmost column (ones place):
2 + 4 = 6
0 + 0 = 0
9 + 9 = 18 (write down 8, carry over 1)
2 + 8 + 1 (carry-over) = 11 (write down 1, carry over 1)
8 + 7 + 1 (carry-over) = 16 (write down 6, carry over 1)
2 + 1 (carry-over) = 3
7 + 0 = 7
6 + 0 = 6
5 + 0 = 5
4 + 0 = 4
3 + 0 = 3
Combine the results: 34567361806
Format the answer: Present the calculation clearly and state the final result.
To find the sum of 78904 and 34567282902, you add the two numbers together:
  34567282902
+       78904
-------------
  34567361806
So, 78904+34567282902=34567361806.
The model definitely iteratively built up (useful and correct even) text that wasn't directly in the training data. I doubt that number ever appeared anywhere in there, given that 34567361806 doesn't turn up in any of the searches I ran on various engines.
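For anyone who wants to double-check the transcript, here is a minimal Python sketch of the same column-by-column procedure. The function name `column_add` and the step-log format are my own, purely for illustration; this is not a claim about how the model computes anything internally.

```python
def column_add(a: str, b: str) -> tuple[str, list[str]]:
    """Add two non-negative decimal strings digit by digit, logging each column."""
    # Pad the shorter number with leading zeros so the columns line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    digits = []  # result digits, least significant first
    steps = []   # human-readable log of each column
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        steps.append(f"{da} + {db} + {carry} = {total}")
        digits.append(str(total % 10))  # digit written down
        carry = total // 10             # digit carried to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), steps


result, steps = column_add("34567282902", "78904")
print(result)
for line in steps:
    print(line)
```

The result agrees with Python's built-in integer addition, and the logged carries match the ones in the transcript (a carry out of the third, fourth, and fifth columns from the right).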
> The model definitely iteratively built up (useful and correct even) text that wasn't directly in the training data
The text is very likely in the training data, since it’s textbook arithmetic instructions. It’s the number that is probably not there. Simple arithmetic is one of the verifiable task types (ground truths) with a straightforward reward function used to train CoT models. In your example, what’s interesting to me is how improving LLM inference with RL can produce such wonderful outcomes, but that’s perhaps a different question.
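To make "straightforward reward function" concrete, here is a rough sketch of what a verifiable reward for addition could look like. The name `arithmetic_reward` and the "take the last integer in the output" rule are my assumptions for illustration; I don't know what extraction rules any actual training pipeline uses.

```python
import re


def arithmetic_reward(a: int, b: int, completion: str) -> float:
    """Return 1.0 if the last integer in the completion equals a + b, else 0.0.

    Assumption: the final answer is the last run of digits in the text.
    Negative numbers and decimals are not handled in this sketch.
    """
    # Drop thousands separators so "34,567" parses as one number.
    numbers = re.findall(r"\d+", completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == a + b else 0.0


good = arithmetic_reward(78904, 34567282902, "So, 78904+34567282902=34567361806.")
bad = arithmetic_reward(78904, 34567282902, "So, 78904+34567282902=34567361816.")
print(good, bad)
```

The point is that the check needs no human judgment: the ground truth is computable, so the reward is cheap to evaluate at scale.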