Hmm...
The key is to decompose a big, hard problem into easier, atomic sub-problems. However, the decomposition itself is the difficult part, and this paper is not about that: they decompose the task with a human-written prompt.
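As a purely illustrative sketch of what "decompose with a human-written prompt" could look like: the sub-step prompts, the model name, and the chaining logic below are my assumptions, not the paper's actual setup; only the OpenAI client calls are real API.

```python
from openai import OpenAI

client = OpenAI()

# The human author fixes the decomposition by writing the sub-step prompts;
# the model only has to solve each small step, never the whole problem at once.
SUBSTEPS = [
    "List the facts given in this problem, one per line:\n{context}",
    "Given those facts, compute the intermediate quantities needed:\n{context}",
    "Using the work above, state the final answer in one sentence:\n{context}",
]

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name, not the paper's
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def solve(problem: str) -> str:
    context = problem
    answer = ""
    for step in SUBSTEPS:
        answer = ask(step.format(context=context))
        context = context + "\n\n" + answer  # carry earlier results into the next sub-step
    return answer
```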
Prompts are a really interesting way of programming: we can express logic over abstract adjectives like ‘happy’ and ‘unsatisfied’, in a fairly free-form way that would be hard to pin down in ordinary code.
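A tiny sketch of that "abstract adjective as logic" idea: a prompt turned into a boolean predicate. The yes/no convention, wrapper function, and model name are assumptions for illustration, not from the paper.

```python
from openai import OpenAI

client = OpenAI()

def is_described_as(adjective: str, text: str) -> bool:
    """Use a prompt as a fuzzy predicate over an abstract adjective ('happy', 'unsatisfied', ...)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": f"Answer 'yes' or 'no' only. Is the following text best described as {adjective}?\n\n{text}",
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

# The fuzzy predicate then composes with ordinary boolean logic, e.g.:
# unhappy = [r for r in reviews if is_described_as("unsatisfied", r)]
```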
"As an analogy, imagine that you could put your dog or cat into hibernate mode whenever you left on a trip. Your dog or cat might not notice, but even if they did, they might not mind. Now imagine that you could put your child into hibernate mode whenever you were too busy to spend time with them. Your child would absolutely notice, and even if you told them it was for their own good, they would make certain inferences about how much you valued them. That’s the situation the human characters in the story find themselves in."
Fascinating.
"Note on "tuned": OpenAI shared they trained the o3 we tested on 75% of the Public Training set. They have not shared more details. We have not yet tested the ARC-untrained model to understand how much of the performance is due to ARC-AGI data."
I really want to see how many training pairs were needed to achieve this score. If it only takes a few, say 100 pairs, that would be amazing!
Trained with roughly 300 raw tasks taken directly from the ARC public training set (75% of its 400 tasks), without any data augmentation such as generating many more pairs with some kind of ARC generator? That's amazing.
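To make the augmentation remark concrete, here is a rough sketch of the kind of geometric/color augmentation people commonly apply to ARC grids; the grid representation and the specific transforms are my assumptions for illustration, not a description of what OpenAI actually did, and color permutation is not valid for every task.

```python
import random

# An ARC example is an (input, output) pair of grids: lists of lists of color indices 0-9.
# Applying the same rotation/flip/color permutation to both grids of a pair usually
# yields another valid pair, so a few hundred raw pairs can be expanded into many more.
Grid = list[list[int]]

def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_horizontal(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]

def permute_colors(grid: Grid, mapping: dict[int, int]) -> Grid:
    return [[mapping[c] for c in row] for row in grid]

def augment_pair(inp: Grid, out: Grid, rng: random.Random) -> tuple[Grid, Grid]:
    """Apply one random, consistent transformation to both grids of a pair."""
    k = rng.randrange(4)          # number of 90-degree rotations
    flip = rng.random() < 0.5
    colors = list(range(10))
    rng.shuffle(colors)
    mapping = dict(enumerate(colors))

    def transform(g: Grid) -> Grid:
        for _ in range(k):
            g = rotate90(g)
        if flip:
            g = flip_horizontal(g)
        return permute_colors(g, mapping)

    return transform(inp), transform(out)

# Usage: new_inp, new_out = augment_pair(inp, out, random.Random(0))
```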