Proper RLHF surely boosts "predicted next token until it couldn't" to feel more like "actually recalled".