For those who want to dive deeper, here’s a 300 LOC implementation of GRPO in pu...

		lsorber 6 months ago \| parent \| context \| favorite \| on: Reinforcement learning, explained with a minimum o... For those who want to dive deeper, here’s a 300 LOC implementation of GRPO in pure NumPy: https://github.com/superlinear-ai/microGRPO The implementation learns to play Battleship in about 2000 steps, pretty neat!