Location: New York, NY
Remote: No preference as long as you're located in NYC
Willing to relocate: No
Technologies: Next, Svelte, Postgres, Redis, Docker, Kubernetes
Resume: https://drive.google.com/file/d/1HM6dJ7QVh7n4OJ2RXoRk60-T51E...
Email: plum@plumocracy.com
The numbers they show don't matter. "On multi-round coreference/context recall tests (often cited as MRCR or long-text retrieval benchmarks), Opus 4.7 reportedly dropped from roughly 78.3% down to 32.2% compared to Opus 4.6." But what did Anthropic do? They just stopped showing that benchmark altogether and now only show the cherry-picked ones that improved.