
What is METR?



The 2h 15m is the length of tasks the model can complete with 50% probability. So longer is better in that sense. Or at least, "more advanced" and potentially "more dangerous".
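To spell the definition out, here is a minimal sketch. It assumes success probability follows a logistic curve in the log of task length. The coefficients A and B are made up so that the 50% point lands at 135 minutes (2h 15m), and the function names are hypothetical, not anything taken from METR's code.

```python
import math


def success_probability(minutes, a, b):
    """Modelled chance of completing a task of the given length.

    Assumes a logistic curve in log2(task length); a and b are
    hypothetical fitted coefficients (b < 0: longer tasks are harder).
    """
    return 1.0 / (1.0 + math.exp(-(a + b * math.log2(minutes))))


def time_horizon(p, a, b):
    """Task length in minutes at which the modelled success probability is p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - a) / b)


# Made-up coefficients, chosen so the 50% horizon comes out at 135 minutes.
B = -1.0
A = math.log2(135)

print(time_horizon(0.5, A, B))         # the "50% time horizon" in minutes
print(time_horizon(0.8, A, B))         # a stricter threshold gives a shorter horizon
print(success_probability(135, A, B))  # probability at the 50% horizon
```

Under these assumptions a higher horizon means the curve sits further to the right, i.e. the model stays above a coin flip on longer tasks.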



To maybe save others some time: METR is a group called Model Evaluation and Threat Research, who

> propose measuring AI performance in terms of the length of tasks AI agents can complete.

Not that hard to figure out, but the way people were referring to them made me think it stood for an actual metric.



