Benchmarks are always fishy; you need to look at the things you'd actually use the model for in the real world. From that point of view, the SOTA for open models is still well behind.
If benchmarks are fishy, their bias would presumably favor proprietary models, since those vendors have a stronger incentive to game the benchmarks.
No... benchmarks are not always "fishy." That's just a defense people fall back on when they have nothing else to point to. I already said the benchmarks aren't perfect, but they're far more objective than vibes. And yes, you should test on your individual use case, which is itself a benchmark.
As I said, I've been following this stuff closely for many years now. My opinion isn't informed by a single chart but by a lot of hands-on experience. The chart is less fishy than blanket claims that closed models are somehow far better than the benchmarks show.