Ok, I had time to read through this, and yeah, I agree: a multicore test shouldn't be waiting on that much shared state.
There are programs that are neither fully parallel nor fully serial; they'll scale to maybe 6 cores on a 32-core machine. But there's so much variation in that, I don't see how you'd pick the "right" amount of sharing, so the only reasonable thing to test is something embarrassingly parallel or close to it. Geekbench 6's scaling curve is way too flat.
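For what I mean, here's a minimal sketch of that kind of scaling test (Rust; the integer kernel and iteration count are invented placeholders, not anything Geekbench actually runs). Every thread does fully independent work, so the per-thread throughput printed at each thread count traces the scaling curve directly; on a healthy chip it stays roughly flat until clocks or some shared resource give out:

```rust
use std::hint::black_box;
use std::thread;
use std::time::Instant;

// Independent integer kernel (an arbitrary LCG-style mix): no shared
// state and almost no memory traffic, so threads never wait on each
// other -- the "embarrassingly parallel" case.
fn kernel(iters: u64) -> u64 {
    let mut x: u64 = 0x9e37_79b9_7f4a_7c15;
    for _ in 0..iters {
        x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    }
    x
}

fn main() {
    const ITERS: u64 = 100_000_000; // made-up; just long enough to dwarf thread startup
    let max = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    for n in 1..=max {
        let start = Instant::now();
        let handles: Vec<_> = (0..n).map(|_| thread::spawn(|| kernel(ITERS))).collect();
        for h in handles {
            black_box(h.join().unwrap());
        }
        let secs = start.elapsed().as_secs_f64();
        // Perfect scaling prints a flat line; droop at high n is the chip
        // falling off boost (or hitting a genuinely shared bottleneck).
        println!("{n:3} threads: {:6.1} Miter/s per thread", ITERS as f64 / secs / 1e6);
    }
}
```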
The point of a multi-core benchmark is that throwing a lot of threads at something can move the bottleneck. With one thread, neither a desktop nor a HEDT processor is limited by memory bandwidth; with max threads, maybe the first one is and the second isn't. With one thread everything runs at the boost clock; with max threads everything may be running at the base clock. So the reason to distinguish them is that you want to see to what extent a particular chip stumbles when it's fully maxed out.
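To put arithmetic on that, take the hypothetical 5 GHz boost / 2 GHz base part below: if full load really drags every core to base clock, 64 threads buy you at most 64 × 2/5 ≈ 25.6× the single-thread throughput even with perfect scaling, and a bandwidth-bound kernel won't get even that. The gap between the naive 64× and what the chip actually sustains is exactly what the benchmark should expose.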
But tanking performance with shared state loads up the chip without getting anything in return, and it isn't even representative of the real workloads that use an in-between number of threads. The 6-thread consumer app isn't burning max threads on useless lock contention; it just only has 6 active threads. If you have a part with 32 cores and 64 threads, a 5 GHz boost clock, and a 2 GHz base clock, it's going to run near the boost clock when you only put 6 threads on it.
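A toy sketch of the difference (Rust again; the thread counts and workload size are made up): the contended run keeps all 64 hardware threads "busy" fighting over one mutex, while the independent run just gives 6 threads private accumulators. Same total work either way, and the fully loaded chip will typically come out slower:

```rust
use std::hint::black_box;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

// Same total number of operations either way; only the sharing differs.
fn run(threads: usize, contended: bool, total_ops: u64) -> f64 {
    let shared = Arc::new(Mutex::new(0u64));
    let per_thread = total_ops / threads as u64;
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let mut local = 0u64;
                for i in 0..per_thread {
                    if contended {
                        // Every iteration serializes on one mutex: the whole
                        // chip is occupied, mostly waiting on the lock.
                        *shared.lock().unwrap() += i;
                    } else {
                        // Thread-private accumulator: zero contention.
                        local = local.wrapping_add(i);
                    }
                }
                black_box(local);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed().as_secs_f64()
}

fn main() {
    const OPS: u64 = 20_000_000; // made-up workload size
    println!(" 6 independent threads: {:.3}s", run(6, false, OPS));
    println!("64 contending threads:  {:.3}s", run(64, true, OPS));
}
```

The contended run is the degenerate case: maximum occupancy, minimum useful throughput.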
It's basically measuring the performance you'd get from a small number of active threads at the level of resource contention you'd have with every thread busy, which almost never happens in real-world cases because those two situations are typically alternatives to each other rather than things that happen at the same time.
It is worse. The case of many threads, heavy resource contention, and diminishing and eventually negative returns does exist, and I've run into it, but it's not common at all for regular users and not even that interesting to me. I want to know how the CPU responds to full utilization (not being able to hold full turbo, like you said).