Not horribly relevant from a software perspective, but as a hardware geek I think the way they're doing threading is really interesting. Big out-of-order (OoO) processors like a normal Xeon or a Power7 usually use simultaneous multithreading (SMT), which means instructions from two or more threads can be fed to the execution units in the same clock cycle; since the threads often aren't contending for the same resources, you get higher throughput. Some in-order processors, like a Niagara, instead use block multithreading (BMT), where you run one thread until it hits a cache miss, then switch to another thread, with some delay while the pipeline is flushed.

What the Phi is doing is combining those approaches: running two threads simultaneously and switching threads out on cache misses. This way you only double your control structures rather than quadrupling them, but your cores aren't left entirely unutilized while you're swapping threads. A really nifty compromise, I think.
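
If you want to play with the trade-off, here's a toy Python model of the three schemes. To be clear, this is just a sketch under made-up assumptions, not how the Phi (or any real core) is implemented: `simulate` is a hypothetical helper, and the miss probability, miss latency, and flush penalty are numbers I picked for illustration. Each "slot" issues at most one instruction per cycle from its current thread; when that thread misses, the slot swaps in a ready thread (if there is one) and pays the flush penalty.

```python
import random


def simulate(num_threads, slots, cycles=20_000, miss_prob=0.05,
             miss_latency=30, switch_penalty=5, seed=0):
    """Toy core model (illustrative assumptions only).

    Returns (instructions issued per cycle, fraction of cycles in which
    no slot issued anything).
    """
    if not 1 <= slots <= num_threads:
        raise ValueError("need 1 <= slots <= num_threads")
    rng = random.Random(seed)
    ready_at = [0] * num_threads               # cycle when each thread's last miss resolves
    running = list(range(slots))               # thread currently occupying each issue slot
    waiting = list(range(slots, num_threads))  # resident threads not in a slot
    slot_free_at = [0] * slots                 # cycle when each slot's flush finishes
    issued = 0
    idle_cycles = 0
    for cycle in range(cycles):
        issued_this_cycle = 0
        for s in range(slots):
            if cycle < slot_free_at[s]:
                continue                       # still flushing after a thread switch
            t = running[s]
            if ready_at[t] > cycle:
                # Current thread is waiting on memory: swap in a ready thread, if any.
                cand = next((w for w in waiting if ready_at[w] <= cycle), None)
                if cand is not None:
                    waiting.remove(cand)
                    waiting.append(t)
                    running[s] = cand
                    slot_free_at[s] = cycle + switch_penalty
                continue
            issued_this_cycle += 1
            if rng.random() < miss_prob:       # this instruction missed in cache
                ready_at[t] = cycle + 1 + miss_latency
        issued += issued_this_cycle
        if issued_this_cycle == 0:
            idle_cycles += 1
    return issued / cycles, idle_cycles / cycles


if __name__ == "__main__":
    # Four resident threads per core in each case; only the number of
    # simultaneously issuing threads changes.
    for name, slots in [("BMT", 1), ("hybrid", 2), ("SMT", 4)]:
        ipc, idle = simulate(num_threads=4, slots=slots)
        print(f"{name:7s} slots={slots}  ipc={ipc:.2f}  fully idle={idle:.1%}")
```

With `slots=1` this behaves like the BMT description above, with `slots == num_threads` nothing ever gets swapped so it acts like plain SMT, and `slots=2` with more resident threads is the hybrid. The absolute numbers mean nothing outside the assumed parameters; the interesting part is watching how the fully idle fraction moves as you change `slots` and `switch_penalty`.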