I think this marks the definitive point where RISC dominates CISC. It's been a long time coming, but the M1 spells it out in bold. Sure, variable-size instructions are great when cache is limited and page faults are expensive. But with clock speeds and process nodes asymptoting, the only way left to scale is horizontally: more chiplets, more cores, bigger caches, DRAM brought closer to the cache hierarchy, and losses from cache flushes minimized with more threads and less context switching.
Basically computers that start looking and acting more like clusters. Smarter memory and caching, zero-copy/fast-copy-on-mutate IPC as first-class citizens. More system primitives like io_uring to facilitate i/o.
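To make the zero-copy IPC point concrete, here's a minimal sketch using Python's stdlib `multiprocessing.shared_memory` (one of the mainstream "first-class citizen" primitives of this kind; the buffer contents and sizes are just illustrative):

```python
# Sketch: zero-copy IPC via a named shared-memory segment.
# multiprocessing.shared_memory is stdlib since Python 3.8.
from multiprocessing import shared_memory

# Producer: create a shared block and write into it directly --
# no serialization, no copy into a kernel buffer.
shm = shared_memory.SharedMemory(create=True, size=64)
shm.buf[:5] = b"hello"

# Consumer (normally a separate process): attach by name and read
# the same physical pages -- nothing is copied on the read path.
view = shared_memory.SharedMemory(name=shm.name)
payload = bytes(view.buf[:5])

view.close()
shm.close()
shm.unlink()  # free the segment once all attachments are closed
```

Both sides see the same pages, so a "copy" only happens if a reader materializes bytes out of the buffer, which is the copy-on-mutate shape the comment is gesturing at.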
The popularity of modern languages that make concurrency easy means more leveraging of all those cores.
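As a small illustration of "easy concurrency that actually uses the cores", here's a stdlib `concurrent.futures` sketch (the workload is a stand-in, not anything from the comment):

```python
# Sketch: fanning CPU-bound work out across cores with a process pool.
# Processes (not threads) are used so CPython's GIL doesn't serialize
# the work. Note: on platforms using the "spawn" start method
# (Windows, macOS) this needs an `if __name__ == "__main__":` guard.
from concurrent.futures import ProcessPoolExecutor

def square(n: int) -> int:
    return n * n

# One worker per core by default; map preserves input order.
with ProcessPoolExecutor() as pool:
    results = list(pool.map(square, range(8)))
```

The ergonomics here are the point: the scheduling, worker lifecycle, and result collection are all handled by the runtime.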
Agree with everything except the RISC vs. CISC bit. Modern Arm isn't really RISC, and x86 gets translated into RISC-like micro-instructions anyway. To the extent that ARMv8 has an advantage, I think it's due to being a nearly clean design from 10 years ago, rather than anything fundamental; x86's handicap is the legacy it carries.