It would be neat to fire this up on an older processor that doesn’t have modern instruction-level parallelism and verify the difference in performance.
On x86 you'd have to search pretty far back before the available ILP really dropped off. Some of the lower-end OoO ARMs might be a good testing ground, though. Say, a Raspberry Pi 4? Earlier-gen RPis used in-order cores.