Trying to write optimized Huffman decompression code for the PlayStation 1 (which by total coincidence happens to be MIPS R3000-based) taught me this. C can go very far if you know how to optimize it, but at some point you are going to run into roadblocks anyway since the compiler will not generate the exact code you want it to generate. It will generate code that _looks_ compact and fast, but can't hold a candle to properly hand-rolled assembly that takes pipelining and asynchronous behavior into account.
And of course this gets worse with CPUs that were not designed to be a target for compiled code in the first place, i.e. the vast majority of 8-bit architectures which for better or worse still dominate the low-end microcontroller market.
Contrary to popular belief today, C compilers were quite crappy back in those days. It was the growing bag of tricks used by C optimizers, especially around the nowadays so beloved UB (undefined behavior), that really improved them.
There is a reason why Michael Abrash's books are all about assembly.
Never expected to hear this one.