Trying to write optimized Huffman decompression code for the PlayStation 1 (which by total coincidence happens to be MIPS R3000-based) taught me this. C can go very far if you know how to optimize it, but at some point you will hit roadblocks anyway, since the compiler will not generate the exact code you want. It will generate code that _looks_ compact and fast, but can't hold a candle to properly hand-rolled assembly that takes pipelining and asynchronous behavior into account.
And of course this gets worse with CPUs that were not designed to be a target for compiled code in the first place, i.e. the vast majority of 8-bit architectures, which for better or worse still dominate the low-end microcontroller market.