I don't, really. I do system software for high performance and distributed computing. The scientific applications are given to me, and are largely benchmarks. (Although these days I don't do much actual HPC; it's mostly high-throughput, low-latency stream processing.)

I have a friend who has a galaxy simulation that uses OpenMP but not MPI. The reason is simple: she doesn't have the expertise to make it distributed. Slapping a few OpenMP directives on the most expensive loops is easier than figuring out how to split the work across nodes with message passing. How much parallelism you extract from a program is often a function of how much time and effort you can invest in it. Some people get "good enough" performance improvements by scaling on a single node.
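To give a sense of how little code "slapping a directive on a loop" can involve, here's a minimal sketch in C. It is not her code: the `Particle` type and the `count_neighbors` function are hypothetical, and the O(n^2) neighbor count is just a stand-in for whatever the expensive loop in a real simulation happens to be.

```c
#include <assert.h>

/* Hypothetical particle type, for illustration only. */
typedef struct {
    double x, y, z;
} Particle;

/* For each particle, count how many other particles lie within `radius`.
 * Direct O(n^2) comparison; a stand-in for an expensive simulation loop. */
void count_neighbors(const Particle *p, int *counts, int n, double radius)
{
    assert(radius >= 0.0);
    const double r2_max = radius * radius;

    /* The only OpenMP-specific line. Iteration i writes only counts[i],
     * so the outer iterations are independent and can be split across threads. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        int c = 0;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            double dx = p[j].x - p[i].x;
            double dy = p[j].y - p[i].y;
            double dz = p[j].z - p[i].z;
            /* Compare squared distances to avoid a square root. */
            if (dx * dx + dy * dy + dz * dz <= r2_max) c++;
        }
        counts[i] = c;
    }
}
```

With GCC or Clang you'd build this with `-fopenmp`; without that flag the pragma is ignored and the same code runs serially. That's a big part of the appeal: the loop body doesn't change at all. A distributed version would additionally have to decide which node owns which particles and exchange positions between nodes explicitly, which is where the extra expertise comes in.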