It's a little apples-to-oranges, but it's not "wrong". Lots of IPC problems really are solved fastest using a socket or pipe. Writing to a shared memory buffer doesn't clue the kernel in to the fact that there is new data available, which means you need a syscall anyway to transfer control. It also makes the application responsible for all the buffering, and an application doesn't have the kernel's "big picture" with which to optimize things properly.
Don't diss the pipe. It doesn't do everything, but if you're going to pick just one IPC mechanism, it's the one.
You don't make any sense when it comes to "buffers": writing to shared memory IS writing to a buffer. A pipe doesn't give you a "big picture" to "properly optimize things" (a WTF "argument"). And no syscall is needed to "transfer control", unless you're talking about futexes or some other method of cluing the kernel in, in which case you do indeed have a way to clue the kernel in.
Writing once is faster than writing multiple times. That's why shared memory is obviously faster than not sharing and copying. Time needed to set up shared memory segments is negligible if done right -- once, at the beginning, and then re-used as needed.
I think the argument there is that shared memory is not for transferring data from one process to another but for sharing data between processes, which is quite a different use case.
Shared memory is fast, but it is just shared memory. You have to synchronize accesses to it in some way.
When you want to transfer relatively small records from one process to another, it is probably faster to send them through a pipe, because the copying overhead is negligible compared to the synchronization overhead, which would be there in the shared memory case too (and would probably be larger).
So to wrap up:
1) In most cases you really do need to tell the kernel that you are done and want to wait on some event in another process (relying on preemption or on explicit timed sleeps to switch to the other process is considered bad practice)
2) Writing once is indeed faster than writing multiple times, which, in this case of many small writes, is an argument for pipes. A pipe lets you hand over some data to the kernel and let it do any low-level synchronization details, which it can generally do better.
3) In UNIX, pipes/sockets are the most flexible means of IPC. And in the case where you have a server that reads large chunks of data from multiple clients via shared memory, synchronizing access to that shared memory with a socket/pipe is a good idea, because you can use select(2)/poll(2) on the socket. Using mutexes or semaphores or something like that would in most cases lead to busy-waiting or unnecessarily convoluted code.
> You have to synchronize accesses to it in some way.
Pretty simple.
> it is probably faster to send them through a pipe, because the copying overhead is negligible compared to the synchronization overhead, which would be there in the shared memory case too (and would probably be larger).
There is still synchronization overhead -- just because it's hidden from you in the kernel doesn't mean it isn't there. You should check out the source of your favorite kernel and see how much work is done behind the scenes to transfer data over a socket.
So you have an extra copy, PLUS extra synchronization overhead. The only reason people ever think that method is faster is because of operator error, e.g. creating and destroying shared memory segments thinking it's like malloc.
> 1) In most cases you really do need to tell the kernel that you are done
man futex
Or just busy-wait, or do a nonblocking "read" like you would anyway when reading from a socket.
> kernel and let it do any low-level synchronization details,
Look at the code. Really. It's a lot of overhead compared to userspace solutions.
> In UNIX, pipes/sockets are the most flexible means of IPC
s/flexible/common
Obviously shared memory is the most flexible, because you can implement whatever scheme you want on top of it. There's always shared memory somewhere anyway, even with sockets; it's just hidden from you inside the kernel.
> Using mutexes or semaphores or something like that would in most cases lead to busy-waiting or unnecessarily convoluted code.
Busy-waiting is done in multiple places in the kernel. If you're blocking on read anyway, what's the difference? And if you're not, checking a shared condition variable on each pass is no more convoluted than doing nonblocking reads and accumulating.
Not sure where you got this but it's wrong.