I'd say the difference is the system's behavior in the presence of IO, and it's pretty important in my experience. Micro-batching systems hold up processing while waiting for IO, but proper streaming implementations keep using CPU on elements at other points in the stream, very roughly.
Exactly. You can get the amortization benefits of batching without adding extra latency if you do optimistic pipelining. It's super powerful when your transaction costs aren't linear in size, e.g. because you're doing in-memory aggregations.
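Rough sketch of the idea (the names and the simulated sink write are mine, not from any particular system): a background writer thread drains finished batches to IO while the main thread keeps aggregating new elements, so CPU work overlaps the in-flight writes instead of stalling on them.

```python
import queue
import threading
import time

def simulated_io_write(batch):
    # Stand-in for a blocking sink write (network/disk); purely illustrative.
    time.sleep(0.01)

def pipelined_stream(elements, batch_size=100):
    """Keep aggregating on the CPU while earlier batches' IO is in flight."""
    pending = queue.Queue()

    def writer():
        # Drains batches as they complete; None is the shutdown signal.
        while True:
            batch = pending.get()
            if batch is None:
                break
            simulated_io_write(batch)

    t = threading.Thread(target=writer)
    t.start()

    agg, count, batches = {}, 0, 0
    for x in elements:
        agg[x % 10] = agg.get(x % 10, 0) + x  # in-memory aggregation
        count += 1
        if count == batch_size:
            pending.put(dict(agg))  # hand off to the writer; CPU keeps going
            batches += 1
            agg.clear()
            count = 0
    if agg:
        pending.put(dict(agg))
        batches += 1
    pending.put(None)
    t.join()
    return batches

print(pipelined_stream(range(1000), 100))  # → 10
```

The aggregation also shows the non-linearity point: each handed-off batch is a small dict of partial sums, so the IO cost per batch stays roughly constant no matter how many elements were folded into it.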