Some time ago, I blogged a series of posts about various ways of doing pipe-like operations in Python, including one experimental project of mine (which is intra-process, unlike real Unix pipes, which are IPC) and some others, including PyP. Below are links to some of those posts, which may be of interest. Note: some of the posts link to each other, so I've listed them below in reverse chronological order.
Swapping pipe components at runtime with pipe_controller:
Mostly out of frustration with PyP not being lazy (on large inputs it reads the entire file up front, or at least it used to).
But it was quite interesting to implement all the standard Python idioms (like slicing) in a lazy way, and it's not that complicated a tool. I still use it a lot whenever I have a nontrivial pipeline to write.
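As a rough illustration of what "slicing in a lazy way" can mean, here is a minimal sketch using the standard library's `itertools.islice`. It is not piep's actual implementation; `lazy_slice` and `numbered_lines` are hypothetical names made up for this example.

```python
import itertools

def lazy_slice(lines, start, stop):
    """Yield the items lines[start:stop] without reading past `stop`.

    Hypothetical helper: works on any iterator, including an endless one.
    """
    return itertools.islice(lines, start, stop)

def numbered_lines():
    """Stand-in for an unbounded input stream (assumption for the demo)."""
    n = 0
    while True:
        yield f"line {n}"
        n += 1

# Slicing an infinite stream terminates because only 5 items are ever read.
first = list(lazy_slice(numbered_lines(), 2, 5))
```

An eager `list(stream)[2:5]` would never return on this input, which is the kind of behaviour the complaint about large files is about.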
Yeah, fortunately pythonpy supports lazy iteration over sys.stdin when you really need it. Just like in Python, the syntax won't be as nice as using a list. But it works:
However, the times you actually need this are surprisingly rare. Most lazy operations don't require that each row be aware of the surrounding row context, and using the much simpler:
py -x 'new_row_from_old_row(x)'
will get the job done in a lazy fashion. Usually, when you need rows to be context aware, as in:
py -l 'sorted(l)'
or
py -l 'set(l)'
it's just not possible to accomplish your task without reading in all of stdin.
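The distinction can be sketched in plain Python. This is only an illustration of the idea, not how pythonpy is implemented; `transform_rows`, `sort_rows`, and `source` are hypothetical names, and `source` stands in for sys.stdin.

```python
def transform_rows(rows, fn):
    """Lazily apply fn to each row; only one row is held at a time."""
    for row in rows:
        yield fn(row)

def sort_rows(rows):
    """Sorting needs every row before it can emit the first one."""
    return sorted(rows)

consumed = []

def source():
    """Stand-in for sys.stdin that records what has been read so far."""
    for line in ["pear", "apple", "fig"]:
        consumed.append(line)
        yield line

# Row-wise transform: producing one output row reads one input row.
first = next(transform_rows(source(), str.upper))
read_for_first = list(consumed)

# Context-aware operation: the whole input is read before any result exists.
consumed.clear()
ordered = sort_rows(source())
read_for_sort = list(consumed)
```

Here `read_for_first` holds a single row, while `read_for_sort` holds all of them.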
Cool :), glad it's supported, at least for the simple case of line-wise transforms.
Some things can't be done without reading everything. But there are still a number of operations on "all of stdin" that can safely be done lazily. I'm particularly fond of "divide stdin into chunks of lines separated by <predicate>" [0], which does need context, but only enough to determine where the current chunk ends (typically a few lines).
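A minimal generator sketch of that chunking idea, in plain Python rather than either tool. `split_chunks` is a hypothetical name, and dropping the separator lines and skipping empty chunks are assumptions made for this example.

```python
import itertools

def split_chunks(lines, is_separator):
    """Lazily yield lists of consecutive lines, split where is_separator(line) is true.

    Only the current chunk is buffered, so memory use is bounded by the
    largest chunk rather than by the whole input.
    """
    chunk = []
    for line in lines:
        if is_separator(line):
            if chunk:          # skip empty chunks between adjacent separators
                yield chunk
            chunk = []
        else:
            chunk.append(line)
    if chunk:                  # trailing chunk with no closing separator
        yield chunk

# Blank lines act as separators here.
result = list(split_chunks(["a", "b", "", "c", "", "", "d"], lambda l: l == ""))

# Works on an endless stream too: the first chunk arrives after three reads.
endless = itertools.cycle(["x", "y", ""])
first_chunk = next(split_chunks(endless, lambda l: l == ""))
```

In a real pipeline, `lines` would be something like `(l.rstrip("\n") for l in sys.stdin)`.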
`py` seems to be aimed at a single expression per invocation (nice and simple), while `piep` recreates pipelines internally (more complex but also means pipelines can produce arbitrary objects rather than single-line strings). So I'm not really sure how you'd do the above in `py` anyway.
http://opensource.imageworks.com/?p=pyp
The Pyed Piper Tutorial: http://www.youtube.com/watch?v=eWtVWF0JSJA