Threads for python webapps are fine, but they break down when doing things like chat (using long polling or websockets) where you have a lot of open, mostly idle connections. Coroutines and libraries like gevent and eventlet let you elegantly use greenlets without contorting your code -- i.e., you get async without callbacks or deferreds -- and you get the benefit of handling thousands of connections in a single python process. Of course you don't get processor concurrency, but you don't really get that with straight CPython because of the GIL anyway -- and if you're deploying a largish web app, you're going to have a load balancer (haproxy ftw), so you can just run one process per core. This also lets you scale out, mostly transparently, to N boxes with M cores per box with very few architecture changes at the inbound-request / web-server level.
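To make that concrete, here's a minimal sketch of the gevent style -- the handler name, sleep duration, and connection count are made up for illustration, but the pattern (monkey-patch, spawn a greenlet per connection, let idle ones park cheaply) is the real thing:

    # Minimal gevent sketch: each "connection" is a cheap greenlet, not an OS thread.
    from gevent import monkey
    monkey.patch_all()  # stdlib blocking calls (sockets, sleep) now yield cooperatively

    import gevent

    def handle_connection(conn_id):
        # Pretend this is a long-poll: the greenlet blocks "waiting" for an event,
        # but only this greenlet is parked -- the process keeps serving everyone else.
        gevent.sleep(5)
        return conn_id

    # Thousands of mostly-idle connections in one process is no big deal.
    greenlets = [gevent.spawn(handle_connection, i) for i in range(10000)]
    gevent.joinall(greenlets)

No callbacks, no deferreds -- the handler reads top to bottom like ordinary blocking code.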
Which brings me to the microprocess / shared module ideas -- RAM is pretty cheap -- 1GB or 2GB per core isn't anything fancy, and if you're using more than a couple of gigs in a python webapp process, something is probably wrong. As far as sharing state between microprocesses goes, I don't see how that really solves anything -- you're going to have processes on different boxes if you have any sort of traffic or fault tolerance, and you're going to be putting that shared state into a cache or database somewhere anyway.
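That "put it in a cache" point is really the whole argument -- once the state lives somewhere external, any process on any box can see it, so there's nothing left to share in-process. A sketch, assuming a local redis server and the redis-py client (key name is just an example):

    import redis

    r = redis.Redis(host="localhost", port=6379)
    r.incr("active_chat_users")                 # atomic across all processes and boxes
    count = int(r.get("active_chat_users") or 0)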
Raw compute speed is important to the scipy/numpy crowds, so I think things like Cython and PyPy make a lot of sense there -- webapps are I/O bound, so you spend 90% or more of the time it takes to service a request waiting, which is exactly what async is good at. PyPy isn't going to beat CPython by 20x on some django benchmark, but it will on compute-intensive ones.
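Illustrative sketch of the distinction (the functions and numbers here are made up, not a benchmark): a JIT helps the first kind of work a lot and the second kind barely at all.

    import time

    def cpu_bound(n):
        # Tight numeric loop: the kind of thing PyPy's JIT speeds up dramatically.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def io_bound_request():
        # Typical webapp request: the time goes to waiting on the DB / network,
        # so a faster interpreter barely moves the needle.
        time.sleep(0.2)  # stand-in for a database query or upstream API call
        return "response"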
So anyway, I think my point is, you can have your cake and eat it too -- coroutines without pain (gevent) and processor-level concurrency (load balancer) for web applications, using off-the-shelf, production-ready technology on commodity hardware...