Interestingly, the problem isn't so much building a scalable store on a single machine, which works up to a certain point -- a billion edges, for example (about what we've tested our DB up to). Distributing the data becomes a really hard problem, however, because data locality is tricky. Typical map-reduce patterns are only of limited utility, since each traversal may imply additional traversals; you end up needing "smart" (computing) nodes which may themselves trigger a second tier of map-reduces.
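To make the multi-round problem concrete, here's a toy sketch (all names and the hash-partitioning scheme are hypothetical, not our actual implementation): edges are partitioned across machines, and a breadth-first traversal needs one full round of cross-partition lookups per hop, because each hop can surface neighbors owned by other partitions.

```python
# Toy sketch of why distributed graph traversal is inherently multi-round.
# Partitions are plain dicts standing in for machines; every BFS hop is
# one "map-reduce" round, and discovering new neighbors forces another.

NUM_PARTITIONS = 3

def partition_of(node):
    # Hypothetical hash partitioning by node id.
    return node % NUM_PARTITIONS

def build_partitions(edges):
    # Each partition only knows the outgoing edges of nodes it owns.
    parts = [dict() for _ in range(NUM_PARTITIONS)]
    for src, dst in edges:
        parts[partition_of(src)].setdefault(src, []).append(dst)
    return parts

def distributed_bfs(parts, start, max_depth):
    seen = {start}
    frontier = {start}
    rounds = 0
    for _ in range(max_depth):
        if not frontier:
            break
        rounds += 1  # one cross-partition round per hop
        next_frontier = set()
        for node in frontier:
            # In a real system this lookup crosses the network to
            # whichever machine owns `node` -- that's the expensive part.
            for nbr in parts[partition_of(node)].get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.add(nbr)
        frontier = next_frontier
    return seen, rounds

edges = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 6)]
parts = build_partitions(edges)
reached, rounds = distributed_bfs(parts, 1, max_depth=4)
```

The point of the sketch: you can't know in advance which partitions a traversal will touch, so each round's results determine the next round's work -- exactly the pattern that defeats a single, statically planned map-reduce.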
This is quite different from the problems of even large scale web apps where there's essentially a set of data that's pulled from a caching layer and assembled.