Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

"Sharding kills most of the value of a relational database."

I can appreciate the point the author makes about partitioning schemes requiring heavy integration with the business logic, but I disagree with the claim that sharding doesn't work.

It's probably more accurate to say that sharding only works well if you design your database very carefully, or just get lucky about how your data model maps to sharding schemes. A bad sharded database can probably hobble your app, but a good one does get you remarkably close to true horizontal scaling.

In my experience, at least.



Sharding means you can't do "select * from a join b using (k)" unless you can prove the a record and the b record will always be hosted on the same shard. So of course you often have to write application code that knows where they are and goes and gets them separately. But that's precisely the problem that relational databases were developed to solve for you.

Then there's the whole distributed transaction thing, which is so painful that some people just live with wrong answers instead.


If you data model maps well to sharding, it maps well to distributed databases, and you're not gaining much by being on an RDBMS. I think the point they're trying to make is if you're going through the contortions to shard a relational database, you're better off just doing it in a distributed database in the first place.


There's more than one way to skin a mongoose and in this case the more common shard solutions are fundamentally different from distributed DB in that a distributed DB often times contains large amounts of a a subset of data that has to be merged with data on other DBs whereas a shard can have entire schema duplication on different shards where you can use relational technology just fine and even if the query is distributed you can just add rows to the end of a dataset rather than joining the data columns manually (implementing your own relational joins). In many cases you partition your sharded data in such a way that the majority of your queries are not distributed so that they can continue to leverage the relational model you had before, only performing a distributed query for more complex actions like reporting and such.

Also going back to the first place isn't really an option, shards usually come of a system that has grown, not ground up design. Ultimately I still support the distributed model myself, but a shard model does support relational data much more so than a distributed DB (at least out of the box).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: