TheXenocide's comments

TheXenocide · on July 7, 2009

There's more than one way to skin a mongoose, and in this case the more common shard solutions are fundamentally different from a distributed DB. A distributed DB often holds large amounts of a subset of the data that has to be merged with data on other DBs, whereas a sharded setup can duplicate the entire schema on each shard, so you can use relational technology just fine. Even if a query is distributed, you can just append each shard's rows to the end of a result set rather than joining the data columns manually (implementing your own relational joins). In many cases you partition your sharded data so that the majority of your queries are not distributed and can continue to leverage the relational model you had before, performing a distributed query only for more complex actions such as reporting.

Also, going back to the first place isn't really an option; shards usually come out of a system that has grown, not a ground-up design. Ultimately I still support the distributed model myself, but a shard model does support relational data much more so than a distributed DB (at least out of the box).
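A rough sketch of the shard model described above, not anyone's production design. It assumes two in-memory SQLite databases standing in for shards, a hypothetical users/orders schema duplicated on each, and partitioning by user id modulo the shard count. All table and function names are made up for illustration.

```python
import sqlite3

# Hypothetical schema, duplicated in full on every shard.
SCHEMA = """
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
"""


def make_shards(count):
    """Create `count` independent databases that all share the same schema."""
    shards = []
    for _ in range(count):
        conn = sqlite3.connect(":memory:")
        conn.executescript(SCHEMA)
        shards.append(conn)
    return shards


def shard_for(shards, user_id):
    """Partition by user id so all of one user's rows live on one shard."""
    return shards[user_id % len(shards)]


def add_user(shards, user_id, name):
    shard_for(shards, user_id).execute(
        "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
    )


def add_order(shards, user_id, total):
    shard_for(shards, user_id).execute(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, total)
    )


def orders_for_user(shards, user_id):
    """Common case: a single-shard query that still uses an ordinary SQL join."""
    sql = (
        "SELECT u.name, o.total FROM users u "
        "JOIN orders o ON o.user_id = u.id "
        "WHERE u.id = ? ORDER BY o.id"
    )
    return shard_for(shards, user_id).execute(sql, (user_id,)).fetchall()


def spend_report(shards):
    """Distributed case: run the same query on every shard and append the rows."""
    sql = (
        "SELECT u.id, u.name, SUM(o.total) FROM users u "
        "JOIN orders o ON o.user_id = u.id "
        "GROUP BY u.id, u.name"
    )
    rows = []
    for conn in shards:
        rows.extend(conn.execute(sql).fetchall())
    return sorted(rows)
```

Because each user's rows sit together on one shard, the join and the GROUP BY run locally, and the report only has to concatenate result sets. A query that needed to join rows living on different shards would not get off this easily.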

TheXenocide · on July 7, 2009

I beg to differ; the company I'm at now had SAN representatives saying that they'd never seen a non-SAN DB push as many IOPS as ours did before we started exploring scalability solutions, and I'm still rather compelled by the potential of "no SQL" solutions, thanks. Oh, and no, we're not using MySQL or any other open source DB for that matter, so maybe it's you that has the prejudice here...

TheXenocide · on July 7, 2009

Maybe you should ask Google... they seem to be doing pretty well with commodity hardware and something's telling me that they've got better reliability, higher efficiency and lower costs. Also, unused features merely add unnecessary complexity to a system. Thanks for the info on the cheaper SAN though, I'll have to look into it.

TheXenocide · on July 7, 2009

Most of the things he mentions apply to most DB engines: MySQL, MS-SQL, Postgres, Firebird and many editions of Oracle (all of which I've used in data-intensive applications) suffer the same woes. Ultimately it would appear that you went into reading the article with a narrow-minded view of "Yes they do!" instead of "Why don't they?" Also, the disk analogy was for relating information, not for discussing storage technology or implying that was what he was using when trying to scale. The portion on vertical scalability clearly presents the option (and the pitfalls therein) of using hardware to account for data scalability...

TheXenocide · on July 7, 2009

http://incubator.apache.org/cassandra/

TheXenocide · on July 7, 2009

For future reference, cold hard facts are much more useful than posting your resume. I'm not trying to say you're unqualified or unintelligent in any way, just that rationalizing an opinion with a job history is far less compelling evidence than factual examples. It's also interesting that you would immediately associate a well-made and supported argument (for those of us who have any experience scaling with technologies other than Oracle) with someone who isn't a database expert and doesn't know what he's doing.

Now let's get to the facts: while Oracle might have nice features, that doesn't mean that SQL is the best we can come up with, which is one of the primary points of the article (and one that seems completely unaddressed here). In a sense he brings to attention a common occurrence throughout human history wherein we reject change for comfort, and these comments are doing nothing more than supporting that. Oracle is, for the most part, entirely too expensive for most businesses, and even the businesses that are large enough to adopt it aren't making the profit they could with more affordable technology. You should also note that, while banks do have strict requirements on data handling, they are really responsible for serving very small user bases when compared to things like Google, Amazon, eBay, Facebook and the like. Sure, the requirements are different for each of these, but ultimately your argument holds no water against these infrastructures, which are inherently not SQL and, at the same time, seem to be much more accepted as innovators of scalable technologies for the future...

One big problem is that real innovation comes at the expense of backwards compatibility, which would involve making a lot of changes. I can relate, since massive changes in most cases imply bugs, which is a very unsettling position for some of these, but it doesn't mean there isn't a problem. Sure, banks and other largely established companies would rather shell out the cash to support their legacy ideals than innovate new solutions, but he's right in that we've been spending many, many years of man-hours trying to tackle the problem of porting a dated philosophy to an age that requires more scalability at lower cost. Let's also say that banks haven't been proving their practices to be economically sound lately, so what is their input worth in this matter? Other than large boatloads of cash for Oracle, that is.

nettdata · on July 8, 2009

Just to clarify, I never said Oracle was the be-all end-all solution for everything.

And I didn't rationalize anything with a "job history", but rather with large systems that actually have been built and are working.

I'd be very interested to see just how many people in this discussion actually have experience building large, scalable systems.

The original article made an asinine, generic statement without any context, or mention of cost, and I said it was silly, and pointed to the obvious (and easiest) reason why it was silly, and that was Oracle.

All of a sudden a bunch of people started making statements and assumptions about the scenario the article was probably talking about, that weren't actually made in the article, to discount Oracle as an option. Cost, commodity hardware, etc., etc. Then people started to point out other large websites that actually HAVE scaled, without the use of Oracle, as if that disproves Oracle's abilities... and yet it totally disproves the statement of the original article.

If you want to get into some context-specific details of why certain specific SQL technologies don't scale well (or at all), at a certain price-point, then that's a whole other discussion that I'd be happy to enter into, and would probably agree with.

It's also interesting to note that most of the "DB technologies" that are being used to scale those sites aren't DBs at all, but rather various levels of data caching that are employed to reduce the load on the databases, and are only applicable to the generally read-only and non-transactional nature of social sites.

The whole reason I brought up banking sites in the first place is that they are one of the few, more obvious scenarios where most of your end-user interactions are actually hitting the database in real-time, and all data must be current and consistent. There is no real option for caching to save your ass, except at the DB layer itself, via such mechanisms as Oracle's Cache Fusion technology.

Social sites generally don't have any of those real-time, consistent constraints, and are therefore much easier to scale larger, because the nature of the site and the data allows for so much more technology to be used in front of the database.
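To make the caching point concrete, here is a minimal read-through cache sketch. It is an illustration under assumptions, not how any of the sites mentioned actually do it: `ReadThroughCache`, the `load` callback and the TTL are hypothetical names, and the backing "database" is whatever function the caller passes in.

```python
import time


class ReadThroughCache:
    """Serve reads from memory and fall back to `load` on a miss or after expiry."""

    def __init__(self, load, ttl_seconds, clock=time.monotonic):
        self._load = load          # hypothetical function that hits the database
        self._ttl = ttl_seconds    # how long a cached value may be served
        self._clock = clock        # injectable so expiry can be exercised in tests
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Cache hit: the database is not touched, but the value may be stale.
            return entry[1]
        # Miss or expired entry: go to the database and remember the answer.
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

Until an entry expires, a write to the underlying database is invisible to readers. That is tolerable for something like a profile page and unacceptable for an account balance, which is the distinction being drawn above.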

The plain and simple fact of the matter is that building a large, scalable system is hard work. It requires that you analyze and design ALL aspects of the entire system to scale, not just the database (network, caching servers, application, database, hardware, etc.).

TheXenocide · on July 7, 2009

Oh, it may also be worth noting that engineers and administrators would have to change their way of thinking in all this too, which doesn't affect some too much, but may have a bit of an impact on somebody who's been sitting on the same technology platform for 20 years.