Hacker News

Working in the risk analytics space, "Big Data" seems to be marketing speak for "you can dump all sorts of loosely structured data into this big bucket and our tools will help you find meaningful trends in it." I've yet to see an installation approaching anything near 300GB, so I think of big data as the new sexier label to put on ad hoc data mining applications.


But if it is for running ad hoc data mining on relatively small amounts of data, wouldn't a traditional SQL database do the job just as well? Especially with features like the JSON support in PostgreSQL.
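For context, a minimal sketch of what that JSON support looks like in PostgreSQL (9.4+ for jsonb); the table and field names here are made up for illustration:

```sql
-- Hypothetical events table: loosely structured JSON documents
-- stored inside an ordinary relational table
CREATE TABLE events (
    id      serial PRIMARY KEY,
    payload jsonb              -- binary JSON with operator/index support
);

-- GIN index so containment queries don't scan the whole table
CREATE INDEX events_payload_idx ON events USING gin (payload);

-- "Find meaningful trends": aggregate loosely structured records by a field
SELECT payload->>'country' AS country, count(*)
FROM events
WHERE payload @> '{"type": "signup"}'   -- containment test, uses the GIN index
GROUP BY 1
ORDER BY 2 DESC;
```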


Yes, you are perfectly correct. But a lot of people are paid a lot of money for reinventing the wheel every few years, so expect your words to fall on deaf ears.


I think the question is how much effort will it take to get all kinds of data from all kinds of sources into a nice, coherent schema.

The "schema-less" aspect of NoSQL data stores is a big part of the marketing appeal, I think.

(Maybe the JSON support in PostgreSQL makes this moot, but that's the thinking.)


You always have a schema. The only question is whether you know what it is.

In the next few years there will be a lot of money to be made getting data out of these stores and back into relational databases.
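Getting "schema-less" data back into a relational shape can be as simple as materializing the implicit schema into typed columns — a sketch, with hypothetical table and key names:

```sql
-- Pull the schema that was always there out of the JSON documents
-- and into a proper relational table with real types
CREATE TABLE users AS
SELECT
    payload->>'id'                        AS user_id,
    payload->>'email'                     AS email,
    (payload->>'signup_ts')::timestamptz  AS signup_ts
FROM raw_events
WHERE payload ? 'email';   -- only documents that actually have the key
```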


PostgreSQL has terrible sharding and clustering support.

So no, I would not use it for any big data projects.
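For what it's worth, later PostgreSQL releases (11+) did add declarative hash partitioning, though that splits a table across local partitions only — spreading data across servers still needs an external layer such as Citus or an FDW-based setup. A sketch with invented names:

```sql
-- Hash-partition a table into four local partitions (PostgreSQL 11+).
-- Note: partitioning, not sharding — everything stays on one server.
CREATE TABLE measurements (
    id   bigint,
    data jsonb
) PARTITION BY HASH (id);

CREATE TABLE measurements_p0 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...and likewise for remainders 1 through 3
```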


I was going by mcphilip's description, which was <300GB. For running data mining on datasets that small you don't need any sharding.


Such sweeping statements are bound to be inaccurate.

We have about 3TB of PostgreSQL data on 4 servers, and it works very well for us.


> "Big Data" seems to be marketing speak for "you can dump all sorts of loosely structured data into this big bucket and our tools will help you find meaningful trends in it."

That's the description of a data warehouse.



