But these days, just use Trino or whatever. There are lots of newer ways to work on data that are as big a step up over Spark - in ergonomics, performance, and price - as Spark was over Hadoop.
The nice thing about Spark is the Scala/Python/R APIs. They help you avoid a lot of the irritating things about SQL (applying the same transformation to multiple columns is a big one).
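For instance, here's a minimal PySpark sketch (the data and column names are made up) of the kind of thing that becomes a loop in the DataFrame API but copy-pasted expressions in plain SQL:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("multi-column-example").getOrCreate()

# Hypothetical data and columns, just to illustrate the pattern
df = spark.createDataFrame(
    [(1, 10.0, 1.5, 3.0), (2, 20.0, 3.0, 5.0)],
    ["id", "price", "tax", "shipping"],
)

# Apply the same transformation to several columns with a loop,
# instead of repeating the expression once per column in a SELECT
for c in ["price", "tax", "shipping"]:
    df = df.withColumn(c, F.round(F.col(c) * 1.1, 2))

df.show()
```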
I really can't speak highly enough of Trino (though I used it as AWS Athena, and this was back when Trino was called Presto). It's impressive how well it took "ever growing pile of CSV/JSON/Excel/Parquet/whatever" and let you query it via SQL as-is without transforming it and putting it into some other system.
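Roughly like this with today's Trino Python client (the host, catalog, schema, and table names are placeholders, and you still need a table in the Hive/Glue catalog pointing at the files), but the point stands: it's just SQL over the files where they already live.

```python
import trino

# Placeholder connection details; assumes a Trino cluster with a Hive/Glue
# catalog whose external tables point at raw CSV/JSON/Parquet files in place
conn = trino.dbapi.connect(
    host="trino.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)

cur = conn.cursor()
# Plain SQL over the files as they sit in storage; no ETL into another system
cur.execute("SELECT event_type, count(*) FROM raw_events GROUP BY event_type")
for row in cur.fetchall():
    print(row)
```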
Hadoop was fundamentally a batch processing system for large data files; it was never intended for the sort of online reporting and analytics workloads that the DW concept addressed. No amount of Pig and Hive and HBase and subsequent tools layered on top of it could ever change that basic fact.
The biggest gripe I have is how crazy expensive it is.