Hacker News

Whoever wrote this hasn't worked on medium-complicated data pipelines / ETL logic.

It's non-trivial to build an effective-dated, slowly changing dimension with materialized views.
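To make the complaint concrete, here's a minimal sketch of what an effective-dated slowly changing dimension (SCD Type 2) looks like when derived as a view over an append-only change log, using sqlite3 for portability. Table and column names are hypothetical. Note this is an ordinary view recomputed on every query; the hard part the comment alludes to, maintaining it incrementally as a materialized view, is exactly what traditional implementations don't handle well.

```python
# Sketch: derive valid_from/valid_to windows for an SCD Type 2 dimension
# from an append-only change log. Names are illustrative, not from the post.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_changes (
    customer_id INTEGER,
    address     TEXT,
    changed_at  TEXT   -- ISO-8601 date of the change
);
INSERT INTO customer_changes VALUES
    (1, '12 Elm St',  '2021-01-01'),
    (1, '99 Oak Ave', '2021-06-15'),
    (2, '7 Pine Rd',  '2021-03-01');

-- Each row's validity window ends where the next change for the
-- same customer begins; open-ended rows get a sentinel end date.
CREATE VIEW dim_customer AS
SELECT customer_id,
       address,
       changed_at AS valid_from,
       LEAD(changed_at, 1, '9999-12-31')
           OVER (PARTITION BY customer_id ORDER BY changed_at) AS valid_to
FROM customer_changes;
""")

rows = conn.execute(
    "SELECT * FROM dim_customer ORDER BY customer_id, valid_from"
).fetchall()
for row in rows:
    print(row)
```

As a plain view this recomputes the window function over the whole log on every read, which is the performance cliff that makes people reach for hand-rolled incremental pipelines instead.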

A good tool makes the medium-difficulty stuff easy, and the complicated stuff possible. Materialized views do only the former.

I would love to be wrong about this.



Author here.

Are you thinking of a specific implementation of materialized views? Most implementations from traditional RDBMSs would indeed be too limiting to use as a general data pipeline building block.

The post doesn't argue that, though. It's more about using materialized views as a conceptual model for understanding data pipelines, and hinting at some recent developments that may finally make them more suitable for more widespread use.

From the conclusion:

> The ideas presented in this post are not new. But materialized views never saw widespread adoption as a primary tool for building data pipelines, likely due to their limitations and ties to relational database technologies. Perhaps with this new wave of tools like dbt and Materialize we’ll see materialized views used more heavily as a primary building block in the typical data pipeline.


You can absolutely get complicated data models nailed down using views. Some of the views get unwieldy and a bit long, but it can be done. The catch is incrementally loading the aggregate tables, or handling self-references (and even then, the underlying views are essentially functions to be included). I skimmed the article, have followed materialize.io for a while, and have built pipelines that handle what is essentially the awfulness of performantly updating materialized views.
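The "incrementally loading the aggregate tables" problem mentioned above can be sketched in a few lines: instead of recomputing an aggregate from scratch, you fold each batch of new source rows (a delta) into the materialized result. This is a toy illustration with hypothetical names, not how any particular engine implements it.

```python
# Sketch: maintain a materialized SUM(amount) GROUP BY key incrementally,
# applying deltas instead of rescanning the source. Names are illustrative.
from collections import defaultdict

agg = defaultdict(int)  # the "materialized view": key -> running sum

def apply_delta(rows):
    """Fold a batch of newly inserted (key, amount) source rows into the aggregate."""
    for key, amount in rows:
        agg[key] += amount

apply_delta([("a", 10), ("b", 5)])
apply_delta([("a", 3)])  # incremental refresh: no full rescan of history
print(dict(agg))         # {'a': 13, 'b': 5}
```

SUM is the easy case because it's invertible; the pain arrives with joins, MIN/MAX under deletes, and window functions, which is roughly where hand-built pipelines end up reinventing incremental view maintenance.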

I'm not a master, but I believe a core piece of truth for data:

Move the data as little as possible

If you can use federated query (and the cost/performance is acceptable), do so. If you can use materialized views, do so. Data replication has tons of issues: you almost always run into the Two Generals problem and need reconciliation and recompute procedures. If you don't move the data, the original store is the source of truth, and it is always right.

I've strayed well beyond responding directly to you, but I do think materialize.io, Delta Lake, and declarative pipelines are the solution to 95% of the data problems out there.

I'm speaking conceptually about materialized views as system implementations differ.



