Hacker News

We ended up using ClickHouse after trying Timescale and InfluxDB. ClickHouse is great, but it's important to spend a day or two understanding the data model to make sure it fits what you're trying to do. I have no affiliation with ClickHouse (or any company mentioned).


We’ve been using InfluxDB 1.x since 2017 and I’m itching to get off of it. Currently we’re at about a trillion rows, and we have to ship our data to Snowflake to do large-scale aggregations (like hourly averages). I put a bunch of effort into building this on top of InfluxDB and it’s not up to the task.

Any sense of whether ClickHouse can scale to several TB of data and serve giant queries on the last hour's data across all sensors without getting OOMKilled? We're also looking at some hacked-together DuckDB abomination, or pg_duck.
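For that kind of workload, the usual shape in ClickHouse is a MergeTree table keyed by sensor and time. Below is a minimal sketch of a hypothetical schema and an hourly-average query over the last hour; the table and column names (`sensor_readings`, `sensor_id`, `ts`, `value`) are illustrative, not from the thread:

```python
# Sketch of a ClickHouse schema and rollup query for sensor data.
# The partition key and per-part min/max indexes let ClickHouse skip
# data parts outside the queried time range instead of scanning everything.

ddl = """
CREATE TABLE sensor_readings (
    sensor_id UInt32,
    ts        DateTime,
    value     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (sensor_id, ts)
"""

# Hourly averages over the last hour, across all sensors.
hourly_avg = """
SELECT
    sensor_id,
    toStartOfHour(ts) AS hour,
    avg(value)        AS avg_value
FROM sensor_readings
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY sensor_id, hour
"""

print("MergeTree" in ddl and "toStartOfHour" in hourly_avg)
```

Because `avg` is computed streaming per GROUP BY key rather than by materializing all rows, memory use scales with the number of groups (sensors × hours), not the row count, which is what keeps a query like this from being OOMKilled.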


A trillion rows should be no issue at all for ClickHouse. Your use case sounds a bit more typical for it than ours (our data is simulated, so there is no single, monotonically increasing clock). I don't know, though: is this ~100 kS/s data-acquisition-type stuff (e.g. sound or vibration data)? If so, it wouldn't be possible to push that into ClickHouse without pre-processing.
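The pre-processing step would typically mean collapsing raw waveform samples into summary rows before insert. A minimal sketch of that idea, with a hypothetical `downsample` helper and a toy sample rate (nothing here is from the thread):

```python
# Sketch: pre-aggregating high-rate acquisition data (e.g. ~100 kS/s)
# into one (min, max, mean) summary row per second before loading it
# into a database. All names and rates here are illustrative.
from statistics import mean

def downsample(samples, rate_hz):
    """Collapse raw samples into one (min, max, mean) row per second."""
    rows = []
    for start in range(0, len(samples), rate_hz):
        chunk = samples[start:start + rate_hz]
        rows.append((min(chunk), max(chunk), mean(chunk)))
    return rows

# 3 "seconds" of a repeating 0..9 ramp at a toy 100 S/s rate
raw = [float(i % 10) for i in range(300)]
print(downsample(raw, 100))  # → [(0.0, 9.0, 4.5), (0.0, 9.0, 4.5), (0.0, 9.0, 4.5)]
```

At a real 100 kS/s per channel this turns 100,000 raw samples into one row per second per channel, which is the kind of rate a row-oriented insert path can keep up with.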


Interesting. But ClickHouse is an entirely different database engine, no? I will look into it, thank you!


Yes, it is a columnar DB. A lot of things feel pretty familiar, as it has a SQL-like query language. The data model is different, though.

The nice thing is that once you understand the data model, it becomes very easy to predict whether it will fit your use case, as there is really no magic to it.
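The "no magic" point can be made concrete with a toy model: MergeTree stores rows sorted by the ORDER BY key, so any query filtering on a prefix of that key reads a contiguous range, and any query that doesn't must scan. This sketch (standard-library only, hypothetical `(sensor_id, ts)` key) illustrates the idea:

```python
# Toy model of MergeTree layout: rows live sorted by the ORDER BY key,
# so a filter on a key prefix touches one contiguous slice of storage.
from bisect import bisect_left, bisect_right

# Rows under a hypothetical ORDER BY (sensor_id, ts)
rows = sorted((s, t) for s in range(3) for t in range(4))

def scan_range(rows, sensor_id):
    """Locate the contiguous slice holding one sensor's rows."""
    lo = bisect_left(rows, (sensor_id,))
    hi = bisect_right(rows, (sensor_id, float("inf")))
    return rows[lo:hi]

print(scan_range(rows, 1))  # → [(1, 0), (1, 1), (1, 2), (1, 3)]
```

This is why the sort key choice lets you predict query performance up front: a per-sensor time-range query is a slice, while a filter on a non-key column is a full scan, and no query planner magic changes that.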



