Hacker News

We ended up using ClickHouse after trying Timescale and InfluxDB. ClickHouse is great, but it's important to spend a day or two understanding the data model to make sure it fits what you're trying to do. I have no affiliation with ClickHouse (or any company mentioned).


We’ve been using InfluxDB 1.x since 2017 and I’m itching to get off of it. Currently we’re at about a trillion rows, and we have to ship our data to Snowflake to do large-scale aggregations (like hourly averages). I put a bunch of effort into building this on top of InfluxDB and it’s not up to the task.

Any sense of whether ClickHouse can scale to several TB of data and serve giant queries on the last hour's data across all sensors without getting OOMKilled? We're also looking at some hacked-together DuckDB abomination, or pg_duck.
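For that kind of workload, the usual shape in ClickHouse is a MergeTree table keyed by sensor and time. Below is a minimal sketch of a hypothetical schema and an hourly-average query over the last hour; the table and column names (`sensor_readings`, `sensor_id`, `ts`, `value`) are illustrative, not from the thread:

```python
# Sketch of a ClickHouse schema and rollup query for sensor data.
# The partition key and per-part min/max indexes let ClickHouse skip
# data parts outside the queried time range instead of scanning everything.

ddl = """
CREATE TABLE sensor_readings (
    sensor_id UInt32,
    ts        DateTime,
    value     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (sensor_id, ts)
"""

# Hourly averages over the last hour, across all sensors.
hourly_avg = """
SELECT
    sensor_id,
    toStartOfHour(ts) AS hour,
    avg(value)        AS avg_value
FROM sensor_readings
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY sensor_id, hour
"""

print("MergeTree" in ddl and "toStartOfHour" in hourly_avg)
```

Because `avg` is computed streaming per GROUP BY key rather than by materializing all rows, memory use scales with the number of groups (sensors × hours), not the row count, which is what keeps a query like this from being OOMKilled.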


A trillion rows should be no issue at all for ClickHouse. Your use case sounds a bit more typical for it than ours (our data is simulated, so there is no single, monotonically increasing clock). I don't know, though: is this ~100 kS/s data-acquisition-type stuff (e.g. sound or vibration data)? If so, it wouldn't be possible to push that into ClickHouse without pre-processing.
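The pre-processing step would typically mean collapsing raw waveform samples into summary rows before insert. A minimal sketch of that idea, with a hypothetical `downsample` helper and a toy sample rate (nothing here is from the thread):

```python
# Sketch: pre-aggregating high-rate acquisition data (e.g. ~100 kS/s)
# into one (min, max, mean) summary row per second before loading it
# into a database. All names and rates here are illustrative.
from statistics import mean

def downsample(samples, rate_hz):
    """Collapse raw samples into one (min, max, mean) row per second."""
    rows = []
    for start in range(0, len(samples), rate_hz):
        chunk = samples[start:start + rate_hz]
        rows.append((min(chunk), max(chunk), mean(chunk)))
    return rows

# 3 "seconds" of a repeating 0..9 ramp at a toy 100 S/s rate
raw = [float(i % 10) for i in range(300)]
print(downsample(raw, 100))  # → [(0.0, 9.0, 4.5), (0.0, 9.0, 4.5), (0.0, 9.0, 4.5)]
```

At a real 100 kS/s per channel this turns 100,000 raw samples into one row per second per channel, which is the kind of rate a row-oriented insert path can keep up with.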


Interesting. But ClickHouse is an entirely different database engine, no? I will look into it, thank you!


Yes, it is a columnar DB. A lot of things feel pretty familiar, as it has a SQL-like query language. The data model is different, though.

The nice thing is that once you understand the data model, it becomes very easy to predict whether it will fit your use case, as there is really no magic to it.
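The "no magic" point can be made concrete with a toy model: MergeTree stores rows sorted by the ORDER BY key, so any query filtering on a prefix of that key reads a contiguous range, and any query that doesn't must scan. This sketch (standard-library only, hypothetical `(sensor_id, ts)` key) illustrates the idea:

```python
# Toy model of MergeTree layout: rows live sorted by the ORDER BY key,
# so a filter on a key prefix touches one contiguous slice of storage.
from bisect import bisect_left, bisect_right

# Rows under a hypothetical ORDER BY (sensor_id, ts)
rows = sorted((s, t) for s in range(3) for t in range(4))

def scan_range(rows, sensor_id):
    """Locate the contiguous slice holding one sensor's rows."""
    lo = bisect_left(rows, (sensor_id,))
    hi = bisect_right(rows, (sensor_id, float("inf")))
    return rows[lo:hi]

print(scan_range(rows, 1))  # → [(1, 0), (1, 1), (1, 2), (1, 3)]
```

This is why the sort key choice lets you predict query performance up front: a per-sensor time-range query is a slice, while a filter on a non-key column is a full scan, and no query planner magic changes that.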



