Looks interesting, we solved this problem with Kinesis Firehose, S3 and Athena. ...

bosky101 · on Oct 29, 2024

Storing small events in S3 can make costs explode quickly.

At 1.5M events/day that's $7.50/day. Decent.

At 15M, $75/day

Cost for 150 million S3 PUT requests per day of 25KB each would be $750/day, assuming no extra data transfer charges.
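
For anyone who wants to check the arithmetic, here is a rough sketch. It assumes one PUT per event and a price of $0.005 per 1,000 PUT requests, which is the rate the $750 figure implies. The function name and the `events_per_object` parameter are made up for illustration.

```python
# Assumed price: $0.005 per 1,000 S3 PUT requests (the rate implied by
# 150M PUTs costing $750). Check current pricing for your region.
PUT_PRICE_PER_1000 = 0.005


def daily_put_cost(events_per_day: int, events_per_object: int = 1) -> float:
    """Daily S3 PUT cost in USD when each object holds `events_per_object` events."""
    puts = -(-events_per_day // events_per_object)  # ceiling division
    return puts * PUT_PRICE_PER_1000 / 1000


for n in (1_500_000, 15_000_000, 150_000_000):
    print(f"{n:>11,} events/day -> ${daily_put_cost(n):,.2f}/day")
```

This only counts request charges. Storage, data transfer and any ingestion service fees would come on top.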

With ClickHouse you won't get charged per read/write.

hitradostava · on Oct 29, 2024

Kinesis Firehose supports buffering, up to 900 seconds or 128 MB. So you are way off on your cost estimates. Over time, queries can start costing more due to S3 requests, but regular Spark runs to combine small files solve that.
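
A rough sketch of what buffering does to the PUT count, under several assumptions: a single delivery stream, no compression or partitioning, 25 KB events, the maximum buffer settings mentioned above, and the same $0.005 per 1,000 PUT price. The function name is hypothetical, and Firehose's own ingestion charges are not included.

```python
import math

PUT_PRICE_PER_1000 = 0.005  # assumed S3 PUT price in USD per 1,000 requests
SECONDS_PER_DAY = 86_400


def buffered_objects_per_day(events_per_day, event_bytes,
                             buffer_bytes=128 * 1024**2, buffer_seconds=900):
    """Rough count of S3 objects written per day by one buffered stream.

    A flush happens when either limit is hit first, so the daily count is
    about the larger of the size-driven and time-driven flush counts.
    """
    by_size = math.ceil(events_per_day * event_bytes / buffer_bytes)
    by_time = math.ceil(SECONDS_PER_DAY / buffer_seconds)
    return max(by_size, by_time)


objects = buffered_objects_per_day(150_000_000, 25 * 1024)
cost = objects * PUT_PRICE_PER_1000 / 1000
print(f"{objects:,} objects/day -> ${cost:.2f}/day in PUT requests")
```

Under these assumptions the 150M events land in tens of thousands of objects per day instead of 150M, so the PUT request charge drops to well under a dollar a day.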

bosky101 · on Oct 31, 2024

I haven't even got to Kinesis, bandwidth or storage.

Even if you compact N objects through Spark etc., your starting point would still be the large number of writes. So that doesn't change. The costs would be even larger once you count the additional medium-sized PUTs that double the storage and potentially add N deletes. I have also heard that Athena, Presto, etc. charge based on the amount of data scanned.