It's a really interesting open problem to get the cost of these down so that the...

refset · on May 17, 2024

Having read something vaguely related recently [0] I believe "Lookahead Information Passing" is the common term for this general idea. That paper discusses the use of bloom filters (not HLL) in the context of typical binary join trees.

> Letos join

God-Emperor Join has a nice ring to it.

[0] "Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis" - https://www.vldb.org/pvldb/vol16/p2962-zhang.pdf

j-pb · on May 17, 2024

Thanks for the interesting paper!

  We now formally define our _God-Emperor Join_ henceforth denoted join_ge...

Nice work with TXDB btw, it's funny how much impact Clojure, Datomic and Datascript had outside their own ecosystem!

Let me return the favour with an interesting paper [1] that should be especially relevant to the columnar data layout of TXDB. I'm currently building a succinct on-disk format with it [2], but you might be able to simply add some auxiliary structures to your arrow columns instead.

1: https://aidanhogan.com/docs/ring-graph-wco.pdf

2: https://github.com/triblespace/tribles-rust/blob/archive/src...

refset · on May 17, 2024

> Nice work with TXDB btw

It's X.T. (as in 'Cross-Time' / https://xtdb.com), but thank you! :)

> 1: https://aidanhogan.com/docs/ring-graph-wco.pdf

Oh nice, I recall skimming this team's precursor paper "Worst-Case Optimal Graph Joins in Almost No Space" (2021) - seems like they've done a lot more work since though, so definitely looking forward to reading it:

> The conference version presented the ring in terms of the Burrows–Wheeler transform. We present a new formulation of the ring in terms of stable sorting on column databases, which we hope will be more accessible to a broader audience not familiar with text indexing

j-pb · on May 17, 2024

My apologies! It's even more emberrasing given the fact that I looked up the website, knowing that I _always_ swap them after having written too many `tx-listen` in my career.

They expanded their work to wider relations and made the whole framework a lot more penetrable. I think they over-complicate things a bit with their forward-extension, so I'm keeping every column twice (still much better than all permutations), which in turn allows for ad-hoc cardinality estimation.

Also the 1 based indexing with inclusive ranges is really not doing them any favours. Most formula become much more streamlined and simpler with 0 based indexing and exclusive ranges. (see `base_range` and `restrict_range` in [0])

0: https://github.com/triblespace/tribles-rust/blob/e3ad6f21cdc...