Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's a really interesting open problem to get the cost of these down so that they can be used to heuristically select the variable order for worst case optimal joins during evaluation.

It's somewhere on the back of my todo list, and I have the hunch that it would enable instance optimal join algorithms.

I've dubbed these the Atreides Family of Joins:

  - Jessicas Join: The cost of each variable is based on the smallest number of rows that might be proposed for that variable by each joined relation.
  - Pauls join: The cost of each variable is based on the smallest number of distinct values that will actually be proposed for that variable from each joined relation.
  - Letos join: The cost of each variable is based on the actual size of the intersection.
In a sense each of the variants can look further into the future.

I'm using the first and the second in a triplestore I build in Rust [1] and it's a lot faster than Oxigraph. But I suspect that the constant factors would make the third infeasable (yet).

1: https://github.com/triblespace/tribles-rust/blob/master/src/...



Having read something vaguely related recently [0] I believe "Lookahead Information Passing" is the common term for this general idea. That paper discusses the use of bloom filters (not HLL) in the context of typical binary join trees.

> Letos join

God-Emperor Join has a nice ring to it.

[0] "Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis" - https://www.vldb.org/pvldb/vol16/p2962-zhang.pdf


Thanks for the interesting paper!

  We now formally define our _God-Emperor Join_ henceforth denoted join_ge...
Nice work with TXDB btw, it's funny how much impact Clojure, Datomic and Datascript had outside their own ecosystem!

Let me return the favour with an interesting paper [1] that should be especially relevant to the columnar data layout of TXDB. I'm currently building a succinct on-disk format with it [2], but you might be able to simply add some auxiliary structures to your arrow columns instead.

1: https://aidanhogan.com/docs/ring-graph-wco.pdf

2: https://github.com/triblespace/tribles-rust/blob/archive/src...


> Nice work with TXDB btw

It's X.T. (as in 'Cross-Time' / https://xtdb.com), but thank you! :)

> 1: https://aidanhogan.com/docs/ring-graph-wco.pdf

Oh nice, I recall skimming this team's precursor paper "Worst-Case Optimal Graph Joins in Almost No Space" (2021) - seems like they've done a lot more work since though, so definitely looking forward to reading it:

> The conference version presented the ring in terms of the Burrows–Wheeler transform. We present a new formulation of the ring in terms of stable sorting on column databases, which we hope will be more accessible to a broader audience not familiar with text indexing


My apologies! It's even more emberrasing given the fact that I looked up the website, knowing that I _always_ swap them after having written too many `tx-listen` in my career.

They expanded their work to wider relations and made the whole framework a lot more penetrable. I think they over-complicate things a bit with their forward-extension, so I'm keeping every column twice (still much better than all permutations), which in turn allows for ad-hoc cardinality estimation.

Also the 1 based indexing with inclusive ranges is really not doing them any favours. Most formula become much more streamlined and simpler with 0 based indexing and exclusive ranges. (see `base_range` and `restrict_range` in [0])

0: https://github.com/triblespace/tribles-rust/blob/e3ad6f21cdc...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: