Hacker News | _willmanning's comments

Vs Apache Parquet: 100x faster random access, 10-20x faster scans, 5x faster writes, similar compression ratio

And it's nearly as fast as DuckDB's native format on ClickBench when queried from DuckDB


I'll chime in (as a Vortex maintainer) to say that we're greatly indebted to Azim's work on FastLanes & ALP. Vortex heavily utilizes his work to achieve state-of-the-art performance.

I would add that Vortex doesn't have standalone ClickBench results. Azim is presumably referring to the duckdb-vortex results, which were run on an older version of DuckDB (1.2) than the duckdb-parquet ones (1.3). We'll get those updated shortly; we just released a new version of Vortex & the DuckDB extension. Meanwhile, I believe the DataFusion-Vortex vs. DataFusion-Parquet results show substantial speedups across the board.

The folks over at TUM (who originally authored BtrBlocks) did a reasonable amount of micro-benchmarking of Vortex vs Parquet in their recent "Anyblox" paper for VLDB 2025: https://gienieczko.com/anyblox-paper

They essentially say in the paper that Vortex is much faster than the original BtrBlocks because it uses better encodings (specifically citing FastLanes & ALP).
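For intuition, the core trick in ALP (Adaptive Lossless floating-Point) is that most stored doubles are really decimals, so they can be rewritten losslessly as integers times a power of ten and then handed to fast integer encodings. A heavily simplified Python sketch of that idea (the real algorithm samples per vector, picks exponent pairs, and patches exceptions; this toy version just scans exponents):

```python
def alp_encode(values, max_exp=10):
    """Toy ALP-style encoding: find a power of ten that turns every
    double into an exact integer; fall back to the raw doubles
    ("exceptions") when no exponent works. Real ALP is per-vector,
    sampling-based, and far more engineered than this."""
    for e in range(max_exp + 1):
        scale = 10 ** e
        ints = [round(v * scale) for v in values]
        if all(i / scale == v for i, v in zip(ints, values)):
            return e, ints  # integers now suit bit-packing / FOR / delta
    return None, list(values)  # exceptions: keep raw doubles

def alp_decode(exp, ints):
    scale = 10 ** exp
    return [i / scale for i in ints]

exp, ints = alp_encode([1.25, 3.50, 7.75])
assert (exp, ints) == (2, [125, 350, 775])
assert alp_decode(exp, ints) == [1.25, 3.5, 7.75]
```

The payoff is that `[125, 350, 775]` compresses extremely well with the integer schemes FastLanes specializes in, while the round-trip stays bit-exact.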

I'm looking forward to seeing the FastLanes Clickbench results when they're ready, and Azim, we should work together to benchmark FastLanes against Vortex!


As we discuss in the FastLanes paper, the way BtrBlocks (and now Vortex) implements cascaded encodings is essentially a return to block-based compression such as Zstd, which we're trying to avoid as much as possible. This design doesn't work well with modern vectorized execution engines or GPUs: the decompression granularity is too large to fit in CPU caches or GPU shared memory. So Vortex ends up being yet another Parquet-like file format, repeating the same mistakes. And if it still underperforms Parquet, what's the point?
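A toy sketch of the granularity argument, with zlib standing in for a codec (real FastLanes uses data-parallel bit-packed encodings, not general-purpose compression): when each fixed-size vector is compressed independently, any one vector can be decoded into cache without touching the rest, whereas a single large Zstd-style block forces decompressing everything to read anything.

```python
import zlib

def pack_vectors(values, width=1024):
    """Compress each fixed-size vector independently (toy version:
    zlib stands in for data-parallel bit-packed codecs). Any single
    vector can then be decoded without touching its neighbours."""
    vecs = [values[i:i + width] for i in range(0, len(values), width)]
    return [zlib.compress(bytes(v)) for v in vecs]

def unpack_vector(packed, idx):
    # Decompression granularity = one 1024-value vector, small enough
    # for CPU caches or GPU shared memory; a whole-column block is not.
    return list(zlib.decompress(packed[idx]))

data = list(range(256)) * 32          # 8192 small ints, values 0..255
packed = pack_vectors(data)
assert unpack_vector(packed, 5) == data[5 * 1024:6 * 1024]
```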

We just released FastLanes v0.1, and more results, including ClickBench, are coming soon. Please do benchmark FastLanes and keep us posted!


Compression! Vortex can easily be 10x smaller than the equivalent Arrow representation (and decompresses very quickly into Arrow)
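A rough illustration (plain Python with made-up numbers, not Vortex code or its actual layout) of why a compressed columnar representation can be an order of magnitude smaller than flat Arrow string buffers: a low-cardinality column dictionary-encodes down to a tiny set of uniques plus one small code per row.

```python
# Toy dictionary encoding of a low-cardinality string column.
# (Illustrative only; sizes count payload bytes, ignoring offsets.)
cities = ["Amsterdam", "Berlin", "Copenhagen"] * 10_000

flat_bytes = sum(len(s) for s in cities)          # flat string payload

uniques = sorted(set(cities))
code_of = {s: i for i, s in enumerate(uniques)}
codes = [code_of[s] for s in cities]              # one tiny int per row

dict_bytes = sum(len(s) for s in uniques) + len(codes)  # 1 byte per code
assert flat_bytes / dict_bytes > 8                # ~8x here; bit-packing
                                                  # the 2-bit codes gains more
```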


Nice!


Perhaps that verbiage is just confusing. "On-disk" sort of implies "file format" but could be more explicit.

That said, the immediate next line in the README perhaps clarifies a bit?

"Vortex is designed to be to columnar file formats what Apache DataFusion is to query engines (or, analogously, what LLVM + Clang are to compilers): a highly extensible & extremely fast framework for building a modern columnar file format, with a state-of-the-art, "batteries included" reference implementation."


“Vortex is […] a highly extensible & extremely fast framework for building a modern columnar file format.”

It’s a framework for building file formats. This does not indicate that Vortex is, itself, a file format.


Will and I actually work on Vortex :wave:

Perhaps we should clean up the wording in the intro, but yes, there is in fact a file format!

We actually built the toolkit first, before building the file format. The interesting thing here is that we have a consistent in-memory and on-disk representation of compressed, typed arrays.

This is nice for a couple of reasons:

(a) It makes it really easy to test out new compression algorithms and compute functions. We just implement a new codec and it's automatically available for the file format.

(b) We spend a lot of energy on efficient pushdown. Many compute functions, such as slicing and cloning, are zero-cost, and all compute operations can execute directly over compressed data.

I highly encourage you to check out the vortex-serde crate in the repo for file format things, and the vortex-datafusion crate for some examples of integrating the format into a query engine!

