This post talks about literals still being encoded with Huffman tables and stuff...

pcwalton · on May 18, 2022

The first 4 bytes are the magic number and the last 4 bytes are the checksum [1], which you could always just chop off if you wanted (it's legal to omit the checksum; see the spec). That would get the total overhead down to 5 bytes.

[1]: https://github.com/facebook/zstd/blob/dev/doc/zstd_compressi...

wolf550e · on May 18, 2022

This is an uncompressed block.

For database records or log lines, you should use a dictionary trained on data like what you expect to store. Then short records compress well.

See what Facebook does with zstd (they employ its author): https://engineering.fb.com/2018/12/19/core-data/zstandard/
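The dictionary idea can be illustrated with Python's standard-library zlib, which accepts a preset dictionary through its `zdict` argument. This is a swapped-in stand-in, not zstd: zlib has no dictionary trainer, so the dictionary below is just a hand-picked string of recurring substrings, and the records are made up. With zstd you would instead train a dictionary from sample records (e.g. `zstd --train`).

```python
import zlib

# Hypothetical log records, made up for illustration.
RECORDS = [
    b'{"level":"info","service":"checkout","msg":"request completed","status":200,"ms":12}',
    b'{"level":"info","service":"checkout","msg":"request completed","status":200,"ms":9}',
    b'{"level":"warn","service":"checkout","msg":"request completed","status":429,"ms":3}',
]

# Hand-picked preset dictionary of substrings that recur across records.
# zlib's docs suggest placing the most common strings at the end.
ZDICT = (b'"level":"warn","status":429,"status":200,"ms":'
         b'"level":"info","service":"checkout","msg":"request completed",')

def compress(record, zdict=None):
    """Compress one record on its own, optionally primed with a preset dictionary."""
    c = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return c.compress(record) + c.flush()

def decompress(blob, zdict=None):
    """Decompress one record; the same dictionary bytes must be supplied."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()

for rec in RECORDS:
    plain, primed = compress(rec), compress(rec, ZDICT)
    assert decompress(primed, ZDICT) == rec
    print(f"{len(rec)} bytes raw, {len(plain)} without dictionary, {len(primed)} with")
```

Each record is compressed independently, as a database row or log line would be; without a dictionary there is almost no repetition inside one short record to exploit. The decoder must hold the exact same dictionary bytes, which is why zstd can record a dictionary ID in its frame header.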

terrelln · on May 18, 2022

Zstd has a 4-byte magic number, which is used to check if the data is zstd encoded. In addition to that, this example has a 2-byte frame header (including the decompressed size), a 3-byte block header, and a 4-byte checksum at the end (which can be disabled with `--no-check`).
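The byte layout above can be checked with a small header walker. This is a minimal sketch in Python, written from the frame format spec (RFC 8878, the same format as the zstd_compression_format doc linked above) rather than taken from zstd's code; `frame_overhead` is a hypothetical helper, and the example frame is hand-assembled with zeroed placeholder checksum bytes.

```python
import struct

MAGIC = 0xFD2FB528  # zstd frame magic number, stored little-endian (bytes 28 B5 2F FD)

def frame_overhead(frame: bytes) -> dict:
    """Count how many bytes of one zstd frame are headers vs. block payload.

    Reads headers only: block contents are skipped, never decompressed,
    and the checksum is not verified.
    """
    if len(frame) < 5 or struct.unpack_from("<I", frame, 0)[0] != MAGIC:
        raise ValueError("not a zstd frame")
    fhd = frame[4]                      # Frame_Header_Descriptor
    fcs_flag = fhd >> 6                 # Frame_Content_Size_Flag
    single_segment = (fhd >> 5) & 1     # if set, there is no Window_Descriptor
    has_checksum = (fhd >> 2) & 1       # Content_Checksum_Flag
    dict_id_flag = fhd & 3              # Dictionary_ID_Flag

    header = 1                                     # the descriptor byte itself
    header += 0 if single_segment else 1           # Window_Descriptor
    header += (0, 1, 2, 4)[dict_id_flag]           # Dictionary_ID
    header += (single_segment, 2, 4, 8)[fcs_flag]  # Frame_Content_Size
    pos = 4 + header

    block_headers = payload = 0
    while True:
        if pos + 3 > len(frame):
            raise ValueError("truncated block header")
        bh = int.from_bytes(frame[pos:pos + 3], "little")  # 3-byte Block_Header
        last, block_type, size = bh & 1, (bh >> 1) & 3, bh >> 3
        if block_type == 3:
            raise ValueError("reserved block type")
        stored = 1 if block_type == 1 else size  # RLE stores one byte repeated `size` times
        pos += 3 + stored
        block_headers += 3
        payload += stored
        if last:
            break

    checksum = 4 if has_checksum else 0
    if pos + checksum != len(frame):
        raise ValueError("trailing or missing bytes")
    return {"magic": 4, "frame_header": header, "block_headers": block_headers,
            "payload": payload, "checksum": checksum}

# Hand-assembled frame for b"hello": magic, descriptor 0x24 (single segment + checksum),
# 1-byte content size 5, raw last-block header, payload, then 4 placeholder checksum
# bytes (zeros, NOT a real XXH64 value; frame_overhead never checks them).
example = bytes.fromhex("28b52ffd" "2405" "290000") + b"hello" + bytes(4)
print(frame_overhead(example))
# {'magic': 4, 'frame_header': 2, 'block_headers': 3, 'payload': 5, 'checksum': 4}
```

Dropping the magic number and the checksum leaves `frame_header + block_headers`, i.e. the 5 bytes mentioned upthread.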

Zstd does have a mode that passes incompressible data through as-is, both for incompressible literals and for completely incompressible blocks (blocks are at most 128 KB each).
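To make the block-level pass-through concrete, here is a store-only encoder that emits nothing but Raw (type 0) blocks of at most 128 KB. It is a sketch that assumes the frame layout in the format spec, not how libzstd's encoder works: the real compressor makes this choice per block (and, for literals, per literals section) after attempting compression. `store_only_frame` is a hypothetical name, and it omits the optional checksum, since XXH64 is not in Python's standard library.

```python
import struct

MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic number 0xFD2FB528, little-endian
MAX_BLOCK = 128 * 1024       # upper bound on a block's size in the format spec

def store_only_frame(data: bytes) -> bytes:
    """Wrap data in one zstd frame made only of Raw blocks, with no checksum."""
    n = len(data)
    # Single_Segment_Flag = 1 (no Window_Descriptor); use the smallest content-size field.
    if n < 256:
        fcs_flag, fcs = 0, struct.pack("<B", n)
    elif n < 65536 + 256:
        fcs_flag, fcs = 1, struct.pack("<H", n - 256)  # the 2-byte field is offset by 256
    elif n < 2**32:
        fcs_flag, fcs = 2, struct.pack("<I", n)
    else:
        fcs_flag, fcs = 3, struct.pack("<Q", n)
    fhd = (fcs_flag << 6) | 0x20  # 0x20 = Single_Segment_Flag; checksum and dict ID off

    out = bytearray(MAGIC)
    out.append(fhd)
    out += fcs
    # Split into chunks of at most 128 KB; empty input still needs one (empty) last block.
    chunks = [data[i:i + MAX_BLOCK] for i in range(0, n, MAX_BLOCK)] or [b""]
    for i, chunk in enumerate(chunks):
        last = 1 if i == len(chunks) - 1 else 0
        header = last | (0 << 1) | (len(chunk) << 3)  # Block_Type 0 = Raw
        out += header.to_bytes(3, "little")
        out += chunk
    return bytes(out)

print(len(store_only_frame(b"hello")) - 5)  # 9 bytes of overhead
```

For incompressible input this costs the 4-byte magic number, a 2- to 9-byte frame header, and 3 bytes per 128 KB block, so the expansion stays tiny for large inputs but dominates for very short ones.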

For small inputs, we recommend using dictionary compression. But even with dictionary compression, because of our header costs, you generally won't see benefits until your data is at least ~50 bytes. But YMMV depending on your data.