
This post talks about literals still being encoded with Huffman tables and so on.

However, it appears this must be optional, because short strings compressed with zstd pass through 'unencoded', apart from a header and footer added:

    $ echo "hello world" | zstd  | hexdump -C
    00000000  28 b5 2f fd 04 58 61 00  00 68 65 6c 6c 6f 20 77  |(./..Xa..hello w|
    00000010  6f 72 6c 64 0a 8c 6d 7d  20                       |orld..m} |

As an aside, I'm fairly disappointed that zstd didn't use a single-byte header for incompressible strings, so that it could guarantee that the compressed data will never be more than 1 byte larger than the source data. That property is very useful where lots of tiny strings are being compressed, such as in a database.
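To make the wished-for property concrete, here is a minimal sketch of a wrapper that never grows the input by more than 1 byte. The tag values and the `pack`/`unpack` names are hypothetical and not part of any zstd format, and `zlib` stands in for zstd only because it ships with Python.

```python
import zlib

# Hypothetical 1-byte tags; these are NOT part of the zstd format.
RAW = b"\x00"
PACKED = b"\x01"


def pack(data: bytes) -> bytes:
    """Compress data, falling back to a tagged raw copy if that is smaller."""
    compressed = zlib.compress(data)  # stand-in compressor, not zstd
    if len(compressed) < len(data):
        return PACKED + compressed
    # Incompressible (or tiny) input: store as-is, at a cost of exactly 1 byte.
    return RAW + data


def unpack(blob: bytes) -> bytes:
    """Invert pack() by dispatching on the leading tag byte."""
    tag, body = blob[:1], blob[1:]
    if tag == RAW:
        return body
    if tag == PACKED:
        return zlib.decompress(body)
    raise ValueError("unknown tag byte")
```

With this scheme `len(pack(s)) <= len(s) + 1` holds for every input, because the compressed form is only used when it is strictly smaller than the original.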


The first 4 bytes are the magic number and the last 4 bytes are the checksum [1], both of which you could always just chop off if you wanted (it's legal to omit the checksum, see the spec). That would get the total overhead down from 13 bytes to 5 bytes.

[1]: https://github.com/facebook/zstd/blob/dev/doc/zstd_compressi...


This is an uncompressed block.

For database records or log lines, you should use a dictionary trained on representative data. Then short records compress well.

See what Facebook does with zstd (they employ its author): https://engineering.fb.com/2018/12/19/core-data/zstandard/


Zstd has a 4-byte magic number, which is used to check if the data is zstd encoded. In addition to that, this example has a 2-byte frame header (a descriptor byte plus a window descriptor; the decompressed size isn't stored here because zstd can't know it when reading from a pipe), a 3-byte block header, and a 4-byte checksum at the end (which can be disabled with `--no-check`).
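For anyone who wants to check that breakdown against the hexdump, here is a rough sketch that splits those exact 25 bytes into fields. It only handles this one layout (2-byte frame header, a single raw block, trailing checksum); the function name is made up, and the bit positions follow my reading of the format spec, in which the second header byte here is a window descriptor.

```python
# Bytes copied from the hexdump of: echo "hello world" | zstd
FRAME = bytes.fromhex(
    "28b52ffd"                  # magic number
    "0458"                      # frame header (descriptor + window descriptor)
    "610000"                    # block header
    "68656c6c6f20776f726c640a"  # block content: "hello world\n"
    "8c6d7d20"                  # content checksum
)

BLOCK_TYPES = {0: "raw", 1: "rle", 2: "compressed", 3: "reserved"}


def parse_single_raw_block_frame(frame: bytes) -> dict:
    """Split a zstd frame shaped like the example; not a general parser."""
    if int.from_bytes(frame[0:4], "little") != 0xFD2FB528:
        raise ValueError("not a zstd frame")

    descriptor = frame[4]
    has_checksum = bool(descriptor & 0x04)
    # Frame content size, single-segment and dictionary ID flags all change
    # the header length, so this sketch refuses frames that set any of them.
    if descriptor >> 6 or descriptor & 0x20 or descriptor & 0x03:
        raise ValueError("header layout not handled by this sketch")

    # Block header: 3 bytes little-endian = last_block(1) | type(2) | size(21)
    block_header = int.from_bytes(frame[6:9], "little")
    block_type = BLOCK_TYPES[(block_header >> 1) & 0x3]
    block_size = block_header >> 3
    if block_type != "raw":
        raise ValueError("only raw blocks are handled by this sketch")

    payload = frame[9:9 + block_size]
    return {
        "has_checksum": has_checksum,
        "last_block": bool(block_header & 1),
        "block_type": block_type,
        "payload": payload,
        "checksum": frame[9 + block_size:],
        "overhead": len(frame) - len(payload),
    }


info = parse_single_raw_block_frame(FRAME)
print(info["block_type"], info["payload"], info["overhead"])
```

The block header decodes to a raw, last block of 12 bytes, so the 13 bytes of overhead are 4 (magic) + 2 (frame header) + 3 (block header) + 4 (checksum).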

Zstd does have a mode that passes through incompressible data, both for incompressible literals and for completely incompressible blocks (128 KB chunks).

For small inputs, we recommend using dictionary compression. Even with dictionary compression, because of our header costs, you generally won't see benefits until your data is at least ~50 bytes, though YMMV depending on your data.



