
Is there some smaller version of this, suitable for embedding into programs?

  $ size /usr/lib/i386-linux-gnu/libzstd.so.1.3.3
     text    data     bss     dec     hex filename
   508674     492      20  509186   7c502 /usr/lib/i386-linux-gnu/libzstd.so.1.3.3
Yikes; half a meg of code!

Say I just want something in my C program for saving certain files in less space, and I only care about decompressing those files. Is there some small implementation of a subset of Zstandard: say, no more than four or five source files and 64K of machine code?

If I add 500K to the program, I'm going to have to save 500K by compressing the accompanying files, just to begin breaking even.

For comparison, libz may not have the decompression speed or the ratio, but it's more than four times smaller:

  $ size /lib/i386-linux-gnu/libz.so.1.2.11
     text    data     bss     dec     hex filename
   115537     588       4  116129   1c5a1 /lib/i386-linux-gnu/libz.so.1.2.11
This is more like it!

  $ size /lib/i386-linux-gnu/libbz2.so.1.0.4
     text    data     bss     dec     hex filename
    60511    3520       4   64035    fa23 /lib/i386-linux-gnu/libbz2.so.1.0.4

Text 60511. Is there some configuration of Zstandard which can cram into comparable code space?


https://github.com/facebook/zstd/tree/dev/doc/educational_de... is a self-contained zstd decoder. I get a 64 KB dynamically linked executable after running "make" in that directory.

  $ size harness
     text    data     bss     dec     hex filename
    24615     704      40   25359    630f harness


> Yikes; half a meg of code!

It's plausible that the lib you checked is the output from the project's default build target (zstd), which "(...) includes dictionary builder, benchmark, and supports decompression of legacy zstd formats"

https://github.com/facebook/zstd/tree/dev/programs

The project also provides another build target, zstd-small, which is "CLI optimized for minimal size; no dictionary builder, no benchmark, and no support for legacy zstd formats"

Also, take a look at what exactly is bundled with the binary. Odds are you're looking at a lib that statically links optional stuff that makes no sense to ship, let alone on an embedded target.


I looked at the "nm --dynamic" output; I did see some dictionary APIs:

  $ nm -D /usr/lib/i386-linux-gnu/libzstd.so.1.3.3 | grep -i -E 'init|create'
  [...]
  0000fa50 T ZSTD_createCDict
  0000d3a0 T ZSTD_createCDict_advanced
  0000fae0 T ZSTD_createCDict_byReference
  0000d6f0 T ZSTD_createCStream
  0000d6c0 T ZSTD_createCStream_advanced
  00042890 T ZSTD_createDCtx
  000427b0 T ZSTD_createDCtx_advanced
  00046a10 T ZSTD_createDDict
  00046960 T ZSTD_createDDict_advanced
  00046a40 T ZSTD_createDDict_byReference
  [...]


Yes! You can build a decompressor-only version of zstd that is only 95 KB with my version of gcc.

    > make -j libzstd ZSTD_LIB_MINIFY=1 ZSTD_LIB_COMPRESSION=0 ZSTD_LIB_DICTBUILDER=0 ZSTD_LEGACY_SUPPORT=0 ZSTD_LIB_DEPRECATED=0
    > wc -c libzstd.so
    95824 libzstd.so
Longer term, we want to offer a stripped version of the library that includes the compression code, but only includes some of our compression levels. That way you can save code size for unused compression levels.

We've optimized pretty heavily in favor of speed over code size. We do want to offer better configurability; we just need to find the time to do it. We'd happily take PRs that go in this direction!


That's good to know.

Use "size" to check sizes; the raw file size could include unknown amounts of extra material like debug symbols.
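
For what it's worth, here is a minimal sketch of pulling just the text column out of Berkeley-format "size" output, so that the number being compared is code rather than whole-file size. The helper name text_bytes is made up for illustration, and it assumes the header-plus-row layout seen in the listings in this thread.

```shell
#!/bin/sh
# text_bytes: hypothetical helper. Reads Berkeley-format `size` output on
# stdin and prints the "text" column of each data row, skipping the header.
text_bytes() {
    awk 'NR > 1 { print $1 }'
}

# Sample input copied from the libzstd measurement upthread.
sample='   text    data     bss     dec     hex filename
 508674     492      20  509186   7c502 /usr/lib/i386-linux-gnu/libzstd.so.1.3.3'

# Print the text size for the sample row.
printf '%s\n' "$sample" | text_bytes
```

With binutils installed, something like "size libzstd.so | text_bytes" would give the figure worth comparing, whereas "wc -c" counts everything in the file.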


I'm not sure if it's significant, but the default zstd build contains legacy format support from 0.5 to 0.7 (which used different magic numbers). Setting `-DZSTD_LEGACY_SUPPORT=0` will completely disable legacy format support and might help you, especially since you know the data you deal with is never in one of those legacy formats.


bzip2 comes out the clear winner in compressing TXR Lisp compiled files, and has the smallest code size (by far) in its shared library form:

  0:[0519:020528]:sun-go:~/txr/stdlib$ ls -l compiler.tlo*
  -rw-rw-r-- 1 kaz kaz 211119 May 12 11:06 compiler.tlo
  -rw-rw-r-- 1 kaz kaz  37615 May 19 02:01 compiler.tlo.bz2
  -rw-rw-r-- 1 kaz kaz  49200 May 19 02:01 compiler.tlo.gz
  -rw-rw-r-- 1 kaz kaz  38360 May 19 02:04 compiler.tlo.xz
  -rw-rw-r-- 1 kaz kaz  52387 May 19 02:02 compiler.tlo.zst
  0:[0519:020535]:sun-go:~/txr/stdlib$ size /lib/i386-linux-gnu/libbz2.so.1.0.4
     text    data     bss     dec     hex filename
    60511    3520       4   64035    fa23 /lib/i386-linux-gnu/libbz2.so.1.0.4
  0:[0519:020544]:sun-go:~/txr/stdlib$ size /lib/i386-linux-gnu/libz.so.1.2.11
     text    data     bss     dec     hex filename
   115537     588       4  116129   1c5a1 /lib/i386-linux-gnu/libz.so.1.2.11
  0:[0519:020653]:sun-go:~/txr/stdlib$ size /lib/i386-linux-gnu/liblzma.so.5.2.2
     text    data     bss     dec     hex filename
   166862     900       4  167766   28f56 /lib/i386-linux-gnu/liblzma.so.5.2.2
  0:[0519:020658]:sun-go:~/txr/stdlib$ size /usr/lib/i386-linux-gnu/libzstd.so.1.3.3 
     text    data     bss     dec     hex filename
   508674     492      20  509186   7c502 /usr/lib/i386-linux-gnu/libzstd.so.1.3.3
Code bloat: bz2 < z < xz < zstd

Compression: bz2 > xz > z > zstd

zstd with level -19 compression beats default gz to third place:

  -rw-rw-r-- 1 kaz kaz  41163 May 19 02:17 compiler.tlo.zst-19
gzip squeezes another 2K out with level -9, but stays behind zstd -19:

  -rw-rw-r-- 1 kaz kaz  47206 May 19 02:19 compiler.tlo.gz-9
xz's output doesn't change at -9 level (relative to its default -6).
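
To tie these numbers back to the break-even question upthread, here is a rough sketch that adds each library's text size to the compressed size of compiler.tlo, and counts how many files of that size it would take before the savings cover the library. All figures are copied from the listings above; the function names footprint and files_to_break_even are made up for illustration, and the model ignores data/bss and any per-file variation.

```shell
#!/bin/sh
# Sizes in bytes, copied from the listings above.
orig=211119            # compiler.tlo, uncompressed

# footprint LIBTEXT COMPRESSED: library text size plus one compressed file.
footprint() {
    echo $(( $1 + $2 ))
}

# files_to_break_even LIBTEXT COMPRESSED: number of files of this size that
# must be compressed before the savings cover the library text (rounded up).
files_to_break_even() {
    saved=$(( orig - $2 ))
    echo $(( ($1 + saved - 1) / saved ))
}

# Each entry: name, library text size, compressed size at default level.
for entry in "bz2 60511 37615" "gz 115537 49200" "xz 166862 38360" "zstd 508674 52387"; do
    set -- $entry
    echo "$1: footprint $(footprint "$2" "$3"), break-even files $(files_to_break_even "$2" "$3")"
done
```

On these numbers, bz2, gz and xz each pay for themselves on the first file, while the stock zstd library needs four files the size of compiler.tlo. A decompress-only zstd build would shift that, but its text size isn't measured in this thread.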



