Is there some smaller version of this, suitable for embedding into programs?
$ size /usr/lib/i386-linux-gnu/libzstd.so.1.3.3
text data bss dec hex filename
508674 492 20 509186 7c502 /usr/lib/i386-linux-gnu/libzstd.so.1.3.3
Yikes; half a meg of code!
Say I just want something in my C program for storing certain files in less space, and I only care about decompressing those files. Is there a small implementation of a subset of Zstandard: say, no more than four or five source files and 64K of machine code?
If I add 500K to the program, I'm going to have to save 500K by compressing the accompanying files, just to begin breaking even.
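To put numbers on the break-even point, here is a rough sketch in C. The helper names `net_savings` and `break_even_raw_bytes` are made up for illustration, and any compression ratio you plug in is an assumption, not a measurement; the code sizes are the `text` figures from `size`.

```c
/* Net bytes saved by shipping a decoder plus compressed data instead of
   the raw data. A negative result means the program grew overall.
   (Hypothetical helper, for illustration only.) */
long net_savings(long raw_bytes, long compressed_bytes, long decoder_code_bytes)
{
    return raw_bytes - compressed_bytes - decoder_code_bytes;
}

/* Raw payload size at which the decoder pays for itself, given an
   assumed compression ratio (raw size / compressed size, must be > 1).
   Derived from: raw - raw / ratio = decoder_code_bytes. */
double break_even_raw_bytes(long decoder_code_bytes, double ratio)
{
    return (double)decoder_code_bytes * ratio / (ratio - 1.0);
}
```

For example, with an assumed 2:1 ratio a 500000-byte decoder only breaks even once the raw payload reaches 1000000 bytes, while a 60511-byte decoder breaks even at 121022 bytes.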
For comparison, libz may not have the decompression speed or the ratio, but it's more than four times smaller:
$ size /lib/i386-linux-gnu/libz.so.1.2.11
text data bss dec hex filename
115537 588 4 116129 1c5a1 /lib/i386-linux-gnu/libz.so.1.2.11
This is more like it!
$ size /lib/i386-linux-gnu/libbz2.so.1.0.4
text data bss dec hex filename
60511 3520 4 64035 fa23 /lib/i386-linux-gnu/libbz2.so.1.0.4
Text 60511. Is there some configuration of Zstandard which can cram into comparable code space?
It's plausible that the lib you checked is the output of the project's default build target (zstd), which "(...) includes dictionary builder, benchmark, and supports decompression of legacy zstd formats".
The project also provides another build target, zstd-small, which is "CLI optimized for minimal size; no dictionary builder, no benchmark, and no support for legacy zstd formats".
Also, take a look at what exactly is bundled with the binary. Odds are you're looking at a lib that statically links optional stuff that makes no sense to ship, let alone on an embedded target.
I looked at the "nm --dynamic" output and did see some dictionary APIs:
$ nm -D /usr/lib/i386-linux-gnu/libzstd.so.1.3.3 | grep -i -E 'init|create'
[...]
0000fa50 T ZSTD_createCDict
0000d3a0 T ZSTD_createCDict_advanced
0000fae0 T ZSTD_createCDict_byReference
0000d6f0 T ZSTD_createCStream
0000d6c0 T ZSTD_createCStream_advanced
00042890 T ZSTD_createDCtx
000427b0 T ZSTD_createDCtx_advanced
00046a10 T ZSTD_createDDict
00046960 T ZSTD_createDDict_advanced
00046a40 T ZSTD_createDDict_byReference
[...]
Longer term, we want to offer a stripped version of the library that includes the compression code, but only includes some of our compression levels. That way you can save code size for unused compression levels.
We've optimized pretty heavily in favor of speed over code size. We do want to offer better configurability; we just need to find the time to do it. We'd happily take PRs that go in this direction!
I'm not sure if it's significant, but the default zstd build contains legacy format support from 0.5 to 0.7 (which used different magic numbers). Setting `-DZSTD_LEGACY_SUPPORT=0` will completely disable legacy format support and might help you, especially since you know the data you handle is never in one of those legacy formats.
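If you want to confirm that your files really are not in a legacy format before dropping that support, a check on the first four bytes is enough. The sketch below is not part of the zstd API: `classify_frame` and the enum are hypothetical names, and the legacy magic values (0xFD2FB525 through 0xFD2FB527 for 0.5 to 0.7) are taken from my reading of zstd's legacy headers, so double-check them against the sources you build. The current magic number, 0xFD2FB528 stored little-endian, is the one in the format specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification, not a zstd type. */
enum zstd_frame_kind { FRAME_UNKNOWN = 0, FRAME_CURRENT, FRAME_LEGACY };

/* Inspect the first 4 bytes of a buffer and report which family of
   zstd frame it starts with. Anything else (including skippable
   frames and truncated input) is reported as FRAME_UNKNOWN. */
enum zstd_frame_kind classify_frame(const unsigned char *buf, size_t len)
{
    uint32_t magic;

    if (buf == NULL || len < 4)
        return FRAME_UNKNOWN;

    /* The magic number is stored little-endian. */
    magic = ((uint32_t)buf[0])
          | ((uint32_t)buf[1] << 8)
          | ((uint32_t)buf[2] << 16)
          | ((uint32_t)buf[3] << 24);

    if (magic == 0xFD2FB528u)            /* current Zstandard frame */
        return FRAME_CURRENT;
    if (magic >= 0xFD2FB525u && magic <= 0xFD2FB527u)
        return FRAME_LEGACY;             /* assumed: v0.5, v0.6, v0.7 */
    return FRAME_UNKNOWN;
}
```

Running this over the files at build time, and failing the build on anything other than `FRAME_CURRENT`, would let you ship a decoder built with `-DZSTD_LEGACY_SUPPORT=0` without surprises.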