In my experience, for the best balance between compression speed and compression ratio, nothing beats 7zip with the right options:
-mmt=$(nproc) # use all available cores
-ms=off # disable solid archives (compress each file separately)
-m0=lzma2 # lzma2 has better threading than lzma1
-md=64m # dictionary size
-ma=0 # "fast" mode
-mmf=hc4 # hash chain match finder
-mfb=64 # number of "fast bits"
-mf=off # disable filters
The biggest gains are:
- using all available cores,
- setting the match finder (the binary tree match finders are terribly slow; I haven't played much with the newer patricia tree match finders),
- disabling solid archives (this seems to cause 7zip to distribute the work more evenly between cores, though it may still only use a few cores if there are many small files),
- using "fast" mode (whatever that is, it gives a noticeable performance boost and doesn't seem to affect the compression ratio much).
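If you want to experiment with these knobs without the 7z CLI, Python's stdlib lzma module exposes roughly the same LZMA2 parameters. A sketch, not an exact 7zip clone (the parameter names are Python's, and 7zip's container/threading behavior isn't reproduced):

```python
import lzma

# Approximate mapping of the 7z flags above onto lzma filter options:
#   -md=64m  -> dict_size
#   -ma=0    -> mode=MODE_FAST
#   -mmf=hc4 -> mf=MF_HC4 (hash-chain match finder)
#   -mfb=64  -> nice_len ("fast bytes")
filters = [{
    "id": lzma.FILTER_LZMA2,
    "dict_size": 64 * 1024 * 1024,
    "mode": lzma.MODE_FAST,
    "mf": lzma.MF_HC4,
    "nice_len": 64,
}]

data = b"some repetitive payload " * 10_000
compressed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
assert lzma.decompress(compressed) == data
print(len(data), "->", len(compressed))
```

Handy for A/B-ing the match finders (MF_HC4 vs MF_BT4) on your own data before committing to a set of 7z flags.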
Every few years I try zstd and others, and for the data I work with (primarily a mix of json and fixed-width-field binary data), lots of tools beat 7zip out of the box, but they fall short of 7zip with the above command-line options.
A comparable zstd call that uses a 64 MB window size and all cores is:
zstd --long=26 -T0
From there you can tune the compression level, or increase the window size up to 2 GB (--long=31). zstd won't beat the compression of xz, but it can compress much faster if you trade off some space.
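As a sanity check on those numbers: the argument to --long is a window log, i.e. a power-of-two window size in bytes.

```python
# zstd's --long=N sets the match window to 2**N bytes.
def window_bytes(window_log: int) -> int:
    return 2 ** window_log

assert window_bytes(26) == 64 * 1024 * 1024   # --long=26 -> 64 MiB
assert window_bytes(31) == 2 * 1024 ** 3      # --long=31 -> 2 GiB
```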
Perhaps it's the use of a dictionary? As far as I'm aware, tar, zstd, and xz don't use one by default; it's an extra set of hoops to create a training set, build the dictionary, use it for compression, pack it away somewhere so that it's available at decompression time, and then actually use it for decompression. If 7zip does all of that just from passing -md=64m, that's pretty cool.
Edit: Ahh, I was confused. Neither requires a separate training step: zstd merely offers one as an option. Both always use dictionaries, with a default size that can optionally be changed.
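Right: in LZMA the "dictionary" is just a sliding window over recent input, so there's nothing to train or ship, and -md only sets how far back matches can reach. A quick demonstration with Python's stdlib lzma (a sketch; the sizes are arbitrary): a 1 MiB random block repeated twice only compresses to roughly half its size when the window is large enough to reach back to the first copy.

```python
import lzma
import os

block = os.urandom(1 << 20)          # 1 MiB of incompressible bytes
data = block * 2                     # second copy is one long-range match

def lzma2_size(payload: bytes, dict_size: int) -> int:
    """Compressed size of payload under LZMA2 with the given window."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters))

small = lzma2_size(data, 1 << 16)    # 64 KiB window: can't see the first copy
large = lzma2_size(data, 1 << 21)    # 2 MiB window: can
print(small, large)                  # small is roughly 2 MiB, large roughly 1 MiB
```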
$ time 7zr a -mmt=$(nproc) -ms=off -m0=lzma2 -md=64m -ma=0 -mmf=hc4 -mfb=64 -mf=off linux-5.0.8.tar{.7z,}
real 60.49 user 158.94 sys 3.06 maxrss 8995040
$ stat -c '%s %n' linux-5.0.8.tar.7z
127700475 linux-5.0.8.tar.7z
$ time 7zr e -so linux-5.0.8.tar.7z >/dev/null
real 14.09 user 13.96 sys 0.12 maxrss 282208
Basically:
- it took twice as long to compress the data, even compared to xz -2 (which also uses LZMA2 under the hood),
- it is comparable to zstd/bzip2 ratio-wise,
- it used almost 6 times (!) more RAM than even zstd -12 --long,
- it only used about 2.5 CPU cores out of 4 while compressing (which aligns pretty well with your reasoning for using -ms=off).
----
But hey, source code is not that regular. Since you mentioned JSON and fixed-width-field binary data, I decided to re-run the benchmarks on 10M lines of nginx access logs: they're way more regular in structure (repetitive URLs, timestamps, Mozilla/5.0 user-agent strings, and so on), which might benefit from larger window sizes.
$ time lbzip2 -k access-log-10m.log
real 90.59 user 313.04 sys 18.46 maxrss 117904
$ time ~/zstd-1.4.0/zstd -T0 -k -12 access-log-10m.log -o access-log-10m.log.zst-12
real 77.34 user 277.21 sys 1.55 maxrss 886416
$ time ~/zstd-1.4.0/zstd -T0 -k -12 --long access-log-10m.log -o access-log-10m.log.zst-12-long
real 69.24 user 242.18 sys 1.85 maxrss 1411872
$ time 7zr a -mmt=$(nproc) -ms=off -m0=lzma2 -md=64m -ma=0 -mmf=hc4 -mfb=64 -mf=off access-log-10m.log{.7z,}
real 109.10 user 356.42 sys 4.69 maxrss 9777664
$ stat -c '%s %n' access-log-10m.log* | sort -n
208537395 access-log-10m.log.bz2
231953002 access-log-10m.log.zst-12-long
237566691 access-log-10m.log.zst-12
249412192 access-log-10m.log.7z
3386733539 access-log-10m.log
Now the tweaked 7z did better CPU- and time-wise, but it's still behind zstd and bz2 on every metric, especially RAM, of which it requires so much (literally gigabytes) that it becomes impractical in a number of situations. And we needed a pretty regular input (not just some fairly compressible text like source code or a Wikipedia dump) to close that gap. So I can't really recommend your suggestion, unless you have some niche input that benefits from that particular set of options (but then, who has time to learn the lzma internals and how every option interacts with different kinds of input?).