I'm surprised to see so many comments here dumping on the use of JSON. Have we really forgotten the Unix philosophy?
> Write programs to handle text streams, because that is a universal interface.
A lot of the data formats I've written recently have used JSON, partly because of that aforementioned principle, but mostly because it's just easier and there are very few downsides. Almost every language has support for it, so it's fairly universal; it's self-documenting, easy to debug, easy to manipulate by hand, and easy to maintain.
Put more simply: I've implemented several data formats using custom binary and several data formats using JSON. JSON was easier and faster every single time.
My recommendation for data formats: just use JSON, unless you have a really good reason not to.
And I do mean a really good reason. For example, many might think concerns about data size would be a reason not to use JSON. But go ahead and run some JSON through a compression algorithm some time; you'll be amazed. Compressed JSON is very competitive with a custom binary format.
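If you want to sanity-check that claim, here's a minimal sketch; the record shape is invented for illustration, but any repetitive record-like JSON shows the same effect:

    # Rough comparison of raw vs. gzip-compressed JSON for repetitive,
    # record-like data. Field names here are made up; the repeated keys
    # are exactly what general-purpose compressors eat for breakfast.
    import gzip
    import json

    records = [
        {"id": i, "name": f"user-{i}", "active": i % 2 == 0, "score": i * 0.5}
        for i in range(10_000)
    ]

    raw = json.dumps(records).encode("utf-8")
    packed = gzip.compress(raw)

    print(f"raw JSON: {len(raw)} bytes")
    print(f"gzipped:  {len(packed)} bytes "
          f"({100 * len(packed) / len(raw):.1f}% of raw)")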
> several data formats using custom binary and several data formats using JSON
You may not realize it, but you've presented a false dichotomy here.
The alternative to JSON isn't "custom binary"; it's standardized binary encodings that offer the same flexibility as JSON (or more). Some examples include CBOR and MessagePack.
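For a concrete comparison, here's a small sketch encoding one record three ways; it assumes the third-party msgpack and cbor2 packages are installed (pip install msgpack cbor2), and the record itself is invented:

    # Same record under three encodings; both binary formats are
    # standardized (CBOR is RFC 8949) and share JSON's data model.
    import json
    import cbor2
    import msgpack

    record = {"slot": 3, "type": "luks2", "offset": 32768, "enabled": True}

    as_json = json.dumps(record).encode("utf-8")
    as_cbor = cbor2.dumps(record)
    as_msgpack = msgpack.packb(record)

    for name, blob in [("JSON", as_json), ("CBOR", as_cbor),
                       ("msgpack", as_msgpack)]:
        print(f"{name:>8}: {len(blob):>3} bytes  {blob!r}")

CBOR in particular also adds things JSON lacks, such as native binary strings and semantic tags.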
>> Write programs to handle text streams, because that is a universal interface.
cryptsetup manipulates block devices. I don't think that line can or should really be applied to something that is fairly similar to a filesystem's on-disk format.
> Allowing TRIM passthrough on SSDs without impacting vulnerability.
Unfortunately I don’t think that’s possible. If I scan your encrypted disk and see that 10% of blocks are zeroed out, I can assume that your disk is 90% full. So information has been leaked, which is arguably a vulnerability.
This is an engineering problem where tradeoffs have to be made. E.g., you could make it appear that the disk is about 90% full all the time by queuing up the trim commands in some deterministic way; then an observer could only determine whether the disk is more or less than about 90% full at any given time (see the sketch below). I think good engineers would come up with something even better.
This would be on the level of a nice CS Bachelor's/Master's thesis.
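To make the idea concrete, here's a toy sketch of such a deterministic trim queue; everything in it (names, batch size, callback) is invented, and a real implementation would live in the kernel or device firmware:

    # Hold TRIM commands back and release them only in fixed-size,
    # deterministically ordered batches, so an observer of the raw
    # device sees free space only in coarse steps.
    BATCH = 4096  # release trims only once this many blocks are pending

    class TrimQueue:
        def __init__(self, issue_trim):
            self.pending = []             # blocks the filesystem has freed
            self.issue_trim = issue_trim  # callback that trims one block

        def trim(self, block: int) -> None:
            self.pending.append(block)
            while len(self.pending) >= BATCH:
                batch = sorted(self.pending[:BATCH])  # deterministic order
                del self.pending[:BATCH]
                for b in batch:
                    self.issue_trim(b)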
I think TRIM is dangerous not because of information leaks, but because of the risk of IV/nonce reuse when SSD data blocks are unmapped but not cleared. This would pose a risk if someone dumps the raw content of the NAND chips and finds two or more data blocks encrypted with the same IV/nonce.
Why does trim make a difference though? You're not going to scan the whole disk on each write for duplicates, so you need to guarantee statistically-unique nonces either way, or make sure reuse doesn't matter. Trim doesn't make this any worse/better.
After some time, I think I get it. If the key/IV is location-specific, trim may result in an abandoned block which is then recreated somewhere else. This results in two copies of the same logical block in two different flash locations. Unless I misunderstand something, XTS mode encryption uses location-based tweaks.
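Here's a minimal sketch of why that matters, using the cryptography package (pip install cryptography). The tweak construction mirrors dm-crypt's aes-xts-plain64 convention (little-endian 64-bit sector number, zero-padded to 16 bytes); the rest is illustrative:

    # Under XTS the tweak comes from the *logical* sector number, so
    # re-encrypting a logical sector always uses the same tweak. Two
    # physical NAND copies of one logical sector (pre-trim and post-
    # remap) can therefore be compared block for block.
    import os
    import struct
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(64)  # AES-256-XTS uses a 512-bit (double-length) key

    def encrypt_sector(sector_no: int, data: bytes) -> bytes:
        tweak = struct.pack("<Q", sector_no) + b"\x00" * 8  # plain64-style IV
        enc = Cipher(algorithms.AES(key), modes.XTS(tweak)).encryptor()
        return enc.update(data) + enc.finalize()

    sector = b"A" * 512
    old_copy = encrypt_sector(1000, sector)  # before trim/remap
    new_copy = encrypt_sector(1000, sector)  # same logical sector, new cell

    # Identical ciphertexts: an attacker dumping raw NAND learns which
    # 16-byte blocks of the sector changed between the two copies.
    assert old_copy == new_copy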
With release 27, Fedora decided to enable TRIM by default on newly created encrypted devices. I was not happy about this decision and tried to make some noise, but nobody really seemed to care. There is a reason it is disabled by default in the Linux kernel, and making this kind of decision on behalf of the users without any input from the community is pretty fucked up.
Personally, I would like to explore the idea of a secure enclave that keeps a map of which blocks are in use that gets referred to during write operations. This seems like a problem that is going to need to be solved with hardware.
Is there really plausible deniability about use of a hard drive that isn't just all zeros (as, AFAIK, they come from the factory)?
I mean, I guess it depends on what burden of proof is needed in the case. Certainly if it were balance of probabilities, I'd be surprised if a disk that merely looks like it has been written to were sufficient to make denial implausible.
LUKS2 format and features
~~~~~~~~~~~~~~~~~~~~~~~~~
LUKS2 is an on-disk storage format designed to provide simple key
management, primarily intended for Full Disk Encryption based on dm-crypt.
LUKS2 is inspired by the LUKS1 format and in some specific situations (most
of the default configurations) can be converted in-place from LUKS1.
The LUKS2 format is designed to allow future updates of various
parts without the need to modify binary structures and internally
uses JSON text format for metadata. Compilation now requires the json-c library
that is used for JSON data processing.
The on-disk format provides metadata redundancy, detection of metadata
corruption, and automatic repair from the metadata copy.
NOTE: For security reasons, there is no redundancy in the keyslots' binary
data (encrypted keys), but the format allows adding such a feature in the
future.
NOTE: To operate correctly, LUKS2 requires metadata locking.
Locking is performed with the flock() system call for file-backed images,
and for block devices with a per-device lock file in /run/lock/cryptsetup.
This directory must be created by the distribution (do not rely on the
internal fallback). For systemd-based distributions, you can simply install
scripts/cryptsetup.conf into the tmpfiles.d directory.
For more details see LUKS2-format.txt and LUKS2-locking.txt in the docs
directory. (Please note this is just an overview; more formal
documentation will come later.)
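A rough sketch of the two locking strategies those notes describe, in Python for illustration only (the real implementation is C inside libcryptsetup, and the lock-file naming below is assumed):

    # flock()-based metadata locking: lock the image file itself for
    # file-backed images, or a per-device lock file under
    # /run/lock/cryptsetup for block devices.
    import fcntl
    import os

    def lock_image_file(path):
        """File-backed image: flock() the image itself."""
        fd = os.open(path, os.O_RDWR)
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until exclusive lock granted
        return fd  # keep open; closing the fd releases the lock

    def lock_block_device(devname):
        """Block device: flock() a lock file (naming here is assumed)."""
        lockfile = f"/run/lock/cryptsetup/L_{devname}"
        fd = os.open(lockfile, os.O_RDWR | os.O_CREAT, 0o600)
        fcntl.flock(fd, fcntl.LOCK_EX)
        return fd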
To me, the worst part is that it introduces a library dependency, which itself can introduce security issues unless they intend to audit it. I don't understand this choice at all.
I agree! Perhaps the userspace administrative tool (which links json-c) parses it and converts it to something more like an ioctl struct for the kernel.
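For what it's worth, that's essentially how dm-crypt works: the kernel never sees the LUKS metadata; userspace parses it and hands device-mapper a flat, whitespace-separated table line. A sketch of what such a line looks like (values below are placeholders, but the field layout follows the documented dm-crypt table format):

    # dm-crypt "crypt" target table line, as passed via DM ioctls:
    #   <start> <size> crypt <cipher> <key> <iv_offset> <device> <offset>
    start_sector = 0
    size_sectors = 2097152          # 1 GiB in 512-byte sectors
    cipher = "aes-xts-plain64"
    key_hex = "00" * 64             # placeholder volume key; never log real keys
    table = f"{start_sector} {size_sectors} crypt {cipher} {key_hex} 0 /dev/sda2 4096"
    print(table)
    # Roughly equivalent to: dmsetup create mydev --table "<table>"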
I see no reason why it can't be both; it's just that BSON happens to be a poor format for general interchange, and if you went back and redesigned it you could probably fix many of its flaws. I'm a little skeptical, though; I would rather just throw it out.
It's kind of shitty that it was called BSON, because it falls so far short of the generality that JSON has.
What the hell? Why would they do this? The only (tenuous) justification for using a parsed human-oriented text format in any protocol is so that humans can edit things by hand, but this will presumably not be the case for file metadata.
I don’t want to read too much into this since there might be a sane explanation, but this seriously makes me question the design of this security-critical system.
JSON is a language agnostic format that is widely supported, so one could read/parse the metadata from virtually any language. This is much more flexible and powerful than a specific binary format. It may also make debugging a lot easier since it is human readable.
I don't know if that's the reason they chose JSON, but that's what I would be thinking about if it were me.
> so one could read/parse the metadata from virtually any language.
This isn’t a use case for file system metadata. It only makes sense internally anyway. People interact with filesystem metadata through standardized APIs, not metadata dumps.
> It may also make debugging a lot easier since it is human readable.
It’s also basically guaranteed to introduce a ton of bugs; parsing and generating JSON is orders of magnitude more complicated than generating and reading an unambiguous tagged binary format.
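For a sense of the difference, here's a complete reader for a tiny tag/length/value format; the format itself (1-byte tag, 4-byte big-endian length, payload) is invented for illustration, but it shows how little surface area such parsers have: no quoting, escaping, number grammar, or Unicode handling to get wrong.

    import struct

    def read_tlv(buf: bytes):
        """Yield (tag, payload) records from a tag/length/value buffer."""
        off = 0
        while off < len(buf):
            tag, length = struct.unpack_from(">BI", buf, off)
            off += 5
            yield tag, buf[off:off + length]
            off += length

    blob = (struct.pack(">BI", 1, 5) + b"hello" +
            struct.pack(">BI", 2, 2) + b"hi")
    print(list(read_tlv(blob)))  # [(1, b'hello'), (2, b'hi')]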
JSON seems like a weird choice for a kernel-parsed data structure. Or perhaps it's just parsed in userspace and signaled to the kernel in a more direct format.