That's not a very useful property, though. Because inter-core memory works on cache-line granularities, packing more than one lock in a cache line is a Bad Idea™. Potentially it allows you to pack more data being protected by a lock with that data... but alignment rules means that you're going to invariably end up spending 4 or 8 bytes (via a regular integer or a pointer) on that lock anyways.
That's typically not true due to the `Mutex<T>` design: the `T` gets padded to its alignment, then placed into the `struct Mutex` along with the signaling byte, and that struct is padded again before being put into the outer struct.
You can avoid this with a `parking_lot::Mutex<()>` or `parking_lot::RawMutex` guarding other contents, but then you need to use `unsafe` because the borrow checker doesn't understand what you're doing.
You could use CAS loops throughout to make your locks "less than one byte" in size, i.e. one byte, or perhaps one machine word, but using the free bits in that byte/word to store arbitrary data. (This is because a CAS loop can implement any read-modify-write operation on atomically sized data. But CAS will be somewhat slower than special-cased hardware atomics, so this is a bad idea for locks that are performance-sensitive.)
Yup, that's what I'm doing - storing the two bits needed for an object's monitor in the same word as its compressed class pointer. The pointer doesn't change over the lock's lifetime.
That's not a very useful property, though. Because inter-core memory works on cache-line granularities, packing more than one lock in a cache line is a Bad Idea™. Potentially it allows you to pack more data being protected by a lock with that data... but alignment rules means that you're going to invariably end up spending 4 or 8 bytes (via a regular integer or a pointer) on that lock anyways.