
On the other hand, lots of tiny writes scattered all over will tend to produce much higher write amplification than large sequential writes, so you'll get more actual wear on the drive from 3 TB of constant background churn than from copying in 3 TB of movies.
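A rough back-of-envelope sketch of that difference (the write-amplification factors here are illustrative assumptions, not measurements):

    # Same 3 TB of host writes, different write amplification.
    # WAF values are assumed for illustration, not measured.
    host_writes_tb = 3.0
    waf_sequential = 1.1   # large sequential copy: close to 1x
    waf_scattered  = 5.0   # tiny scattered writes: assumed several-x

    print(f"sequential copy: ~{host_writes_tb * waf_sequential:.1f} TB written to NAND")
    print(f"scattered churn: ~{host_writes_tb * waf_scattered:.1f} TB written to NAND")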


Those writes would have to be significantly smaller than the SSD's page (sector) size, which is 512 bytes or 4 KiB, and they would have to go to different pages in rapid succession (so they get flushed separately). A standard serial write wouldn't trigger this even if it's 1 byte at a time; the OS filesystem cache would buffer it.

It would have to be badly misbehaving software or deliberate sabotage.


The logical block size presented by SSDs to the host system is 512 bytes or rarely 4kB. But the native page size of the flash memory itself is usually more like 16kB, and the erase block size is several MB at a minimum. Those larger sizes are why random writes (and especially random overwrites) can cause high write amplification within the SSD: because what looks like a series of single-sector writes to the host will at a minimum cause fragmentation within the SSD, and can easily cause large read-modify-write operations within the SSD.
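A toy calculation of how bad a single uncoalesced sub-sector write can get (the 16kB page and 4MB erase block are assumed geometries for illustration; real parts vary):

    # Worst-case sketch for one 512-byte logical write the controller
    # can't combine with anything else. Geometry values are assumptions.
    LOGICAL_SECTOR = 512
    NAND_PAGE      = 16 * 1024        # assumed native page size
    ERASE_BLOCK    = 4 * 1024 * 1024  # assumed erase block size

    # Common case: write the sector into a fresh page and update the map,
    # i.e. one full page program for 512 bytes of host data.
    print(f"page-level amplification: {NAND_PAGE / LOGICAL_SECTOR:.0f}x")    # 32x

    # Pathological case: garbage collection has to read and rewrite most of
    # an erase block to reclaim the space holding that one stale sector.
    print(f"erase-block worst case:   {ERASE_BLOCK / LOGICAL_SECTOR:.0f}x")  # 8192x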

Normally, SSDs and operating systems both use aggressive caching to combine writes; that's the only way a drive can turn in extremely high random write benchmark numbers. Consumer SSDs do this caching even though they don't have the power loss protection capacitors needed to ensure that data cached in volatile SRAM gets flushed to the flash in an emergency. But it wouldn't be smart for the caching to wait forever for more writes to combine with a sub-page write, which is why I'd be concerned that a slow and steady trickle of write activity could cause serious real write amplification.
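Here's a toy model of why the trickle rate matters (the flush timeout and page size are assumptions, purely to show the shape of the problem):

    # Toy model: a write cache flushes whatever it has after a fixed timeout.
    # Once the trickle drops below one page's worth of data per flush, every
    # flush programs a mostly empty page. All numbers are assumptions.
    NAND_PAGE       = 16 * 1024   # bytes, assumed
    FLUSH_TIMEOUT_S = 0.1         # assumed cache hold time before flushing

    def waf_for_trickle(bytes_per_second: float) -> float:
        accumulated = min(bytes_per_second * FLUSH_TIMEOUT_S, NAND_PAGE)
        return NAND_PAGE / accumulated   # one full page programmed per flush

    for rate in (100, 1_000, 10_000, 1_000_000):   # bytes/sec
        print(f"{rate:>9} B/s trickle -> ~{waf_for_trickle(rate):.0f}x amplification")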


I’m pretty sure SSDs can only do 4 KiB-aligned writes regardless of the FS sector size (under the hood it’s write amplification unless the OS or controller manages to coalesce them). But yeah, it depends on how things are getting flushed; generally I wouldn’t expect too much magic unless you get lucky. It sounds like a small bug in the OS (i.e. these kinds of writes should be batched in memory in the application).


Almost all SSDs internally track allocations in a 4kB granularity. That size is what leads to the convention of equipping the drive with 1GB of DRAM for every 1TB of NAND flash, when the drive is designed to hold the entire table of logical to physical address mappings in DRAM.

It's now common for consumer SSDs to have less DRAM than the normal 1GB per 1TB ratio, but they run their FTL with the same 4kB granularity and just don't have the full lookup table in RAM. There are at least a handful of special-purpose enterprise drives that use a larger sector size in their FTL, such as the 32kB used by WD's SN340: https://www.anandtech.com/show/14723
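The 1GB-per-1TB rule of thumb falls straight out of that granularity, assuming a flat logical-to-physical table with roughly 4-byte entries (the entry size is my assumption; real FTLs vary):

    # Why 4kB FTL granularity implies roughly 1GB of DRAM per 1TB of flash,
    # assuming a flat mapping table with ~4-byte entries (an assumption).
    capacity_bytes   = 1 * 1000**4   # 1 TB drive
    mapping_unit     = 4 * 1024      # 4 kB per FTL entry
    entry_size_bytes = 4             # assumed pointer size per entry

    entries    = capacity_bytes // mapping_unit
    table_size = entries * entry_size_bytes
    print(f"{entries:,} entries -> ~{table_size / 1024**3:.2f} GiB of mapping table")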


I do wonder if perhaps the good NVMe SSD controllers come with magic. It would only take a single instance of malware ruining SSDs with 4000x write amplification to taint some brands while aiding the marketing of others.


I thought some of them even do 8KB. I’ve seen ZFS tips that claim you should use 8KB blocks on things like an 850 Pro.


Not familiar with that. I know QLC disks have a block size of 64 KiB.



