I think it means there is a small chance that enough hard drives fail at the same time that no surviving copies of the affected data remain.

They make so many backups so quickly that there is only a 0.00000000001% (I didn't count the zeros) chance of this occurring.
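For what it's worth, the published figure for S3 is 99.999999999% ("eleven nines") annual durability per object, which puts the loss probability at 10^-11 per object per year. A quick Python sanity check:

    # S3's published design target: 99.999999999% ("eleven nines")
    # annual durability per object.
    durability = 0.99999999999
    p_loss = 1 - durability       # annual loss probability per object
    print(f"{p_loss:.0e}")        # -> 1e-11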



Which of course means that (if they're telling the truth) the probability of losing your data mostly comes from really big events: collapse of civilization, global thermonuclear war, Amazon being bought by some entity that just wants to melt its servers down for scrap, etc. (Their probability is clearly a lot more than 10^-11 per year; the big bang was only on the order of 10^10 years ago.)


There's some clever wordplay/marketing here... "designed to provide 99.99..99%" means that the theoretical model of the system tells you that you lose 1 in X files per year when everything is working as modeled (e.g. "disks fail at expected rate as independent random variables"). If something not in the model goes wrong (e.g. power goes out, a bug in S3 code), data can be lost above and beyond this "designed" percentage. The actual probability of data loss is therefore much, much higher than this theoretical percentage.
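As a sketch of the kind of "designed" model in question (all parameters here are assumptions, not AWS's actual numbers): with r independent replicas, per-disk annual failure probability f, and rebuilds finishing within a repair window w, an object is lost only if the remaining replicas all die inside the same window:

    # Toy durability model, illustration only -- every parameter is assumed.
    r = 3             # replication factor
    f = 0.02          # annual failure probability of one disk
    w = 24 / 8760     # 24-hour repair window, as a fraction of a year

    # Some replica fails (~r*f times a year); the object dies only if the
    # other r-1 replicas also fail before the rebuild completes.
    p_loss = r * f * (f * w) ** (r - 1)
    print(f"modeled annual loss probability per object: {p_loss:.1e}")  # ~2e-10

A number like this is only as good as the independence assumption; a correlated event (power loss, an S3 bug) invalidates the whole model rather than just nudging f.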

A more comical way to look at it: the percentage is AWS saying "to keep costs low, we plan to lose this many files per year; when we screw up and things don't go quite to plan, we lose a _lot_ more."


That figure is per object. So although the chance of losing any particular object is tiny, the chance of you losing something is proportional† to the number of objects (see the sketch below). Still extremely small.

†roughly proportional if you have << 1e11 objects
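Concretely, with p = 1e-11 per object, the chance of losing at least one of n objects is 1 - (1-p)^n, which is ≈ n·p until n approaches 1/p:

    import math

    # P(lose at least one of n objects), each lost independently with p = 1e-11.
    p = 1e-11
    for n in (1e3, 1e6, 1e9, 1e11):
        p_any = -math.expm1(n * math.log1p(-p))   # stable 1 - (1-p)**n
        print(f"n = {n:.0e}: P(any loss) ~ {p_any:.2e}")

At n = 1e11 this gives about 0.63 rather than 1.0, which is where the "roughly proportional" caveat kicks in.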


Yes. Though I bet the real lossage probabilities are dominated by failure events that take out a substantial fraction of all the objects there are, and that happen a lot more often than once per 10^11 years.


Agreed. More likely a catastrophic, significant loss for a small number of customers than a tiny fractional loss spread across a large number of them.

Similar deal for hard drive bit error rates, where the quoted average BER doesn't necessarily reflect what happens in the real world. For example, an unrecoverable read error loses 4096 bits (a 512-byte sector) or 32768 bits (a 4K sector) all at once, rather than individual bits flipping randomly over a long period.
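Back-of-the-envelope, assuming the commonly quoted consumer-drive spec of one unrecoverable error per 10^14 bits read (vendor figures vary):

    # Expected unrecoverable read errors (UREs) when reading a 10 TB drive
    # end to end, assuming a quoted rate of 1 URE per 1e14 bits read.
    ber = 1e-14               # UREs per bit read (assumed vendor spec)
    drive_bits = 10e12 * 8    # 10 TB in bits
    print(f"expected UREs per full read: {ber * drive_bits:.1f}")  # ~0.8
    # ...and each URE takes out a whole sector at once:
    print(f"bits lost per 4K-sector URE: {4096 * 8}")              # 32768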


More important than the speed of those backups is this:

Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives.
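In other words, the ack is withheld until every copy exists. A minimal sketch of that pattern (illustrative names, not AWS's actual code):

    # Illustration only: "synchronous" means SUCCESS is not returned until
    # every facility has confirmed its copy of the archive.
    class Facility:
        def __init__(self) -> None:
            self.store: list[bytes] = []

        def write(self, archive: bytes) -> None:
            self.store.append(archive)    # stand-in for a durable write

    def upload(archive: bytes, facilities: list[Facility]) -> str:
        for facility in facilities:
            facility.write(archive)       # block until this copy is durable
        return "SUCCESS"                  # only after all copies exist

    print(upload(b"backup", [Facility() for _ in range(3)]))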



