Home-use is probably the only situation Glacier is good for backup though.
A home user is fine with a 3.5-4 hour window before their backup becomes available for download (as it will probably take them days to download it anyway).
In a corporate environment, I don't want to wait around for 3.5-4 hours before my data even becomes available for restore in a disaster recovery situation.
Seems good for archive-only in a corporate environment (as the name implies).
Interesting to note, for those of us who used Iron Mountain/Data Safe for years: 4 hours was considered a "Premier Rapid Recovery" service that we paid a lot of money for (as recently as 2003, actually).
In a true disaster recovery (Building burned down or is otherwise unavailable) - it usually takes most businesses a week or so just to find new office facilities.
But - agreed, there will be some customers for whom Glacier wouldn't work well for all use cases.
Now - a blended S3/Glacier offering might be very attractive.
Yes, that's what I am thinking: a week of backups to S3 and a script that moves the oldest one from S3 to Glacier. You would rarely need a backup older than a week if you've already got a week of backups. I'm still figuring out whether there is a practical way to do this with incremental backups without introducing too much risk that the backup process gets messed up.
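That kind of rotation script could be sketched roughly like this, assuming boto3 and a bucket where backups are stored under a prefix. The bucket name, key prefix, and 7-day window are all illustrative; the idea is that an in-place `copy_object` with `StorageClass="GLACIER"` rewrites an object into the Glacier tier.

```python
# Sketch of the rotation described above: find backups older than a week
# and move them from the S3 standard tier into Glacier. Bucket name,
# prefix, and window are illustrative assumptions, not a fixed API.
from datetime import datetime, timedelta, timezone


def keys_to_archive(objects, now, keep_days=7):
    """Return keys of backups older than keep_days.

    objects: iterable of (key, last_modified datetime) pairs,
    e.g. built from s3.list_objects_v2(...)["Contents"].
    """
    cutoff = now - timedelta(days=keep_days)
    return [key for key, modified in objects if modified < cutoff]


if __name__ == "__main__":
    import boto3  # AWS calls only run when pointed at a real bucket

    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket="my-backups", Prefix="backups/")
    objects = [(o["Key"], o["LastModified"]) for o in listing.get("Contents", [])]
    for key in keys_to_archive(objects, datetime.now(timezone.utc)):
        # Copying an object onto itself with a new storage class
        # transitions it to Glacier without a separate upload.
        s3.copy_object(
            Bucket="my-backups",
            Key=key,
            CopySource={"Bucket": "my-backups", "Key": key},
            StorageClass="GLACIER",
        )
```

The pure selection logic is kept separate from the AWS calls so the risky part (what gets archived) is easy to test before pointing it at real backups.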
"In the coming months, Amazon Simple Storage Service (Amazon S3) plans to introduce an option that will allow you to seamlessly move data between Amazon S3 and Amazon Glacier using data lifecycle policies."
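The option described in that quote corresponds to S3 lifecycle rules; a rule along these lines (bucket name and prefix are illustrative) would transition objects to the Glacier storage class after 7 days, with no hand-rolled mover script needed.

```python
# Illustrative S3 lifecycle configuration: transition week-old backups
# to Glacier automatically. Bucket and prefix names are assumptions.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-old-backups",
            "Filter": {"Prefix": "backups/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
        }
    ]
}

if __name__ == "__main__":
    import boto3  # applied only when run against a real bucket

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket="my-backups", LifecycleConfiguration=lifecycle
    )
```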
It's made for archiving, not 'backups'. You're expected to keep your most recent backups on-site, so if a RAID array dies or you happen to do something wrong, it takes just minutes to restore. Off-site, or tape, archives are usually another thing entirely: most of that data is never, ever accessed (but required by company policy to be kept).
In a corporate environment, I wouldn't want to depend on the cloud as my primary backup solution in the first place. I'd be much more comfortable using it as the offsite mirror of an onsite backup. If you're at a point in disaster recovery where you have to restore from your offsite, you (likely) have bigger problems than a 4-hour wait time.
I personally believe that data should never be deleted (or overwritten), but only appended to. Kinda like what redis/datomic does. So, keep live data onsite, along with an onsite (small) backup, and all the old data in Glacier.
You can believe that, but legal realities dictate otherwise. There are certain classes of information that you are not permitted to keep beyond a defined horizon, either temporal or event-based. Legal compliance with records management processes means having the ability to delete or destroy information such that it cannot be recovered. Note that if the information is encrypted, you can just delete the decryption key and it is effectively deleted.
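The key-deletion trick mentioned there is sometimes called crypto-shredding, and can be sketched like this. The cipher below is a toy SHA-256 counter-mode keystream purely for demonstration; a real deployment would use a vetted AEAD cipher (e.g. AES-GCM), and the key store and record names are made-up.

```python
# Illustrative crypto-shredding sketch: encrypt each record with its own
# key kept outside the archive; "deleting" the record then means deleting
# only the key. Toy cipher for demonstration, NOT production crypto.
import hashlib
import os


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream (symmetric)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


# Per-record keys live on-site; ciphertext lives in the archive (Glacier).
keys = {"record-1": os.urandom(32)}
archived = {"record-1": keystream_xor(keys["record-1"], b"sensitive payload")}

# Normal retrieval: key present, plaintext recoverable.
assert keystream_xor(keys["record-1"], archived["record-1"]) == b"sensitive payload"

# Crypto-shredding: drop the key and the archived bytes become inert,
# even though the ciphertext in the archive is never touched.
del keys["record-1"]
```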
That's just the beginning. If you've ever been through an e-discovery process, you know that having large amounts of historical data is actually a liability: if instead of 100GB you have 10TB, you'll need to hand that over, and before that, cull it so you don't inadvertently hand the opposition a huge lever to use against you. Processing and reviewing 10-100x the data can take inordinately more time than you expect.
"In a corporate environment, I don't want to wait around for 3.5-4 hours before my data even becomes available for restore in a disaster recovery situation."
I am guessing you work at a company with a good IT department, then; I don't think that's the average. At many companies I have worked for, 4 hours would be a miracle, with a 1-3 day operation the minimum.
And don't forget that the X00MB-type size limits many IT departments put everywhere are not because a TB hard drive is expensive (it's cheap), but because all of the extra backup copies add to the cost of each new MB. Having another extremely cheap way to back up large amounts of data (encrypted?) would help reduce the cost of each extra GB.
I don't think DR is the correct use case. I think it more likely to be (as in the case of banks, which I know best) a regulator asking for electronic documents or emails relating to a transaction from six years ago (seven years is the retention requirement).
In that case, 3-4 hours would be more than acceptable.
Here, we are producing logs of simulation runs that may be needed in the next couple of years. Most of the logs will be destroyed unused. This is a perfect use case for Glacier. We do not care about a 5-hour recovery time.
In a corporate environment where you have low RTO goals for DR scenarios, relying on backups instead of replicated SAN etc is not a sound practice, especially with large data sets.