They're all running their DNS on AWS. My guess is that AWS is seeing a massive flood of failed and retried DNS requests for Facebook properties, similar to what jgrahamc mentions here for Cloudflare: https://twitter.com/jgrahamc/status/1445066136547217413
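For context on why retries turn into a flood: if every client retries a failed lookup immediately, each failure multiplies the load. The usual client-side mitigation is exponential backoff with random jitter. A minimal sketch, assuming a "full jitter" policy and made-up defaults (`base`, `cap`, and the helper names are illustrative, not taken from any particular resolver):

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 10.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """'Full jitter' backoff: the n-th delay is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng if rng is not None else random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


def retry(op: Callable[[], T], attempts: int = 5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(), retrying on OSError with jittered exponential backoff."""
    for attempt, delay in enumerate(backoff_delays(attempts)):
        try:
            return op()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delay)  # spread retries out instead of hammering the server
    raise ValueError("attempts must be >= 1")
```

The jitter matters as much as the exponential growth: without it, clients that failed at the same moment also retry at the same moment.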
There's something called the "thundering herd" problem, which partially matches.
From Wikipedia: the thundering herd problem occurs when a large number of processes or threads waiting for an event are awoken when that event occurs, but only one process is able to handle the event. When the processes wake up, they will each try to handle the event, but only one will win.
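To make the definition concrete, here is a small sketch with Python threads: every waiter is woken with `notify_all()` for a single event, but only one of them finds work to do. The function name and the thread count are arbitrary choices for illustration.

```python
import threading
import time


def thundering_herd(n_waiters: int) -> tuple[int, int]:
    """Wake n_waiters threads for one event; return (threads woken, events handled)."""
    cond = threading.Condition()
    events: list[str] = []  # holds the single event to be handled
    counts = {"waiting": 0, "woken": 0, "handled": 0}
    delivered = False  # flipped once the event has been published

    def waiter() -> None:
        with cond:
            counts["waiting"] += 1
            cond.wait_for(lambda: delivered)  # sleep until the event is published
            counts["woken"] += 1  # every waiter wakes up...
            if events:  # ...but only the first to get the lock finds work
                events.pop()
                counts["handled"] += 1

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for t in threads:
        t.start()

    # Wait until every thread is blocked inside wait_for() before publishing.
    while True:
        with cond:
            if counts["waiting"] == n_waiters:
                break
        time.sleep(0.001)

    with cond:
        events.append("incoming request")
        delivered = True
        cond.notify_all()  # wake *all* waiters: this is the herd

    for t in threads:
        t.join()
    return counts["woken"], counts["handled"]


if __name__ == "__main__":
    woken, handled = thundering_herd(50)
    print(f"{woken} threads woke up, {handled} handled the event")
```

Running it prints `50 threads woke up, 1 handled the event`: 49 wakeups were pure overhead, which is the same shape as millions of clients all retrying at once.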
I can't see how this explains HN taking 10 seconds to return the main page (I mean the URL fetched from the address bar, not the subrequests the page makes), when everything else downloads immediately.
The DNS entries should be cached by the browser (and by any middleware), so this problem should only happen once, but I get it constantly.
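The caching argument can be sketched as a toy TTL cache that remembers failures as well as answers (DNS calls this negative caching). With it, a burst of lookups for a broken name costs one upstream query per TTL rather than one per request. This is a simplified model, not how any browser actually implements its cache; the class name and TTL values are made up, and `resolve` is assumed to raise `OSError` on failure (as `socket.getaddrinfo` does via `socket.gaierror`).

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class DnsCache:
    """Toy DNS-style cache: remembers answers AND failures until their TTL expires."""

    def __init__(self, resolve: Callable[[str], List[str]], ttl: float = 300.0,
                 negative_ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        # name -> (expiry time, addresses, or None for a cached failure)
        self._entries: Dict[str, Tuple[float, Optional[List[str]]]] = {}
        self.upstream_queries = 0

    def lookup(self, name: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[0]:
            addrs = entry[1]
            if addrs is None:
                raise OSError(f"{name}: cached lookup failure")
            return addrs
        # Missing or expired: ask upstream once and remember the outcome.
        self.upstream_queries += 1
        try:
            addrs = self._resolve(name)
        except OSError:
            self._entries[name] = (now + self._negative_ttl, None)
            raise
        self._entries[name] = (now + self._ttl, addrs)
        return addrs
```

If HN were only paying a DNS penalty, this is roughly the behaviour you'd expect: slow once, then fast until the entry expires, which is why constant slowness points elsewhere.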
Also, I sometimes get an error message from HN, which suggests this is a backend issue that fails gracefully, showing a custom "We're having some trouble serving your request. Sorry!" page with a 502 status code.
It feels more like there is something else still broken.
Dropping that many BGP routes will take a heavy latency toll on the whole internet backbone for minutes or hours, so I'm not surprised. I wonder if the recent deprecation of Let's Encrypt's DST Root CA X3 has something to do with the outage (e.g., some internal DC tool or API became inaccessible because its certificate expired).
AWS punishes its sysadmin teams for any downtime, so there is a heavy incentive not to report it unless there's a community-shaped gun pointed at your head. This is not a universal problem.