I can confirm, HN, GitHub and Slack are very slow for me as well. Google is very...

pilsetnieks · on Oct 4, 2021

All running their DNS on AWS. My guess is that AWS is seeing a massive flood of failed and retried DNS requests for Facebook properties, similar to what jgrahamc mentions here for Cloudflare: https://twitter.com/jgrahamc/status/1445066136547217413

throwdecro · on Oct 4, 2021

Is there a "Kessler syndrome" analogue for the internet, where failures beget failures until it's just an impenetrable cloud of fail, forever?

nashadelic · on Oct 4, 2021

There's something called the "thundering herd" problem, which partially matches.

From wiki: the thundering herd problem occurs when a large number of processes or threads waiting for an event are awoken when that event occurs, but only one process is able to handle the event. When the processes wake up, they will each try to handle the event, but only one will win.
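To make the definition above concrete, here is a minimal sketch in Python. It assumes a toy setup that is not from the wiki text: a handful of threads waiting on one `threading.Condition`, a single event, and a hypothetical helper `run_herd` that counts how many threads wake up versus how many actually get the event. It illustrates the pattern only and does not model what happened at AWS or Facebook.

```python
import threading
import time


def run_herd(num_waiters: int, wake_all: bool) -> tuple[int, int]:
    """Post one event to num_waiters threads; return (wakeups, winners)."""
    cond = threading.Condition()
    state = {"waiting": 0, "events": 0, "wakeups": 0,
             "winners": 0, "shutdown": False}

    def waiter() -> None:
        with cond:
            state["waiting"] += 1
            cond.wait()                 # sleep until notified
            if state["shutdown"]:
                return                  # released for cleanup, not for the event
            state["wakeups"] += 1       # woken up because of the event
            if state["events"] > 0:     # only one thread finds the event still there
                state["events"] -= 1
                state["winners"] += 1

    threads = [threading.Thread(target=waiter) for _ in range(num_waiters)]
    for t in threads:
        t.start()

    # Wait until every thread is parked inside cond.wait().
    while True:
        with cond:
            if state["waiting"] == num_waiters:
                break
        time.sleep(0.001)

    with cond:
        state["events"] = 1
        if wake_all:
            cond.notify_all()           # thundering herd: everyone wakes up
        else:
            cond.notify(1)              # wake a single waiter only

    if not wake_all:
        # Let the single woken thread finish, then release the rest.
        while True:
            with cond:
                if state["winners"] == 1:
                    break
            time.sleep(0.001)
        with cond:
            state["shutdown"] = True
            cond.notify_all()

    for t in threads:
        t.join()
    return state["wakeups"], state["winners"]


if __name__ == "__main__":
    print("notify_all:", run_herd(8, wake_all=True))
    print("notify(1): ", run_herd(8, wake_all=False))
```

With `wake_all=True` every waiter wakes up and competes, yet only one gets the event; with `wake_all=False` a single thread is woken and the wasted wakeups disappear.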

motoboi · on Oct 4, 2021

Until someone smashes the "SEND MOAR SERVERS" button.

qwertox · on Oct 4, 2021

I can't see how this is the reason for HN to take 10 seconds for the response of the main page (I mean, the URL fetched from the address bar, not the subrequests the page does), as everything else downloads immediately.

The DNS entries should be cached by the browser (and the middleware), so this problem should only happen once, but I get this constantly.

Also, I sometimes get an error message from HN, which seems to indicate that this is some backend issue which fails gracefully with a custom "We're having some trouble serving your request. Sorry!" on top of a 502 code.

It feels more like there is something else still broken.

pilsetnieks · on Oct 4, 2021

In the case of HN it's probably just heavier load than normal. It's much faster if you're logged out.

gcoguiec · on Oct 4, 2021

Dropping that many BGP routes will take a heavy latency toll on the whole internet backbone for minutes or hours, so I'm not surprised. I wonder if the recent deprecation of LE's DST Root CA X3 has something to do with the outage (some internal DC tool/API not accessible because its certificate has expired, or something like that).

alexellisuk · on Oct 4, 2021

Also slow here. I can't see anything on the AWS Service Dashboard https://status.aws.amazon.com

sph · on Oct 4, 2021

In my experience, any service dashboard is useless unless the problem has been going on for so long (i.e. hours) that it is obvious something's wrong.

erhk · on Oct 4, 2021

AWS punishes its sysadmin teams for any downtime so there is heavy incentive to not report unless there is a community shaped gun pointed at your head. This is not a universal problem.

yk · on Oct 4, 2021

People either have to work, creating load on GitHub, or waste their time elsewhere, creating load on HN and Slack.

szundi · on Oct 4, 2021

People probably got more time to work.

ggerules · on Oct 4, 2021

Also slow for me.