Pretty much everything is down (checking from the Netherlands). The Cloudflare dashboard itself is experiencing an outage as well.
Not-so-funny thing is that the Betterstack dashboard is down but our status page hosted by Betterstack is up, and we can't access the dashboard to create an incident and let our customers know what's going on.
Yep, that's also my experience. Except HN, which doesn't use Cloudflare because it knows it doesn't need to. I just wrote a blog post titled "Do Not Put Your Site Behind Cloudflare if You Don't Need To" [1].
No, there are simply too many. For an e-commerce site I work for, we once had an issue where a bad actor tried to crawl the site to set up scam shops. The list of IPs was way too broad, and the user agents way too generic or random.
Could you not also use an ASN list like https://github.com/brianhama/bad-asn-list and add blocks of IPs to a blocklist (e.g. ipset on Linux), roughly like the sketch below? Most of the scripty traffic comes from VPSs.
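Rough sketch of what I mean, in Python (the set name and blocklist file are made up; you would have to resolve the listed ASNs into CIDR prefixes yourself, and it assumes ipset/iptables are installed and it runs as root):

    # Sketch only: "blocklist.txt" is a hypothetical file with one CIDR prefix
    # per line, derived from whatever ASN list you trust.
    import subprocess

    def load_prefixes(path):
        with open(path) as f:
            return [ln.strip() for ln in f if ln.strip() and not ln.startswith("#")]

    def block(setname, prefixes):
        # Create the set if it doesn't exist yet, then fill it.
        subprocess.run(["ipset", "create", setname, "hash:net", "-exist"], check=True)
        for prefix in prefixes:
            subprocess.run(["ipset", "add", setname, prefix, "-exist"], check=True)
        # Drop any inbound traffic whose source matches the set.
        subprocess.run(["iptables", "-I", "INPUT", "-m", "set",
                        "--match-set", setname, "src", "-j", "DROP"], check=True)

    block("scrapers", load_prefixes("blocklist.txt"))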
Thanks to widespread botnets, most scrapers fall back to using "residential proxies" the moment you block their cloud addresses. Same load, but now you risk accidentally blocking customers coming from similar net blocks.
Blocking ASNs is one step of the fight, but unfortunately it's not the solution.
Hypothetically, as a cyber-criminal, I'd like to thank the blacklist industry for bringing so much money into criminal enterprises by making residential proxies mandatory for all scraping.
It's not one IP to block. It's thousands! And they're also scattered across different IP networks, so no simple CIDR block is possible. Oh, and just for fun, when you block their datacenter IPs they switch to hundreds of residential network IPs.
Yes, they are really hard to block. In the end I switched to Cloudflare just so they can handle this mess.
Wouldn't it be trivial to just write a ufw rule to block the crawler IPs?
Probably more effective would be to get the bots to exclude your IP/domain. I do this for SSH, leaving it open on my public SFTP servers on purpose. [1] If I can get 5 bot owners to exclude me, that could be upwards of 250k+ nodes, mostly mobile IPs, that stop talking to me. Just create something that confuses and craps up the bots. With SSH bots this is trivial, as most SSH bot libraries and code are unmaintained and poorly written to begin with. In my SSH example, look at the VersionAddendum option (sketch below). Old versions of SSH, old SSH libraries, and code that tries to implement SSH itself will choke on a long banner string. Not to be confused with the Banner text file.
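For illustration, something like this in sshd_config (the value is arbitrary filler, not a recommendation; in practice you'd pad it out to a few hundred characters):

    # /etc/ssh/sshd_config -- illustrative sketch only
    # VersionAddendum appends extra text to the protocol banner sent on connect;
    # sloppy SSH bot libraries tend to choke on an oversized identification line.
    VersionAddendum pad-this-out-to-a-few-hundred-characters-aaaaaaaaaaaaaaaaaaaa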
I'm sure the clever people here could make something similar for HTTPS and especially for GPT/LLM bots at the risk of being flagged "malicious".
Belated response as I called it a night over here in sunny Australia!
The image scraping bots are training for generative AI, I'm assuming.
As to why they literally scrape the same images hundreds of thousands of times?
I have no idea!
But I am not special, the bots have been doing it across the internet.
The main difference from other sites is that I operate a tourism-focused SaaS for local organisations and government tourist boards, which means we serve a very healthy number of images per page across our sites.
We also do on-the-fly transformations for responsive images and formats.
Which is all done through Cloudinary.
The Bytespider bot (Bytedance / TikTok) was the one that was being abusive for me.
Bad actors now have access to tens of thousands of IPs and servers on the fly.
The cost of hardware and software resources these days is absolute peanuts compared to 10 years ago. Cloud services and APIs have also made managing them trivial as hell.
Cloudflare is simply an evolution in response to the other side also having evolved greatly, both legitimate and illegitimate users.
Yes, I never understood this obsession with centralized services like Cloudflare. To be fair though, if our tiny blogs only get a hundred or so visitors a month anyway, does it matter if they have an outage for a day?
Interesting. I've done a lot of manual work to set up a whole nginx layer to properly route stuff through one domain to various self-hosted services, with way too many hard lessons since I started this journey (from trying to do manual setup without Docker, to moving to repeatable setups via Docker, etc.).
The setup appears very simple in Caddy - amazingly simple, honestly. I'm going to give it a good try.
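For anyone curious, a minimal Caddyfile for that kind of setup looks roughly like this (hostnames and upstream ports are placeholders; Caddy provisions the TLS certificates automatically):

    # Caddyfile -- hostnames and ports are made up for illustration
    app.example.com {
        reverse_proxy localhost:3000
    }

    git.example.com {
        reverse_proxy localhost:8081
    }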
Cloudflare explicitly supports customers placing insecure HTTP-only sites behind Cloudflare HTTPS (their "Flexible" SSL mode).
It's one of the more controversial parts of the business: it makes the fact that the traffic travels unencrypted over public networks invisible to the end user.
1. DDoS protection is not the only thing anymore. I use Cloudflare because of the vast amounts of AI bots from thousands of ASNs around the world crawling my CI servers (bloated Java VMs on very undersized hosts) and bringing them down. (Granted, I threw Cloudflare in front of my static sites as well, which was not really necessary; I just liked their analytics UX.)
2. The XKCD comic is misinterpreted there: that little block is small because it's a "small open source project run by one person"; Cloudflare is the opposite of that.
3. Edit: Cloudflare is also awesome if you are migrating hosts. I did a migration this past month; you point Cloudflare at the new servers and it's instant DNS propagation (since you didn't propagate anything :)).
It’s that time of the year again where we all realize that relying on AWS and Cloudflare to this degree is pretty dangerous but then again it’s difficult to switch at this point.
If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.
Unless you’re, say, at an airport trying to file a luggage claim … or at the pharmacy trying to get your prescription. I think as a community we have a responsibility to do better than this.
I always see such negative responses when HN brings up software bloat ("why is your static site measured in megabytes").
Now that we have an abundance of compute and most people run devices more powerful than the computers that put man on the moon, it's easier than ever to ship bloated apps, especially when using a framework like Electron or React Native.
People take it personally when you say they write poor quality software, but it's not a personal attack, it's an observation of modern software practices.
And I'm guilty of this, mainly because I work for companies that prioritize speed of development over quality of software, and I suspect most developers are in this trap.
I think we have a new normal now though. Most web devs starting now don't know a world without React/Vue/Solid/whatever. Like, sure you can roll your own HTML site with JS for interactivity, but employers now don't seem to care about that; if you don't know React then don't bother.
You aren’t Cloudflare’s customer in these examples. It's up to the companies that are actually paying for and using the service to complain. Odds are that they won’t care on your behalf, due to how our society is structured.
Not really sure how our community is supposed to deal with this.
“We” are the ones making the architecture and the technical specs of these services. Taking care that they still work when your favourite FAANGMC is down seems like something we can help with.
> If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.
Which only shows that chasing five 9s is worthless for almost all web products. The idea is that by relying on AWS or Cloudflare you can push your uptime numbers up to that standard, but these companies are themselves having such frequent outages that customers don't expect that kind of reliability from web products.
If I choose AWS/cloudflare and we're down with half of the internet, then I don't even need to explain it to my boss' bosses, because there will be an article in the mainstream media.
If I choose something else, we're down, and our competitors aren't, then my overlords will start asking a lot of questions.
Yup. AWS went down at a previous job and everyone basically took the day off and the company collectively chuckled. Cloudflare is interesting because most execs don’t know about it so I’d imagine they’d be less forgiving. “So what does cloudflare do for us exactly? Don’t we already have aws?”
Or _you_ aren't down, but a third-party you depend on is (auth0, payment gateway, what have you), and you invested a lot of time and effort into being reliable, but it was all for less than nothing, because your website loads but customers can't purchase, and they associate the problem with you, not with the AWS outage.
In reality it is not half of the internet. That is just marketing. I personally noticed only one news site that was down while others were working. And I guess sites like that will get the blame.
Happy to hear anyone's suggestions about where else to go or what else to do with regard to protecting against large-scale volumetric DDoS attacks. Pretty much every CDN provider nowadays has stacked up enough capacity to tank these kinds of attacks; good luck trying to combat them yourself these days.
Somehow KiwiFarms figured it out with their own "KiwiFlare" DDoS mitigation. Unfortunately, all of the other Cloudflare-like services seem exceptionally shady, will be less reliable than Cloudflare, and probably share data with foreign intelligence services I trust even less than the ones Cloudflare possibly shares it with.
Unfortunately Anubis doesn't help where my pipe to the internet isn't fat enough to just eat up all the bandwidth the attacker has available. Renting tens of terabits of capacity isn't cheap, and DDoS attacks nowadays are on that scale. BunnyCDN's DDoS protection is unfortunately too basic to filter out anything that's even slightly more sophisticated. Cloudflare's flexibility in terms of custom rulesets and their global pre-trained rulesets (based on attacks they've seen in the past) is imo just unbeatable at this time.
The Bunny Shield is quite similar to the Cloudflare setup. Maybe not 100% overlap of features but unless you’re Twitter or Facebook, it’s probably enough.
I think at the very least, one should plan for the ability to switch to an alternative when your main choice fails… which, together with AWS and GitHub, is a weekly event now.
Why do people on a technical website suggest this? It's literally the same snake oil as Cloudflare. Both have an endgame of total web DRM; they want to make sure users "aren't bots". Each time the DRM is cracked, they will increase the complexity of the "verifier". You will be running arbitrary code in your big-4 browser to ensure you're running a certified big-4 browser, with 10 trillion man-hours of development, on a certified OS.
And if you do rule-based blocking, they just change their approach. I am constantly blocking big corps these days; normal bad actors are barely any work.
What do they even have a spider for? I never saw any actual traffic with Facebook as the source. I don't understand it either, but these are their official IPs and their official bot headers, and it behaves exactly like someone who wants my sites down.
Does it make sense? Nah. But is it part of the weird reality we live in? Looks like it.
I have no way of contacting Facebook. All I can do is keep complaining on Hacker News whenever the topic arises.
Edit: Oh, and I see the same with Azure; however, there I have no list of IPs to verify it's official, only that it looks like it.
Five nines is only about five minutes a year. They are breaking SLAs and impacting services people depend on.
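Quick back-of-the-envelope check in Python:

    # allowed downtime per year at 99.999% availability
    minutes_per_year = 365.25 * 24 * 60
    print(minutes_per_year * (1 - 0.99999))  # ~5.26 minutes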
Tbh though, this is sort of all the other companies' fault: "everyone" uses AWS and CF, and so others follow. Now not only are all your chicks in one basket, so is everyone else's. When the basket inevitably falls into a lake....
Providers need to be more aware of their global impact in outages, and customers need to be more diverse in their spread.
These kinds of outages continue to happen and continue to impact 50+% of the internet. Yes, they know they have that power, but they don't treat changes as such, so no, they aren't aware. Awareness would imply more care in operations like code changes and deployments.
Outages happen and code changes occur, but you can do a lot to prevent these things on a large scale, and they simply don't.
Where is the A/B deployment that would prevent a full outage? What about internally: where was the validation before the change? Was the testing run against a prod-like environment, or against something that once resembled prod but hasn't in forever?
They could absolutely mitigate impacting the entire global infra in multiple ways, and haven't, despite their many outages.
They are aware. They just don't want to pay the cost in the cost-benefit tradeoff. Education won't help: this is a very heavily argued tradeoff in every large software company.
I do think this is tenable as long as these services are reliable. Even though there have been some outages, I would argue that they're incredibly reliable at this point. If this ever changes, though, the cost of moving to a competitor won't be as simple as pushing a repository elsewhere, especially for AWS. I think that's where some of the potential danger lies.
> and judging by the HN post age, we're now past minute 60 of this incident.
Huh? It's been back up during most of this time. It was up and then briefly went back down again but it's been up for a while now. Total downtime was closer to 30 minutes
Not saying not to do this to get through, but just as an observation, it’s also the sort of thing that can make these issues a nightmare to remediate, since the outage can actually draw more traffic just as things are warming up, from customers desperate to get through.
I'm already logged in on the cloudflare dashboard and trying to disable the CF proxy, but getting "404 | Either this page does not exist, or you do not have permission to access it" when trying to access the DNS configuration page.
And I got a 504 error (served by CloudFront) on that status page earlier. The error message suggested there may have been a great increase in traffic that caused it.
Maybe that's precisely what Cloudflare did and now their status page is down because it's receiving an unusual amount of traffic that the VPS can't handle.
Could always just use a status page that updates itself. For my side project Total Real Returns [1], if you scroll down and look at the page footer, I have a live status/uptime widget [2] (just an <img> tag, no JS) which links to an externally-hosted status page [3]. Obviously not critical for a side project, but kind of neat, and was fun to build. :)
This is unrelated to the cloudflare incident but thanks a lot for making that page. I keep checking it from time to time and it's basically the main data source for my long term investing.
1- Has GCP also had any recent outages similar to AWS, Azure, or CF? If a similar-sized (14 Tbps?) DDoS were to hit GCP, would it stand or would it fail?
2- If this DDoS was targeting Fly.io, would it stand? :)
I actually spoke too soon, and accept I have egg on my face!
Apparently prisma's `npm exec prisma generate` command tries to download "engine binaries" from https://binaries.prisma.sh, which is behind... guess what...
So now my CI/CD is broken, while my production env is down, and I can't fix it.
Seems like Workers are less affected, and maybe Betterstack has decided to bypass Cloudflare "stuff" for the status pages (maybe to cut down costs)? My site is still up, though some GitHub runners did show it failing at certain points.
Pretty sure they went down for a while, because I have 4xx errors they returned, but apparently it was short-lived. I wonder if their Workers infra failed for a moment and that led to a total collapse of all of their products?
When it's back up, do yourself a favour and rent a $5/mo VPS in another country from a provider like OVH or Hetzner and stick your status page on that.
"Yes, but what if they go down?" It doesn't matter; having it hosted by someone who can be down for the same reason as your main product/service is a recipe for disaster.
Definitely. Tangentially, I encountered 504 Gateway Timeout errors on cloudflarestatus.com about an hour ago. The error page also disclosed the fact that it's powered by CloudFront (Amazon's CDN).
I'd been using Cachet for quite a while before inevitably migrating to Atlassian's Statuspage.io. I'm a huge fan of self-hosting and self-managing every single thing in existence, but Cachet was just such a PITA to maintain, and there was no other good alternative to Cachet that was also open source.
I don't get why you need such a service for a status page with 99.whatever% uptime. I mean, your status page only has to be up if everything else is down, so maybe 1% uptime is fine.