I didn't realize message queues were used for this type of task. I'm assuming you would then also use autoscaling pods that respond to the number of messages in the queue. How do you scale pods fast enough for a messaging application or anything else trying for 100ms or less per operation?
I think over-provisioning is a far more common and sane approach than auto-scaling, and it can address the bulk of spikes. Especially when you have big known events (New Year's Day, Black Friday, ...), you can over-provision (or do a controlled auto-scale, if you will) for a short window.
My guess is they're doing both.
Anecdote time. I worked at a company where one project was over-provisioned on dedicated hardware and another auto-scaled in the cloud. The over-provisioned project was much cheaper, had significantly better response times, and was easier to manage. It was load tested to handle over an order of magnitude more traffic than the all-time peak, and even fully over-provisioned it was cheaper than the baseline usage of the cloud solution (which was also slower and harder to manage).
Do you have any interesting reads about this? Most of the material I find is biased one way or the other by cloud hype or data centers trying to reel back customers.
AWS has gotten much better and closed the gap some. But your average system will probably see a 4x difference (2x on price and 2x on performance).
There was a very long period when SSDs were commonly available from everyone but cloud vendors. For some workloads (like databases), that resulted in a massive difference.
This is still true for bandwidth. You can look it up yourself. Off the top of my head, in the US you can find a server with a 100 Mbps dedicated port for < $300. The same bandwidth on AWS runs over $3K. So it's less than 1/10th the cost, AND you get a relatively powerful server (relative to what you can get on AWS) rather than just the bandwidth.
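For a rough sense of the gap, here's the back-of-the-envelope math, assuming roughly $0.09/GB egress (in the ballpark of AWS's standard tier; the exact rate is tiered and region-dependent):

    # Back-of-the-envelope: what a saturated 100 Mbps port would cost as cloud egress.
    # The ~$0.09/GB rate is an assumption, roughly AWS's standard egress tier.
    PORT_MBPS = 100
    EGRESS_USD_PER_GB = 0.09            # assumed rate, see above
    SECONDS_PER_MONTH = 30 * 24 * 3600

    gb_per_month = PORT_MBPS / 8 / 1000 * SECONDS_PER_MONTH   # Mbps -> GB/s -> GB/month
    monthly_cost = gb_per_month * EGRESS_USD_PER_GB
    print(f"{gb_per_month:,.0f} GB/month, roughly ${monthly_cost:,.0f} in egress")
    # -> 32,400 GB/month, roughly $2,916 in egress, vs. < $300 for the dedicated port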
Back when the C4 instances were announced, I ran UnixBench on them as well as on a dedicated i7-4770. You can see the i7 was quite a bit faster and, if I recall, it was less than half the price.
I think database workloads are still where the average app would see the biggest difference. A properly configured server with a battery-backed RAID adapter and dual NICs will blow RDS out of the water on raw latency, cost, and, most noticeably to me, consistency of performance.
Unfortunately, fewer and fewer companies seem to be offering servers with a BBU and dual NICs, and those that do are charging more... so that's also helped close the gap. IBM really screwed up. They wanted to compete with AWS, so they tried to turn SoftLayer into an AWS clone rather than focusing on what SoftLayer did better and fixing its issues (automation and DDoS mitigation come to mind).
I don't remember whether we did any benchmarks, but I don't doubt that if you look at raw performance/price, it can easily be an order of magnitude more expensive to go to the cloud. But once you take into account all the automation and services the major cloud providers offer, it can become much more cost-effective, since it saves significant time and expertise for most companies.
They may have only mentioned Bitnami images; it's been a while.
However, I think your rationale raises a question: how much of "the automation and services that major cloud providers have" (quotes only for readability) is actually needed by people running on metal? For instance, autoscaling can be mooted by over-provisioning, which would still incur only a minor cost increase. Multi-region is similarly cheap. A lot of the remainder seems like productized functionality that is fairly implementable locally if desired.
We did two games: one over-provisioned on metal, the other on auto-scaled, cloud-based infra.
The cloud cost was significantly higher. We ended up with a hybrid to control cost. But our needs are long-lived sessions, which don't fit the elasticity of the cloud model well.
Message queues are a core part of a lot of high-scale distributed systems (source: Twitter). You want enough queue space to handle the expected volume and then some. Assuming you have that, you don't need to instantly scale instances out to match the number of messages; you just need to catch up before the queue space runs out.
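As a rough sketch of that capacity math (numbers made up for illustration, not anyone's actual figures):

    # How long do you have to catch up? Illustrative numbers only.
    queue_capacity = 10_000_000     # messages the queue can buffer
    arrival_rate   = 50_000         # msgs/sec during a spike
    drain_rate     = 30_000         # msgs/sec your current consumers handle

    backlog_growth = arrival_rate - drain_rate        # net growth, msgs/sec
    time_to_full   = queue_capacity / backlog_growth  # seconds until the buffer is exhausted
    print(f"~{time_to_full / 60:.0f} minutes to add consumers or shed load")
    # -> ~8 minutes with these numbers, which is plenty for most autoscalers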
Message queues (or similar things like Kafka, which isn't quite a proper "message queue") are used for basically everything at this scale. Messages are passed indirectly: an event happens, it gets pushed onto a queue, and then the recipients do something with it.
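For anyone who wants a concrete picture, here's a minimal in-process sketch of the shape of it; real systems use Kafka, RabbitMQ, SQS, etc., but the pattern is the same:

    # An event is pushed onto a queue; a worker picks it up later and reacts.
    import queue
    import threading

    events = queue.Queue()

    def producer():
        # Something happened (a message was sent, a user logged in, ...)
        events.put({"type": "message_sent", "user": "alice"})

    def consumer():
        while True:
            event = events.get()        # blocks until an event is available
            print("handling", event)    # fan out, persist, notify, etc.
            events.task_done()

    threading.Thread(target=consumer, daemon=True).start()
    producer()
    events.join()                       # wait for the consumer to finish processing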
Maybe you'll consider this pedantic, or maybe even wrong, but I think you meant to say "asynchronous" as opposed to "indirectly". I think anyone googling to learn more will get better results.
True, but there is also the store-and-forward subscription model where the message is placed on the queue without an intended recipient. Message passing can also be asynchronous but direct, e.g., Mach messages.
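A toy sketch of that distinction, where the publisher drops a message on a topic without knowing who, if anyone, is subscribed (the topic and handlers are made up for illustration; a real broker would also persist/store-and-forward):

    # Toy pub/sub: no intended recipient on the publisher's side.
    from collections import defaultdict

    subscribers = defaultdict(list)     # topic -> list of handler callables

    def subscribe(topic, handler):
        subscribers[topic].append(handler)

    def publish(topic, message):
        # Zero, one, or many handlers may be attached; the publisher doesn't care.
        for handler in subscribers[topic]:
            handler(message)

    subscribe("billing.events", lambda m: print("invoicer got", m))
    subscribe("billing.events", lambda m: print("auditor got", m))
    publish("billing.events", {"order": 42, "amount": 19.99})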
Dynamically allocating hardware to different services makes sense for self-managed DCs too. Breaking down cost per service or team is useful, as is seeing how much hardware is necessary at peak and trough, I imagine.
I wouldn’t be surprised if on NYE services like chat take priority and get extra resources from background jobs that can wait a few hours.
One way is to over-allocate in the first place. When your spare pool drains below a watermark, you scale out. Hopefully there is enough time for that scale event to complete before the pool drains completely.
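Roughly this shape, as a control loop; get_spare_capacity() and provision_instances() are hypothetical stand-ins for whatever API manages your fleet:

    # Watermark-based scale-out loop (sketch, not tied to any particular cloud API).
    import time

    LOW_WATERMARK  = 20     # spare instances at which we start adding more
    SCALE_STEP     = 10     # instances to request per scale event
    CHECK_INTERVAL = 30     # seconds between checks

    def get_spare_capacity():
        return 15           # stub: query your fleet / cloud API here

    def provision_instances(n):
        print(f"requesting {n} more instances")   # stub: actual provisioning call

    def autoscale_loop():
        while True:
            if get_spare_capacity() < LOW_WATERMARK:
                # Ask for capacity early so it comes online before the pool empties;
                # provisioning lag is why the watermark sits well above zero.
                provision_instances(SCALE_STEP)
            time.sleep(CHECK_INTERVAL)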