AWS, Azure, and GCP respond to cloud report (cockroachlabs.com)
233 points by awoods187 on March 12, 2020 | hide | past | favorite | 34 comments


What a great marketing strategy - publishing interesting, valuable research in an adjacent area - thereby gaining genuine, valuable attention from exactly the type of person who'd also be interested in their product, whilst also establishing a kind of thought leadership in the "cloud performance" category. I'm not even being cynical - great work.


I actually really appreciate companies that produce research and write articles on technologies adjacent to their business. It's a great way of building trust and a community around the business's core product. DigitalOcean is a good example. Reminds me of Patrick Collison mentioning on a podcast [0] that one of the biggest drivers of new customers to Stripe in their early days was a blog post they wrote on using the Python debugger.

[0] https://tim.blog/2018/12/20/patrick-collison/


It saddens me moderately that cynicism is often the default; people sometimes think I'm being (rudely) sarcastic when I'm just being straightforwardly nice.

I feel like cynicism/pessimism gets weird praise, too. The anthropologist Wade Davis considers it an indulgence and I wholeheartedly agree.

/tangent


> Azure offers a large number of configuration options that can be tricky to get right

That's basically the summary of Azure. I think all that complexity ends up hurting more than it helps, not just on the end user side, but in terms of lower reliability from the provider side.


Windows is the same way: absurd numbers of options. I would not be surprised if Windows 10 had over a million configuration settings between regular panels, policies, and the registry.


You think Windows has more configuration options than Linux?


Not that, no. But Microsoft/Windows usually limits some arbitrary thing when you configure something else. The report reads a lot like (at least for the Azure section) "if you configure option A, option B will be enabled and option C will be unavailable. Therefore we provide 10 different options for each of A, B, and C that you can combine however you want."


If you count installing different packages to interact with your system, no. But in the Windows 10 distro there's certainly more configurability than in Kubuntu, for example.


Absurd compared to what?


Compared to what humans can reasonably support.



This is a new post that updates the previous post with cloud vendor responses.


That's clear! Adding links to previous threads is just a service to the curious. Not intended to imply dupiness. If it were a dupe we'd have downweighted it and probably marked it as [dupe] in the title.

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


Your link to the previous discussion was valuable, but I originally misunderstood the meaning. I've been reading HN long enough that I should know better.

However, for newbies (or the scatterbrained), perhaps you could prefix the link with something like: "For context, see discussion of earlier, related post: https://..."


I do that sometimes (see the search link upthread) but still haven't figured out the best way to word it. It would be nice to avoid being so repetitive.


Not sure there is a general solution. Thanks for your moderation and input!


Thanks Cockroach team!

I'm very excited about the possibility of moving to you guys, especially since it's a drop-in replacement for Postgres. The one thing holding us back today is a PostGIS replacement. We rely heavily on it for location calculations and saving route paths.


Working on it.


Azure also offers ephemeral OS disks that can speed up OS perf. This probably won't make any difference in a database test (unless they store their data ephemerally too, which...). But still, another thing to consider.


For me as a developer and CTO, it's still a huge pain figuring out how to properly configure and scale AWS, Azure, or GCP. In-depth reports on "How to configure your AWS EC2 instances for max performance" would be great, one report per specific use case, e.g. a realtime chat app, a basic Apache web server, a Node.js web app, ... If you think about it, there are only a handful of common use cases.


As someone who gets questions about this often, I wonder why nobody seems to know how to answer these questions for themselves, or even how they'd begin to research them. There seems to be a dearth of developers who understand what bottlenecks their applications are likely to encounter, what options exist for profiling and analyzing their performance, or that they may need to read something and learn something new about their systems and the systems they interact with.

Everyone wants an easy button, and that might exist for something that's very well-established with a large community of users, at which point it probably just exists as an AWS service. For many things, you have to do research.
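To make the "options exist for profiling" point concrete, here is a minimal sketch using Python's built-in cProfile, with a toy CPU-bound function standing in for a real request handler (the workload and names are illustrative, not from the article):

```python
import cProfile
import io
import pstats

def handle_request(n):
    # Toy CPU-bound workload standing in for a real request handler.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    handle_request(10_000)
profiler.disable()

# Print the five most expensive call sites by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Ten minutes with output like this usually tells you whether you are CPU-bound, IO-bound, or stuck in one hot function, which is the question to answer before shopping for instance types.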


I did this kind of research in the last two weeks. I ended up with a huge spreadsheet of goals and the various options for meeting those goals with the cloud offerings of just AWS and GCP.

The fun part is that several options look viable until you hit the one issue where an option no longer works. Then you have to backtrack, or find a workaround and check whether that workaround is acceptable.

And some of these limitations are not straightforward to spot, or even to read about.

That said, there are some resources like https://github.com/ahmetb/cloud-run-faq which are very good and helpful. Sometimes the official documentation doesn't really cover the questions that are important for your product.

Also, I found that without multiple PoCs testing connectivity and the basic building blocks, it is difficult to find all the gotchas that might turn into blockers for a particular solution.


My guess is this could be because of the rate of change. Azure today may have diverged significantly from last year, and K8s and the like bring churn of their own.


The gold standard would be: I just click "add new server" and start off with a basic, general compute server. The server then figures out the optimal network, EC2, storage, and all the other parameters by itself, just from knowing what general use case you're aiming for [see above].



That'd be amazing. However, a server doesn't see all use cases, especially the ones that will break it, until it actually breaks. This is why anomaly detection also only goes so far.

Worse, you're constantly changing the equation as you work on your features.

Finally, when these things do happen, the server will protect itself in the interim, and while it does that, things will be broken.


Is there a reliability/availability report that covers multiple months (VM uptime, PaaS services, etc)?

I've personally seen the reliability differences of services in Azure vs AWS for example.


If your major reliability concern is “VM uptime”, you’ve got bigger issues. You need redundancy and automatic failover at every level, based on your cost-benefit analysis.


Since you're one of the authors: TPC_RR is a typo.


Disclosure: I work on Google Cloud (and also commented on Cockroach’s original draft post, that this is a follow up to).

Edit: I am a chump, I misread TPC_RR as TCP_RR and responded. Leaving this here for shame. Thanks to jsolson for pointing out my mistake.

I’m not sure if the post was edited, but TCP_RR is definitely not a typo. You may be more used to seeing TCP_CRR which opens a new connection and then a round trip, but for raw network latency netperf’s TCP_RR benchmark is probably the best tool available.
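For reference, the distinction between the two netperf tests shows up in how they are invoked. This is an illustrative CLI fragment, not output from the article; the IP is a placeholder and a netserver must already be running on the target:

```shell
# TCP_RR: one connection, many request/response round trips --
# measures raw network latency without connection-setup cost.
netperf -H 10.0.0.2 -t TCP_RR -- -r 1,1

# TCP_CRR: opens a NEW connection for every transaction, so the
# TCP handshake is included in each measured round trip.
netperf -H 10.0.0.2 -t TCP_CRR -- -r 1,1
```

(`-r 1,1` sets 1-byte request and response sizes, so the result is dominated by latency rather than throughput.)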


There is still a TPC_RR occurrence in the post.


(editor here) we fixed it. I'm so used to seeing TPC-C, that my eyes missed the "TCP".


What a load of nonsense. So cloud provider X doesn’t like the test because the defaults of their services don’t provide the best performance, and on top of that the benchmark used doesn’t produce results that show them in the best possible light.

It’s like listening to failed beauty pageant contestants.


The idea that you can provide a default option that is the best tuned for every workload is a very naive one.

There are many changes I would make to increase performance for one workload that would have a deleterious effect on another. An easy example is the dirty ratio in Linux: depending on the speed of your local storage, the size of your in-memory working set, and how frequently the data in that working set changes, keeping the setting the same across workloads and systems could be disastrous - it could leave you stuck in extended synchronous flushes to disk that block all other IO. The same setting on another server might be perfectly acceptable and prevent having to go to slower block storage devices sooner than necessary, increasing overall performance.
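For concreteness, these are the knobs in question. The values below are purely illustrative (requires root on Linux); the whole point of the comment above is that the right numbers depend on the workload:

```shell
# Inspect current writeback thresholds:
sysctl vm.dirty_ratio vm.dirty_background_ratio

# Example tuning: start background writeback early and cap dirty
# pages low, so flushes stay small and we avoid long synchronous
# stalls on slow storage. Persist in /etc/sysctl.d/ to survive reboot.
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
```

A write-heavy box with slow disks might want the low values above, while a box with a large, mostly-cached working set might prefer higher ones - exactly the workload dependence the comment describes.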

It's the same with how you configure your servers - you can throw more spindles at sequential workloads and have great performance, but a random workload really should be using flash storage, etc. etc. etc.

Most people strive to provide sane defaults that strike a good balance for the majority of workloads. This is going to be beneficial to the largest number of people. It's totally fair for someone to provide feedback on this sort of thing, and give details on how things could be further optimized to fit a specific workload. Defaults are not best practices - they are a starting point.



