AWS, Azure, and GCP respond to cloud report (cockroachlabs.com)
233 points by awoods187 on March 12, 2020 | hide | past | favorite | 34 comments


What a great marketing strategy - publishing interesting, valuable research in an adjacent area - thereby gaining genuine, valuable attention from exactly the type of person who'd also be interested in their product, whilst also establishing a kind of thought leadership in the "cloud performance" category. I'm not even being cynical - great work.


I actually really appreciate companies that produce research and write articles on technologies adjacent to their business. It's a great way of building trust and a community around the business's core product. DigitalOcean is a good example. Reminds me of Patrick Collison mentioning on a podcast [0] that one of the biggest drivers of new customers to Stripe in their early days was a blog post they wrote on using the Python debugger.

[0] https://tim.blog/2018/12/20/patrick-collison/


It saddens me moderately that cynicism is often the default; people sometimes think I'm being (rudely) sarcastic when I'm just being straightforwardly nice.

I feel like cynicism/pessimism gets weird praise, too. The anthropologist Wade Davis considers it an indulgence and I wholeheartedly agree.

/tangent


> Azure offers a large number of configuration options that can be tricky to get right

That's basically the summary of Azure. I think all that complexity ends up hurting more than it helps, not just on the end user side, but in terms of lower reliability from the provider side.


Windows is the same way: absurd numbers of options. I would not be surprised if Windows 10 had over a million configuration settings between regular panels, policies, and the registry.


You think Windows has more configuration options than Linux?


Not that, no. But Microsoft/Windows usually limits some arbitrary thing when you configure something else. The report reads a lot like (at least for the Azure section) "if you configure option A, option B will be enabled and option C will be unavailable. Therefore we provide 10 different options for each of A, B, and C that you can combine however you want."


If you count installing different packages to interact with your system, no. But in the Windows 10 distro there's certainly more configurability than in Kubuntu, for example.


Absurd compared to what?


Compared to what humans can reasonably support.



This is a new post that updates the previous post with cloud vendor responses.


That's clear! Adding links to previous threads is just a service to the curious. Not intended to imply dupiness. If it were a dupe we'd have downweighted it and probably marked it as [dupe] in the title.

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


Your link to the previous discussion was valuable, but I originally misunderstood the meaning. I've been reading HN long enough that I should know better.

However, for newbies (or the scatterbrained), perhaps you could prefix the link with something like: "For context, see discussion of earlier, related post: https://..."


I do that sometimes (see the search link upthread) but still haven't figured out the best way to word it. It would be nice to avoid being so repetitive.


Not sure there is a general solution. Thanks for your moderation and input!


Thanks Cockroach team!

I'm very excited about the possibility of moving to you guys, especially since it's a drop-in replacement for Postgres. The one thing holding us back today is a PostGIS replacement. We rely heavily on it for location calculations and saving route paths.


Working on it.


Azure also offers ephemeral OS disks that can speed up OS perf. This probably won't make any difference in a database test (unless they store their data ephemerally too, which...). But still, another thing to consider.


For me as a developer and CTO, it's still a huge pain figuring out how to properly configure and scale AWS, Azure, or GCP. In-depth reports on "How to configure your AWS EC2 instances for max performance" would be great, one report per specific use case, e.g. a realtime chat app, a basic Apache web server, a Node.js web app, ... If you think about it, there are only a handful of common use cases.


As someone who gets questions about this often, I wonder why nobody seems to know how to answer these questions for themselves, or even how they'd begin to research them. There seems to be a dearth of developers who understand what bottlenecks their applications are likely to encounter, what options exist for profiling and analyzing their performance, or that they may need to read something and learn something new about their systems and the systems they interact with.

Everyone wants an easy button, and that might exist for something that's very well-established with a large community of users, at which point it probably just exists as an AWS service. For many things, you have to do research.
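To make the "options exist for profiling" point concrete, here is a minimal sketch using Python's built-in cProfile, with a toy CPU-bound function standing in for a real request handler (the workload and names are illustrative, not from the article):

```python
import cProfile
import io
import pstats

def handle_request(n):
    # Toy CPU-bound workload standing in for a real request handler.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    handle_request(10_000)
profiler.disable()

# Print the five most expensive call sites by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Ten minutes with output like this usually tells you whether you are CPU-bound, IO-bound, or stuck in one hot function, which is the question to answer before shopping for instance types.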


I did this kind of research in the last two weeks. I ended up with a huge spreadsheet of goals and the various options for meeting those goals with the cloud offerings of just AWS and GCP.

The fun part is that several options look viable until you hit the one issue where an option no longer works. Then you have to backtrack, or find a workaround and check whether that workaround is acceptable.

And some of these limitations are not straightforward to spot, or even to read about.

That said, there are some resources like https://github.com/ahmetb/cloud-run-faq which are very good and helpful. Sometimes the official documentation doesn't really cover the questions that are important for your product.

Also, I found that without multiple PoCs testing connectivity and the basic building blocks, it is difficult to find all the gotchas that might turn into blockers for a particular solution.


My guess is this could be because of the rate of change. Azure today may have diverged significantly from last year, and K8s and the like bring churn of their own.


The gold standard would be: I just click "add new server" and start off with a basic, general compute server. The server then figures out the optimal network, EC2, storage, and all the other parameters by itself, just from knowing what general use case you're aiming for [see above].



That'd be amazing. However, a server doesn't see all use cases, especially the ones that will break it, until it actually breaks. This is why anomaly detection also only goes so far.

Worse, you're constantly changing the equation as you work on your features.

Finally, when these things do happen, the server will protect itself in the interim, and while it does that, things will be broken.


Is there a reliability/availability report that covers multiple months (VM uptime, PaaS services, etc)?

I've personally seen the reliability differences of services in Azure vs AWS for example.


If your major reliability concern is “VM uptime”, you’ve got bigger issues. You need redundancy and automatic failover at every level, based on your cost-benefit analysis.


Since you're one of the authors: TPC_RR is a typo.


Disclosure: I work on Google Cloud (and also commented on Cockroach’s original draft post, that this is a follow up to).

Edit: I am a chump, I misread TPC_RR as TCP_RR and responded. Leaving this here for shame. Thanks to jsolson for pointing out my mistake.

I’m not sure if the post was edited, but TCP_RR is definitely not a typo. You may be more used to seeing TCP_CRR which opens a new connection and then a round trip, but for raw network latency netperf’s TCP_RR benchmark is probably the best tool available.
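For reference, the distinction between the two netperf tests shows up in how they are invoked. This is an illustrative CLI fragment, not output from the article; the IP is a placeholder and a netserver must already be running on the target:

```shell
# TCP_RR: one connection, many request/response round trips --
# measures raw network latency without connection-setup cost.
netperf -H 10.0.0.2 -t TCP_RR -- -r 1,1

# TCP_CRR: opens a NEW connection for every transaction, so the
# TCP handshake is included in each measured round trip.
netperf -H 10.0.0.2 -t TCP_CRR -- -r 1,1
```

(`-r 1,1` sets 1-byte request and response sizes, so the result is dominated by latency rather than throughput.)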


There is still a TPC_RR occurrence in the post.


(editor here) we fixed it. I'm so used to seeing TPC-C, that my eyes missed the "TCP".


What a load of nonsense. So cloud provider X doesn’t like the test because the defaults of their services don’t provide the best performance, and on top of that the benchmark used doesn’t produce results that show them in the best possible light.

It’s like listening to failed beauty pageant contestants.


The idea that you can provide a default option that is the best tuned for every workload is a very naive one.

There are many changes I would make to increase performance for one workload that would have a deleterious effect on another. An easy example is the dirty ratio in Linux: depending on the speed of your local storage, the size of your in-memory working set, and how frequently the data in that working set changes, keeping the setting the same across workloads and systems could be disastrous - it could leave you stuck in extended synchronous flushes to disk that block all other IO. The same setting on another server might be perfectly acceptable and prevent having to go to slower block storage devices sooner than necessary, increasing overall performance.
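For concreteness, these are the knobs in question. The values below are purely illustrative (requires root on Linux); the whole point of the comment above is that the right numbers depend on the workload:

```shell
# Inspect current writeback thresholds:
sysctl vm.dirty_ratio vm.dirty_background_ratio

# Example tuning: start background writeback early and cap dirty
# pages low, so flushes stay small and we avoid long synchronous
# stalls on slow storage. Persist in /etc/sysctl.d/ to survive reboot.
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
```

A write-heavy box with slow disks might want the low values above, while a box with a large, mostly-cached working set might prefer higher ones - exactly the workload dependence the comment describes.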

It's the same with how you configure your servers - you can throw more spindles at sequential workloads and have great performance, but a random workload really should be using flash storage, etc. etc. etc.

Most people strive to provide sane defaults that strike a good balance for the majority of workloads. This is going to be beneficial to the largest number of people. It's totally fair for someone to provide feedback on this sort of thing, and give details on how things could be further optimized to fit a specific workload. Defaults are not best practices - they are a starting point.



