You're missing the point that maybe, just maybe, I'm part of a team that looks after >5 million servers.
You might also divine that while TCP can be a problem, a bigger problem is data affinity. Shuttling data from the next-door rack costs less than from one in the hall next door, and significantly less than from the next datacentre over. With each internal hop, the risk of congestion increases.
You might also divine that changing everything from TCP to a new, untested protocol across all services, with all the associated engineering effort plus translation latency, might not be worth it, especially as all your observability and protocol-routing tools would stop working.
Quick maths: a faster top-of-rack switch possibly costs about the same as 5 days' wages for a mid-level Google engineer. How many new switches do you think you could buy with the engineering effort required to port everything to the new protocol and have it stable and observable?
As a side note, "oh but they are Google" is not a selling point. Google has Google problems, half of which relate to their performance/promotion system, which penalises incremental changes in favour of $NEW_THING. HTTP/2 was also a largely Google-led effort designed to tackle latency over lossy network connections, which it fundamentally didn't do, because a whole bunch of people didn't understand how TCP worked and were shocked to find out that mobile performance was shit.
For a future post, please write about how typical cloud customers can design for better data affinity.
Or is it just handled by the provider?
FWIW, at a prev gig, knowing nothing about nothing, I finally persuaded our team to colocate a Redis process on each of our EC2 instances (alongside the HTTP servers). Quick & dirty solution to meet our PHB's silly P99 requirements (for a bog-standard ecommerce site).
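In case it helps picture the setup, here's a minimal sketch of that pattern using redis-py: the app treats the colocated Redis as a read-through cache in front of the real datastore, so cache hits never leave the box. The function fetch_product_from_db, the key scheme, and the TTL are made-up placeholders for illustration, not what we actually ran.

```python
import json
import redis

# Redis colocated on the same EC2 instance: cache hits stay on loopback,
# with no cross-network hop, which is what rescues the P99.
cache = redis.Redis(host="127.0.0.1", port=6379)

def fetch_product_from_db(product_id: str) -> dict:
    # Placeholder for the real (slower, cross-network) datastore call.
    return {"id": product_id, "name": "example"}

def get_product(product_id: str, ttl_seconds: int = 60) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: local loopback only
    product = fetch_product_from_db(product_id)
    cache.setex(key, ttl_seconds, json.dumps(product))  # populate for next request
    return product
```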
> Quick maths: a faster top-of-rack switch possibly costs about the same as 5 days' wages for a mid-level Google engineer. How many new switches do you think you could buy with the engineering effort required to port everything to the new protocol and have it stable and observable?
So: your 5M machines, at 40 per rack in the best case of all 1U boxes, is 125K racks, i.e. 125K TOR switches. At one switch per SWE-week (your 5 days), that's 125K SWE-weeks, or roughly 2,400 SWE-years, to invest in new protocols, observability, and testing. Google got to the scale they are by explicitly spending on SWE-hours instead of Cisco.
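Spelling the back-of-the-envelope out (the 40-machines-per-rack and one-SWE-week-per-switch figures are the assumptions from above, not measured numbers):

```python
# Back-of-the-envelope: what is "replace every TOR switch" worth in SWE-years?
machines = 5_000_000
machines_per_rack = 40        # assumed best case: all 1U boxes
swe_weeks_per_switch = 1      # assumed: one switch ~ 5 days of a mid-level SWE
weeks_per_year = 52

racks = machines // machines_per_rack                         # 125_000 TOR switches
swe_years = racks * swe_weeks_per_switch / weeks_per_year
print(f"{racks:,} switches ~= {swe_years:,.0f} SWE-years")    # ~2,404 SWE-years
```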
But to answer your further point: you don't need to replace all the TOR switches, only the ones that deal with high network IO.
To change protocol you need gateways/load balancers, either at the edge of the DC just after the public endpoints, or in the "high speed" areas that are running high network IO. For that to work, you'll need to show it's worth the engineering effort, maintenance, and added latency; a rough sketch of what such a gateway does is below.
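For concreteness, a minimal sketch of the gateway idea at the byte level, assuming a plain asyncio TCP relay: the listen port and backend address are made-up placeholders, and the "translation" step is just a byte copy standing in for whatever the new protocol would actually require. The point it illustrates is that every byte takes an extra hop and an extra copy, which is where the translation latency comes from.

```python
import asyncio

# Hypothetical backend that speaks the "new" protocol; address is a placeholder.
BACKEND_HOST, BACKEND_PORT = "10.0.0.2", 9000

async def pump(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Copy one direction of the connection until EOF, then close our side.
    try:
        while data := await reader.read(64 * 1024):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle_client(client_r, client_w) -> None:
    # One extra connection per client: the gateway hop.
    backend_r, backend_w = await asyncio.open_connection(BACKEND_HOST, BACKEND_PORT)
    # A real gateway would re-frame TCP into the new protocol here;
    # this sketch just relays bytes in both directions.
    await asyncio.gather(pump(client_r, backend_w), pump(backend_r, client_w))

async def main() -> None:
    server = await asyncio.start_server(handle_client, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```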