Don’t the hyperscaled cloud providers run totally segmented networks? What’s stopping them from using something proprietary internally and just exposing TCP at the end for termination of client connections?
I’m not aware of them using anything other than TCP internally (they may well have migrated to QUIC by now, but I’m not sure QUIC solves the scaling challenges or optimizes for gRPC and low latency).
Google is using remote memory accesses rather than TCP for at least some classes of traffic (e.g. a caching system). They've been publishing details about how it all works too.
Also, they have a transport (Pony Express) developed specifically for RPCs, rather than for byte streams or datagrams.
I could be wrong, but I believe they have a unified address space, with dedicated hardware that owns a given memory range. On an access, that hardware fetches the data on demand from the remote node matching the address and stores it in real memory allocated to it, presumably evicting entries when there’s insufficient space. Once the data is brought over, either a virtual address range is remapped to point to main memory, or the ASIC has a TLB of its own.
This is pure speculation based on seeing the word ASIC in one of the summaries but it seems like it could be reasonable.
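To make the speculation above concrete, here is a toy sketch of that design: a unified address space where each address range is owned by one node, remote accesses are fetched on demand, and fetched values are cached locally with LRU eviction. All class and method names here are invented for illustration; this is not Google's actual design or API.

```python
from collections import OrderedDict

class RemoteMemory:
    """Stand-in for another node's memory, keyed by address (hypothetical)."""
    def __init__(self, data):
        self.data = data  # address -> value

    def fetch(self, addr):
        return self.data[addr]

class UnifiedAddressSpace:
    """Routes loads either to local memory or to the remote node owning
    that address range, caching remote values with LRU eviction
    (loosely analogous to a TLB/cache on the ASIC)."""
    def __init__(self, cache_capacity=2):
        self.local = {}
        self.remotes = []           # list of ((lo, hi), RemoteMemory)
        self.cache = OrderedDict()  # LRU cache of fetched remote values
        self.capacity = cache_capacity

    def register_remote(self, lo, hi, remote):
        self.remotes.append(((lo, hi), remote))

    def load(self, addr):
        if addr in self.local:
            return self.local[addr]
        if addr in self.cache:
            self.cache.move_to_end(addr)  # mark recently used
            return self.cache[addr]
        for (lo, hi), remote in self.remotes:
            if lo <= addr < hi:
                value = remote.fetch(addr)  # on-demand remote fetch
                self.cache[addr] = value
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)  # evict LRU entry
                return value
        raise KeyError(f"unmapped address {addr:#x}")
```

So a load to an address the node doesn't own transparently turns into a remote fetch, and repeated accesses hit the local cache instead of the network.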
I don't think Google-internal communications happen over gRPC. Maybe the protocol was designed with an ambition to replace their internal RPC system, but it probably failed at that.
They have a new system called Snap, although judging from the paper I don't think it can completely replace TCP: https://research.google/pubs/pub48630/ My understanding is that Snap enables new use cases, including moving functionality previously done via RPCs to RDMA-like one-sided operations. I think it complements RPCs rather than replacing them.
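The distinction between the two styles can be shown with a toy model: in a two-sided RPC the server's CPU runs a handler for every request, while an RDMA-like one-sided read lets the caller pull data straight out of the server's memory without involving a handler. This is purely illustrative and not Snap's actual API; the names are made up.

```python
class Server:
    def __init__(self):
        self.memory = {"counter": 7}
        self.handler_calls = 0  # counts how often the server CPU ran code

    # Two-sided RPC: every request executes a handler on the server.
    def rpc_get(self, key):
        self.handler_calls += 1
        return self.memory[key]

# One-sided read: the "NIC" reads server memory directly; no handler
# runs, so the server's CPU is not involved in the data path.
def one_sided_read(server, key):
    return server.memory[key]  # server.handler_calls stays unchanged
```

The one-sided path is why this model can offload work from server CPUs, but it only fits operations expressible as raw reads/writes, which is consistent with it complementing RPCs rather than replacing them.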