This has to do with how async works without preemption and resource limits.
There's a counter-intuitive thing when trying to balance load across resources: applying resource limits helps the system run better overall.
One example: when scaling a web app, there comes a point where scaling up the database doesn't seem to help. We're then tempted to increase the connection pool because it looks like the bottleneck. But increasing the pool can make the overall system perform worse, because it is often slow, poorly performing queries that are clogging up the system.
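A back-of-the-envelope way to see this is Little's law. The sketch below is a simplification, not a model of any real database: it assumes the database can usefully execute at most a fixed number of queries in parallel, so past that point a bigger pool only adds queueing, not throughput.

```python
# Little's law sketch: throughput and latency vs. connection-pool size.
# CAPACITY and SERVICE_TIME are made-up illustrative numbers.

CAPACITY = 10          # max queries the DB can make progress on at once
SERVICE_TIME = 0.05    # seconds per query when the DB is not saturated

def steady_state(pool_size: int) -> tuple[float, float]:
    """Return (throughput in queries/sec, mean latency in sec)."""
    # Connections beyond CAPACITY just wait; they add no throughput.
    throughput = min(pool_size, CAPACITY) / SERVICE_TIME
    # Little's law: L = lambda * W  =>  W = L / lambda
    latency = pool_size / throughput
    return throughput, latency

for pool in (5, 10, 20, 40):
    tput, lat = steady_state(pool)
    print(f"pool={pool:3d}  throughput={tput:6.0f}/s  latency={lat * 1000:5.1f} ms")
```

Throughput plateaus once the pool exceeds what the database can actually service, while latency keeps climbing in proportion to pool size, which is exactly the "bigger pool made it worse" experience.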
Another example: one of the systems I worked on had over 250 Node runtimes running on a single, large server. It used pm2 and did not apply cgroups to limit CPU resources. The whole system was a resource hog, and I temporarily fixed it by consolidating things to run on about 50 Node runtimes.
When I moved them over to Kubernetes, I also applied CPU resource limits, with each runtime in its own pod. I set the limits based on what I measured when they were all running on pm2 ... but the same code running on Kubernetes used 10x less CPU overall. Why? Because the async code was no longer allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.
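For concreteness, here is what such a limit looks like in a pod spec. This is a hypothetical sketch, not the original system's config; the names and millicore values are illustrative. The `limits.cpu` field becomes a cgroup CFS quota, which is the mechanism that throttles a runtime instead of letting it monopolize cores.

```yaml
# Hypothetical pod spec; names and values are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: node-service
spec:
  containers:
    - name: app
      image: node:20
      resources:
        requests:
          cpu: 250m      # scheduler reserves a quarter core for placement
        limits:
          cpu: 500m      # CFS quota: throttled beyond half a core per period
```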
There's probably some math that folks who know Operations Research can prove all this.
> When I moved them over to Kubernetes, I also applied CPU resource limits, with each runtime in its own pod. I set the limits based on what I measured when they were all running on pm2 ... but the same code running on Kubernetes used 10x less CPU overall. Why? Because the async code was no longer allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.
As someone who has advocated against Kubernetes CPU limits everywhere I've worked, I'm really struggling to see how they helped you here. The code used 10x less CPU with CPU limits, with no adverse effects? Where were all those CPU cycles going before?
> The code used 10x less CPU with CPU limits, with no adverse effects?
The normal situation is that defective requests get a much larger latency, while the correct requests run much faster.
It's a problem in the cases where the first set isn't actually defective. But it normally takes a reevaluation of the entire thing to solve those, and the non-limited situation isn't any good either.
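The effect can be sketched with a toy scheduler comparison. The task costs below are made up: one CPU-hogging "defective" task and ten short well-behaved requests, run first non-preemptively (like an async event loop where one callback never yields) and then under a fair round-robin time slice.

```python
# Sketch: fair scheduling slows a CPU hog and speeds up well-behaved work.
# Task costs are arbitrary time units chosen for illustration.

def run_to_completion(tasks):
    """Non-preemptive: each task monopolizes the CPU until it is done."""
    t, finish = 0, {}
    for name, cost in tasks:
        t += cost
        finish[name] = t
    return finish

def round_robin(tasks, quantum=1):
    """Preemptive fair scheduling with a fixed time slice."""
    remaining = dict(tasks)
    t, finish = 0, {}
    while remaining:
        for name in list(remaining):
            step = min(quantum, remaining[name])
            t += step
            remaining[name] -= step
            if remaining[name] == 0:
                del remaining[name]
                finish[name] = t
    return finish

tasks = [("hog", 100)] + [(f"req{i}", 1) for i in range(10)]
fifo = run_to_completion(tasks)   # hog happens to run first
fair = round_robin(tasks)
print("short-request latency, hog-first FIFO:", fifo["req9"])
print("short-request latency, fair scheduler:", fair["req9"])
```

Under the fair scheduler the short requests finish almost immediately while the hog's completion time gets pushed out slightly, matching the "defective work gets the latency penalty" behavior described above.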