This has to do with how async works without preemption and resource limits.
There's a counter-intuitive thing when trying to balance load across resources: applying resource limits helps the system run better overall.
One example: when scaling a web app, there comes a point where scaling up the database doesn't seem to help. We're then tempted to increase the connection pool because it looks like the bottleneck. But increasing the pool can make the overall system perform worse, because it is often slow, poorly performing queries that are clogging up the system.
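A back-of-the-envelope way to see this is Little's law. The sketch below is a simplification, not a model of any real database: it assumes the database can usefully execute at most a fixed number of queries in parallel, so past that point a bigger pool only adds queueing, not throughput.

```python
# Little's law sketch: throughput and latency vs. connection-pool size.
# CAPACITY and SERVICE_TIME are made-up illustrative numbers.

CAPACITY = 10          # max queries the DB can make progress on at once
SERVICE_TIME = 0.05    # seconds per query when the DB is not saturated

def steady_state(pool_size: int) -> tuple[float, float]:
    """Return (throughput in queries/sec, mean latency in sec)."""
    # Connections beyond CAPACITY just wait; they add no throughput.
    throughput = min(pool_size, CAPACITY) / SERVICE_TIME
    # Little's law: L = lambda * W  =>  W = L / lambda
    latency = pool_size / throughput
    return throughput, latency

for pool in (5, 10, 20, 40):
    tput, lat = steady_state(pool)
    print(f"pool={pool:3d}  throughput={tput:6.0f}/s  latency={lat * 1000:5.1f} ms")
```

Throughput plateaus once the pool exceeds what the database can actually service, while latency keeps climbing in proportion to pool size, which is exactly the "bigger pool made it worse" experience.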
Another example: one of the systems I worked on had over 250 Node runtimes running on a single, large server. It used pm2 and did not apply cgroups to limit CPU resources. The whole system was a resource hog, and I temporarily fixed it by consolidating things to run on about 50 Node runtimes.
When I moved them over to Kubernetes, I also applied CPU resource limits, with each runtime in its own pod. I set the limits based on what I measured when they were all running on pm2 ... but the same code running on Kubernetes used 10x less CPU overall. Why? Because the async code was no longer allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.
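For concreteness, here is what such a limit looks like in a pod spec. This is a hypothetical sketch, not the original system's config; the names and millicore values are illustrative. The `limits.cpu` field becomes a cgroup CFS quota, which is the mechanism that throttles a runtime instead of letting it monopolize cores.

```yaml
# Hypothetical pod spec; names and values are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: node-service
spec:
  containers:
    - name: app
      image: node:20
      resources:
        requests:
          cpu: 250m      # scheduler reserves a quarter core for placement
        limits:
          cpu: 500m      # CFS quota: throttled beyond half a core per period
```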
There's probably some math that folks who know Operations Research can prove all this.
> When I moved them over to Kubernetes, I also applied CPU resource limits, with each runtime in its own pod. I set the limits based on what I measured when they were all running on pm2 ... but the same code running on Kubernetes used 10x less CPU overall. Why? Because the async code was no longer allowed to grab as much CPU as it could for as long as it could, and the kernel scheduler was able to schedule fairly. That allowed the entire system to run with fewer resources overall.
As someone who has advocated against Kubernetes CPU limits everywhere I've worked, I'm really struggling to see how they helped you here. The code used 10x less CPU with CPU limits, with no adverse effects? Where were all those CPU cycles going before?
> The code used 10x less CPU with CPU limits, with no adverse effects?
The normal situation is that defective requests get a much larger latency, while the correct requests run much faster.
It's a problem in the cases where the first set isn't actually defective. But it normally takes a reevaluation of the entire thing to solve those, and the non-limited situation isn't any good either.
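The effect can be sketched with a toy scheduler comparison. The task costs below are made up: one CPU-hogging "defective" task and ten short well-behaved requests, run first non-preemptively (like an async event loop where one callback never yields) and then under a fair round-robin time slice.

```python
# Sketch: fair scheduling slows a CPU hog and speeds up well-behaved work.
# Task costs are arbitrary time units chosen for illustration.

def run_to_completion(tasks):
    """Non-preemptive: each task monopolizes the CPU until it is done."""
    t, finish = 0, {}
    for name, cost in tasks:
        t += cost
        finish[name] = t
    return finish

def round_robin(tasks, quantum=1):
    """Preemptive fair scheduling with a fixed time slice."""
    remaining = dict(tasks)
    t, finish = 0, {}
    while remaining:
        for name in list(remaining):
            step = min(quantum, remaining[name])
            t += step
            remaining[name] -= step
            if remaining[name] == 0:
                del remaining[name]
                finish[name] = t
    return finish

tasks = [("hog", 100)] + [(f"req{i}", 1) for i in range(10)]
fifo = run_to_completion(tasks)   # hog happens to run first
fair = round_robin(tasks)
print("short-request latency, hog-first FIFO:", fifo["req9"])
print("short-request latency, fair scheduler:", fair["req9"])
```

Under the fair scheduler the short requests finish almost immediately while the hog's completion time gets pushed out slightly, matching the "defective work gets the latency penalty" behavior described above.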