Fix 504 timeout: Add K8s client caching + QPS increase + benchmarks#5430
shovan-mondal wants to merge 16 commits into
55066e9 to 801e684
amityt left a comment:
Thanks for the changes, @shovan-mondal! This will definitely help with optimization and scaling. 🚀
Great suggestion, @amityt! You're completely right: different cluster sizes will need different limits. I will move these to environment variables while keeping 50 and 100 as the sensible defaults. I'll push the updated commit shortly!
Signed-off-by: shovan-mondal <shovanmondal2004@gmail.com>
5f20ffd to b246e85
@amityt I have pushed the change that moves these settings to environment variables. It is now ready for review.
amityt left a comment:
Thanks for the changes, @shovan-mondal! 🚀
Thank you :)
Hey @shovan-mondal, some checks are failing. Could you please look into it?
Hey @shovan-mondal, any updates?
Hi @PriteshKiri,
82813e9 to 1b460ed
Hi @PriteshKiri, I have updated it, and the checks are now passing. I had to upgrade Go from 1.24 to 1.26 to pick up security patches; the unpatched vulnerabilities were causing the build pipeline to fail. I also updated the Red Hat base image to version 9.7 to make the build more reliable.
Proposed changes
Fixes #5079 (504 Gateway Timeouts).
This PR addresses a critical concurrency anti-pattern in the subscriber's GetGenericK8sClient, where a new kubernetes.Clientset was being initialized for every single request.
The Issue:
The Fix:
- Uses sync.Once to enforce a singleton pattern for the Kubernetes client (reusing the TCP connection).
- Configures rest.Config with QPS=50 and Burst=100 to handle concurrent UI requests without client-side throttling.
- Adds client_perf_test.go to validate the performance improvement.

Benchmark Results:
I ran a parallel benchmark simulating 20 concurrent requests.
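The singleton part of the fix can be sketched with the standard library alone, driven by the same 20 concurrent callers. Here k8sClient, clientConfig, newClient, and getGenericK8sClient are hypothetical stand-ins for kubernetes.Clientset, rest.Config, the clientset constructor, and the subscriber's GetGenericK8sClient; the real code builds the client through client-go.

```go
package main

import (
	"fmt"
	"sync"
)

// clientConfig is a hypothetical stand-in for the QPS/Burst fields of rest.Config.
type clientConfig struct {
	QPS   float32
	Burst int
}

// k8sClient is a hypothetical stand-in for *kubernetes.Clientset.
type k8sClient struct {
	cfg clientConfig
}

var (
	clientOnce   sync.Once
	sharedClient *k8sClient
	clientErr    error
	constructed  int // how many times the expensive constructor actually ran
)

// newClient represents the expensive step (transport and TLS setup) that the
// original code repeated on every request.
func newClient(cfg clientConfig) (*k8sClient, error) {
	constructed++
	return &k8sClient{cfg: cfg}, nil
}

// getGenericK8sClient builds the client on first use and returns the same
// instance afterwards. sync.Once blocks concurrent callers until the first
// initialization finishes, so the constructor runs exactly once.
func getGenericK8sClient() (*k8sClient, error) {
	clientOnce.Do(func() {
		sharedClient, clientErr = newClient(clientConfig{QPS: 50, Burst: 100})
	})
	return sharedClient, clientErr
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ { // 20 concurrent "requests"
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := getGenericK8sClient(); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("constructions:", constructed) // prints: constructions: 1
}
```

One design consequence: because sync.Once runs the initializer at most once, an error from the first attempt is cached for the life of the process. If transient start-up failures are a concern, a mutex-guarded lazy initializer that retries on error may be preferable.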
Types of changes
What types of changes does your code introduce to Litmus? Put an x in the boxes that apply.
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
Dependency
Special notes for your reviewer:
I have included a new test file, pkg/k8s/parallel_benchmark_test.go, which runs the benchmark scenarios shown above. You can verify the fix locally by running:
This test shows that the singleton implementation matches the speed of the unthrottled code (~38ms) while maintaining a single persistent connection, eliminating the TLS handshake overhead that causes the 504s in production.