Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Consistent Hashing with Bounded Loads #9239

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kirs
Copy link
Contributor

@kirs kirs commented Oct 31, 2022

This PR implements Consistent Hashing with Bounded Loads algorithm outlined in the paper. The same algorithm is employed by Haproxy and Envoy which made it easier for me to follow an existing implementation and port it into ingress-nginx. This chains nicely to chash and chashsubset balancers that are available in ingress-nginx.

What's the goal?

The goal here is to keep routing requests for the same tenant to same endpoints - as long as that doesn't overload that pod, and if it does, spill over to neighbour pods in the ring.

This is relevant specifically for multi-tenant web apps where each request is annotated with X-Account-Id or whatever and you can leverage better caching in the upstream if same container ends up serving the same account most of the time.

How does it work?

For the specified hash_balance_factor, requests to any upstream host are capped at hash_balance_factor times the average number of requests across the cluster. When a request arrives for an upstream host that is currently serving at its max capacity, linear probing is used to identify an eligible host in the ring. For example, with hash_balance_factor=2, each host will be allowed to burst 2x above the average load in the cluster. If the first pick of consistent hash in the ring is loaded above that threshold, it will move on to the next node in the ring.

How do you track the load?

Same as Envoy and Haproxy, we keep track of active requests in total and per each node. That allows us to do the math according to the strategy from above and to figure if the host is overloaded or not.

How do you know it works?

This approach has been tested at Shopify for the past month and it's been showing good data. This is a backport from our private fork that we'd like to contribute back to upstream.

@ElvinEfendi

@k8s-ci-robot
Copy link
Contributor

@kirs: This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 31, 2022
@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 31, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @kirs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority area/lua Issues or PRs related to lua code size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 31, 2022
@kirs kirs force-pushed the chashbl branch 2 times, most recently from 279c336 to 43d9b6e Compare October 31, 2022 21:52
@ElvinEfendi
Copy link
Member

@rikatz @strongjz @tao12345666333 I've already reviewed this in detail in our fork. I'll let you decide if you want to merge this or when do you merge this. But I approve it.

@longwuyuan
Copy link
Contributor

@ElvinEfendi @kirs , is it possible to see the numbers like requests received and routed etc on a kind cluster on a laptop with 16GB RAM 4-6cores, by cloning the fork/branch by @kirs

@kirs
Copy link
Contributor Author

kirs commented Nov 1, 2022

@longwuyuan we can, but I'm curious what kind of data you're expecting to see there?
Are you curious to compare it to other balancers?

In general, this one is pretty specific in terms of packing tenants to be served by same endpoints. We shouldn't expect it to be comparable with EWMA or round-robin because it optimized for different things.

@longwuyuan
Copy link
Contributor

@kirs just wanted to put prometheus-operator and watch requests coming and routed to same pod, in a multiple replica workload, under load

@longwuyuan
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 1, 2022
@longwuyuan
Copy link
Contributor

@kirs , also just checked the commits, am not a developer, so explicitly asking if this occurs quietly for all use-cases or is there a toggle/other switch/aannotation etc. to enable/disable this useful feeature

@tao12345666333
Copy link
Member

/retest


local util = require("util")
local split = require("util.split")
local reverse_table = util.reverse_table
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to also include reverse_table implementation in this PR.

local ngx_ERR = ngx.ERR
local ngx_WARN = ngx.WARN
local ngx_log = ngx.log
local tostring = tostring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused.

@kirs
Copy link
Contributor Author

kirs commented Nov 1, 2022

@longwuyuan this feature is opt-in. It is controlled like the rest of load balancers through annotations like:

    nginx.ingress.kubernetes.io/load-balancer: "chashboundedloads"
    nginx.ingress.kubernetes.io/upstream-hash-by: "$arg_predictorid" <- the variable you want to hash by
    nginx.ingress.kubernetes.io/upstream-hash-by-balance-factor: "1.5"

@kirs kirs force-pushed the chashbl branch 2 times, most recently from be9ed06 to ec94c71 Compare November 1, 2022 17:08
@tao12345666333
Copy link
Member

I did a quick review and it looks good.

I will review it in detail as soon as possible, especially the test cases

/assign

@kirs
Copy link
Contributor Author

kirs commented Nov 14, 2022

@tao12345666333 let me know if you have any further questions about tests or implementation 👂

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kirs
Once this PR has been reviewed and has the lgtm label, please ask for approval from tao12345666333 by writing /assign @tao12345666333 in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kirs kirs force-pushed the chashbl branch 3 times, most recently from 01a538c to 56de583 Compare February 2, 2023 17:37
@kirs
Copy link
Contributor Author

kirs commented Feb 2, 2023

I updated the implementation with the latest version that we've been running in production. This one includes the seeding logic.

@ElvinEfendi
Copy link
Member

@kirs it'd also be useful to add an e2e test case to

. It seems straightforward enough to include that in this PR too.

self.requests_by_endpoint[endpoint] = nil
end
end
self.total_requests = self.total_requests - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if self.total_requests is greater than 0?

Copy link
Contributor Author

@kirs kirs Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should never happen according to our tests. The only point I could see is a critical panic-like log, is that what you've been thinking?
If the algorithm is wrong in some way and we've missed a bug, I'd rather let us run with negative total_requests and let something else break more loudly than if we let it quietly never decrement it.

@@ -180,4 +189,45 @@ local function replace_special_char(str, a, b)
end
_M.replace_special_char = replace_special_char

local function get_hostname()
local f = io.popen("/bin/hostname")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using the os.getenv("HOSTNAME") function to retrieve the hostname?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that an env variable that is commonly available in Unix? Or is that more to allow testing this easier?

@ElvinEfendi
Copy link
Member

There are two legitimate test failures:

!!! 'gofmt -s' needs to be run on the following files: 
./test/e2e/annotations/upstreamhashby.go
Error: Process completed with exit code 1.
Error:   2023/02/06 13:44:34 [error] 71#71: *336 failed to run balancer_by_lua*: /etc/nginx/lua/balancer/chashboundedloads.lua:228: variable "chashbl_debug" not found for writing; maybe it is a built-in variable that is not changeable or you forgot to use "set $chashbl_debug '';" in the config file to define it first

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 22, 2023
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

@kirs: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pre-ingress-nginx-codegen 9eebd83 link true /test pre-ingress-nginx-codegen
pre-ingress-nginx-lua-test 9eebd83 link true /test pre-ingress-nginx-lua-test
pre-ingress-nginx-boilerplate 9eebd83 link true /test pre-ingress-nginx-boilerplate

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/lua Issues or PRs related to lua code cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants