[Feature] Add timeout for apiserver grpc server #3427
kevin85421 merged 17 commits into ray-project:master
-e Signed-off-by: machichima <nary12321@gmail.com>
As far as I am aware, there are three timeout-related issues:
Got it. So this PR works on the gRPC service timeout, while PR #3350 works on limiting HTTP requests? I saw the comment in PR #3350 that they want to decide on a default timeout, so I am wondering whether the default value here should be the same as the one they set.
Yes.
No, I don't think they need to be the same.
60 seconds should be a good number?
…server-timeout-grpc-server -e Signed-off-by: machichima <nary12321@gmail.com>
Yes, I think it's a good default value. I also added an env variable to let users set the timeout value.
@dentiny PTAL
case <-ctx.Done():
	// Raise error if time out
	if ctx.Err() == context.DeadlineExceeded {
		return nil, fmt.Errorf("grpc server timed out")
Can we refer to the gRPC server as the KubeRay API server here?
Sure! Just changed it.
apiserver/cmd/main.go
Outdated
	_ = flagSet.Set("log_file", *logFile)
}

grpcTimeout := 60 * time.Second // Default timeout
Do we follow a convention for defining constants, or do we add them to each file where they are used?
Glancing quickly over the code, we have a constant.go for other components (e.g. the operator):
https://github.com/ray-project/kuberay/blob/ebb5ba441b0a7f888c17aa5c2d33943084a9a2d9/ray-operator/controllers/ray/utils/constant.go
I usually do this in one of two ways:
- either place it in a constants file, just as we did for the kuberay operator
  - the benefit is that all constants are grouped in one place, rather than scattered around the codebase
- or define a util function getGrpcServerTimeoutOrDefault and keep the default timeout beside it
  - the benefit is that it's easy to locate all timeout-related functions and features
Our codebase seems to prefer (1).
Sure. I found that the apiserver puts its constants in config.go, so I'll add it there:
kuberay/apiserver/pkg/util/config.go
Lines 10 to 14 in a83d3c1
apiserver/cmd/main.go
Outdated
}

grpcTimeout := 60 * time.Second // Default timeout
if timeoutStr := os.Getenv("GRPC_SERVER_TIMEOUT"); timeoutStr != "" {
btw, why do we use an env var instead of a flag? I think flags are strictly better, because env variables have a few drawbacks:
- programs check env variables; for example, bazel uses the env to decide whether the cache can be reused
- they pose a security issue, because the env is shared among all processes and could be accessed everywhere
I almost only use env variables when:
- across language boundary
- across process boundary, if no other easier way
Thanks for the guidance!
I put it in an environment variable instead of a flag because I searched through the codebase and found that they put this (which I think is a bit similar to a timeout?) in an environment variable, so I simply followed that.
I agree with your points. If no other place needs this value, I'll move it to a flag instead.
apiserver/cmd/main.go
Outdated
	grpcTimeout = timeout
	klog.Infof("gRPC servier timeout set to %v", grpcTimeout)
} else {
	klog.Warningf("Invalid GRPC_SERVER_TIMEOUT value: %v, using default timeout (60 seconds)", err)
Use %d to print out the default value, in case we change it in the future.
Thanks! Just added it.
Let me know when it's ready for review, and feel free to ping me any time :)
select {
case <-time.After(delay):
	return "test_response", h.returnErr
case <-ctx.Done():
	var grpcCode codes.Code
	switch ctx.Err() {
	case context.Canceled:
		grpcCode = codes.Canceled
	case context.DeadlineExceeded:
		grpcCode = codes.DeadlineExceeded
	default:
		grpcCode = codes.Unknown
	}
	return nil, status.Error(grpcCode, ctx.Err().Error())
}
I added this to mimic the gRPC IO handler for testing.
This is updated automatically when running make test.
apiserver/cmd/main.go
Outdated
}

grpcTimeout := 60 * time.Second // Default timeout
if timeoutStr := os.Getenv("GRPC_SERVER_TIMEOUT"); timeoutStr != "" {
Sorry, I just found that I hadn't submitted my review comments.
@dentiny PTAL, thanks!
dentiny
left a comment
LGTM, thank you for the help!
And sorry about the delay
// ConfigMapClient indicates an expected call of ConfigMapClient
// ConfigMapClient indicates an expected call of ConfigMapClient.
func (mr *MockKubernetesClientInterfaceMockRecorder) ConfigMapClient(namespace interface{}) *gomock.Call {
	mr.mock.ctrl.T.Helper()
If the mock call setup fails (e.g., wrong argument types), Go's testing output will show the line in your
actual test code where the error originated, rather than pointing you to this ConfigMapClient() method in
the mock recorder.
TIL
info,
func(ctx context.Context, req interface{}) (interface{}, error) {
	return tt.handler.Handle(ctx, req)
	return tt.handler.Handle(ctx, req, 0)
info,
func(receivedCtx context.Context, req interface{}) (interface{}, error) {
	return handler.Handle(receivedCtx, req)
	return handler.Handle(receivedCtx, req, 0)
nit: add a comment beside the constants
PR description needs to be updated :)
Thanks! Just updated!
-grpc_timeout (e.g. -grpc_timeout 30s), default to 60 seconds
Why are these changes needed?
Currently, there is no timeout setting on the gRPC server side. This PR adds a timeout to ensure that resource access does not get overloaded in any case.
Related issue number
Part of #3344