Enable persistent Ray cluster state via external Redis GCS fault tolerance #821
Force-pushed from 4ebb3dd to 8b1fbbd
great work! lgtm
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
##            main    #821    +/- ##
Coverage  92.41%  92.45%  +0.04%
Files         24      24
Lines       1397    1419     +22
Hits        1291    1312     +21
Misses       106     107      +1
thanks for adding the tests!
Force-pushed from b552de7 to 785d47d
/lgtm
/override kubernetes
@laurafitzgerald: Overrode contexts on behalf of laurafitzgerald: kubernetes
Signed-off-by: kramaranya <[email protected]>
Force-pushed from 785d47d to 24a8950
/lgtm
/approve
/override kubernetes
@kryanbeane: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chipspeak, kryanbeane
Issue link
RHOAIENG-11115
What changes have been made
Provided Ray cluster head pod persistence through GCS fault tolerance.
Added new config options: enable_gcs_ft, redis_address, redis_password_secret, and external_storage_namespace.
Added unit tests to cover GCS fault tolerance.
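To illustrate how the four new options might fit together, here is a minimal, self-contained sketch. It is not the code from this PR: the `GcsFtSettings` class and the `gcs_ft_options` helper are hypothetical, the `{"name": ..., "key": ...}` shape of `redis_password_secret` is an assumption, and the output keys are assumed to follow KubeRay's `gcsFaultToleranceOptions` field names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GcsFtSettings:
    """Hypothetical container for the four options named in this PR."""

    enable_gcs_ft: bool = False
    redis_address: Optional[str] = None
    # Assumed shape: {"name": <Secret name>, "key": <key inside the Secret>}
    redis_password_secret: Optional[Dict[str, str]] = None
    external_storage_namespace: Optional[str] = None

    def validate(self) -> None:
        """Reject combinations that cannot describe a usable Redis backend."""
        if not self.enable_gcs_ft:
            return
        if not self.redis_address:
            raise ValueError("redis_address is required when enable_gcs_ft is True")
        secret = self.redis_password_secret
        if secret is not None and not {"name", "key"} <= set(secret):
            raise ValueError("redis_password_secret must contain 'name' and 'key'")


def gcs_ft_options(settings: GcsFtSettings) -> Optional[dict]:
    """Build a dict for the RayCluster spec, or None when GCS FT is disabled.

    The key names are an assumption modelled on KubeRay's
    gcsFaultToleranceOptions block.
    """
    settings.validate()
    if not settings.enable_gcs_ft:
        return None
    options: dict = {"redisAddress": settings.redis_address}
    if settings.redis_password_secret is not None:
        # Reference the password through a Kubernetes Secret instead of inlining it.
        options["redisPassword"] = {
            "valueFrom": {
                "secretKeyRef": {
                    "name": settings.redis_password_secret["name"],
                    "key": settings.redis_password_secret["key"],
                }
            }
        }
    if settings.external_storage_namespace:
        options["externalStorageNamespace"] = settings.external_storage_namespace
    return options


if __name__ == "__main__":
    example = GcsFtSettings(
        enable_gcs_ft=True,
        redis_address="redis:6379",  # placeholder address
        redis_password_secret={"name": "redis-password-secret", "key": "password"},
        external_storage_namespace="my-cluster",
    )
    print(gcs_ft_options(example))
```

Referencing the password through a Secret keeps credentials out of the cluster spec, and validating up front surfaces a missing `redis_address` before anything is submitted to the cluster.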
Verification steps
ray list actors from the RayCluster head pod
ray list actors - you should see the previously created actor
Checks