@seanlaii (Contributor) commented Dec 24, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

  1. Add the interpreter for RayService.
  2. Add the interpreter tests for RayJob/RayCluster/RayService with the updated test framework.

Which issue(s) this PR fixes:

Part of #6952

Special notes for your reviewer:

I only added the tests for aggregateStatus, component and health operations.
I can also help add the tests for dependency if needed.

Does this PR introduce a user-facing change?:

@karmada-bot added the kind/feature and do-not-merge/work-in-progress labels Dec 24, 2025
@karmada-bot added the size/XXL label Dec 24, 2025
@gemini-code-assist

Summary of Changes

Hello @seanlaii, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces and enhances the resource interpretation capabilities for Ray.io custom resources within a Kubernetes environment. It provides new Lua-based interpretation rules for RayService to accurately identify its constituent components, assess its health status, aggregate status information from distributed deployments, and determine its external dependencies. Additionally, it significantly expands the test coverage for component and health interpretation across RayCluster, RayJob, and RayService resource types, ensuring reliable and accurate operational insights.

Highlights

  • New RayService Interpreter: A new ResourceInterpreterCustomization has been added for RayService resources, enabling detailed interpretation of its components, health status, status aggregation, and external dependencies using Lua scripts.
  • Expanded Test Coverage for RayService: Comprehensive test cases have been introduced for RayService to validate its component and health interpretation across various configurations, including different worker group setups and serve configurations.
  • Enhanced Test Coverage for RayJob and RayCluster: Existing test suites for RayJob and RayCluster health and component interpretation have been significantly expanded, improving coverage for a wider range of operational scenarios and edge cases.


@gemini-code-assist bot left a comment

Code Review

The pull request introduces new resource interpreter customizations and associated test data for the Ray.io API resources RayCluster, RayJob, and RayService.

For RayCluster and RayJob, new test cases were added to interpretcomponent-test.yaml to cover various configurations of head and worker groups, including those with custom names, no explicit names, and zero replicas. The interprethealth-test.yaml files for both resources were significantly expanded to cover health interpretation scenarios, defining specific conditions for healthy and unhealthy states, such as head readiness, cluster provisioning, replica failures, job deployment status, and job completion status.

For RayService, a new customizations.yaml file was added, defining Lua scripts for:

  • componentResource: extracts head and worker groups from rayClusterConfig.
  • healthInterpretation: based on the Ready condition.
  • statusAggregation: combines conditions, serve endpoints, and active/pending service statuses across multiple clusters.
  • dependencyInterpretation: identifies ConfigMaps, Secrets, ServiceAccounts, and PVCs from pod templates.

Review comments highlight the need to correct a field name from applicationStatuses to ApplicationStatuses in the statusAggregation Lua script for RayService, for consistency with the Ray API specification, and to remove temporary FIX comments.
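
The dependency rule summarized above (ConfigMaps, Secrets, ServiceAccounts, and PVCs found in the pod templates under rayClusterConfig) can be sketched roughly as follows. This is an illustrative Python model, not the Lua script from this PR: the function names are hypothetical, only volumes and serviceAccountName are inspected (env and envFrom references are left out), and the "default" namespace fallback is an assumption.

```python
from typing import Any


def template_refs(template: dict[str, Any]) -> set[tuple[str, str]]:
    """Return (kind, name) pairs referenced by one pod template."""
    spec = template.get("spec") or {}
    refs: set[tuple[str, str]] = set()

    if spec.get("serviceAccountName"):
        refs.add(("ServiceAccount", spec["serviceAccountName"]))

    # Standard Kubernetes volume sources; other reference paths are ignored here.
    for volume in spec.get("volumes") or []:
        if "configMap" in volume:
            refs.add(("ConfigMap", volume["configMap"]["name"]))
        if "secret" in volume:
            refs.add(("Secret", volume["secret"]["secretName"]))
        if "persistentVolumeClaim" in volume:
            refs.add(("PersistentVolumeClaim", volume["persistentVolumeClaim"]["claimName"]))
    return refs


def rayservice_dependencies(obj: dict[str, Any]) -> list[dict[str, str]]:
    """Hypothetical helper: list objects a RayService depends on."""
    cluster = (obj.get("spec") or {}).get("rayClusterConfig") or {}
    groups = [cluster.get("headGroupSpec") or {}]
    groups += cluster.get("workerGroupSpecs") or []

    # A set removes duplicates when head and workers mount the same object.
    refs: set[tuple[str, str]] = set()
    for group in groups:
        refs |= template_refs(group.get("template") or {})

    # Assumption: dependencies live in the RayService's own namespace.
    namespace = (obj.get("metadata") or {}).get("namespace", "default")
    return [
        {"apiVersion": "v1", "kind": kind, "namespace": namespace, "name": name}
        for kind, name in sorted(refs)
    ]
```

Collecting references into a set first means a ConfigMap mounted by both the head group and a worker group is reported once, which keeps the expected output of a test case short.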

@codecov-commenter commented Dec 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.56%. Comparing base (2dac564) to head (db198e2).
⚠️ Report is 24 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7042   +/-   ##
=======================================
  Coverage   46.55%   46.56%           
=======================================
  Files         700      700           
  Lines       48084    48091    +7     
=======================================
+ Hits        22384    22392    +8     
- Misses      24016    24017    +1     
+ Partials     1684     1682    -2     
Flag Coverage Δ
unittests 46.56% <ø> (+<0.01%) ⬆️

@XiShanYongYe-Chang (Member)

@seanlaii Thanks! We are planning to do this. Ref #6952

@seanlaii force-pushed the rayservice-interpreter branch 4 times, most recently from f1ea2ee to 80c768a, December 30, 2025 22:41
@seanlaii marked this pull request as ready for review December 30, 2025 22:42
@karmada-bot removed the do-not-merge/work-in-progress label Dec 30, 2025
@seanlaii (Contributor, Author)

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces ResourceInterpreterCustomization definitions for RayService and expands the existing ones for RayCluster and RayJob.

For RayService, new Lua scripts are added to:

  • interpret components (head and worker groups with resource requests);
  • aggregate status (combining conditions, serve endpoints, and application statuses from multiple clusters, handling upgrade scenarios);
  • interpret health (based on the 'Ready' condition);
  • identify dependencies (ConfigMaps, Secrets, ServiceAccounts, PVCs) within the rayClusterConfig's pod templates.

For RayCluster and RayJob, the AggregateStatus Lua scripts are updated to explicitly set desiredObj.spec = nil to prevent the spec from being included in the aggregated status.

Additionally, all three Ray resource types (RayCluster, RayJob, RayService) receive significantly expanded and clarified test cases for the AggregateStatus, InterpretComponent, and InterpretHealth functions, covering various healthy and unhealthy states, single and multiple status items, and specific failure modes.

A review comment points out an unused upgradeInProgress variable in the RayService health interpretation script and suggests removing it for clarity.
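
The RayService health rule mentioned in this review (healthy when the 'Ready' condition holds) might look roughly like the sketch below. It is a Python model for illustration only; the PR implements the rule in Lua, and the function name interpret_health as well as the exact condition layout (status.conditions entries with string-valued type and status) are assumptions based on the usual Kubernetes condition convention.

```python
from typing import Any


def interpret_health(observed: dict[str, Any]) -> bool:
    """Return True only when a Ready condition with status "True" is present."""
    conditions = (observed.get("status") or {}).get("conditions") or []
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in conditions
    )


# A missing status, a missing Ready condition, and Ready=False all count as unhealthy.
ready = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
not_ready = {"status": {"conditions": [{"type": "Ready", "status": "False"}]}}
print(interpret_health(ready), interpret_health(not_ready), interpret_health({}))
```

Treating an absent status as unhealthy keeps the check total: the function never raises on a freshly created object that the operator has not reconciled yet.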

@seanlaii force-pushed the rayservice-interpreter branch from 80c768a to 83ce045, December 30, 2025 22:56
@XiShanYongYe-Chang (Member)

Thanks!
/assign

desiredObj.status = {}
end
desiredObj.spec = nil
Member

👍 I feel this could be promoted. For aggregateStatus, the spec is useless.

Contributor Author

Yeah, and it is easier to write tests.
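
The point made in this thread, that dropping the spec during status aggregation makes test expectations shorter, can be illustrated with a small Python model. This is only a sketch under stated assumptions: the real scripts are Lua, aggregate_status is a hypothetical name, and the merge shown here (concatenating member conditions) is a placeholder rather than the rule used in this PR. Only the initialization of status and the removal of spec mirror the snippet under review.

```python
import copy
from typing import Any


def aggregate_status(desired: dict[str, Any], status_items: list[dict[str, Any]]) -> dict[str, Any]:
    """Build the aggregated object from per-cluster status items."""
    result = copy.deepcopy(desired)  # leave the caller's object untouched
    if result.get("status") is None:
        result["status"] = {}
    result.pop("spec", None)  # mirrors `desiredObj.spec = nil`

    # Placeholder merge (assumption): concatenate conditions from every member cluster.
    conditions = []
    for item in status_items:
        member_status = item.get("status") or {}
        conditions.extend(member_status.get("conditions") or [])
    if conditions:
        result["status"]["conditions"] = conditions
    return result


desired = {
    "apiVersion": "ray.io/v1",
    "kind": "RayService",
    "metadata": {"name": "demo"},
    "spec": {"rayClusterConfig": {}},
}
items = [{"clusterName": "member1", "status": {"conditions": [{"type": "Ready", "status": "True"}]}}]
print(aggregate_status(desired, items))
```

Because the returned object carries no spec, an expected-output fixture only has to list identifying metadata and the aggregated status, not a copy of the full rayClusterConfig.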

image: rayproject/ray:2.9.0
operation: InterpretHealth
output:
healthy: false
Member

Can you help add an empty line at the end of the file?

Contributor Author

Sure, added.

applicationStatuses:
  api-service:
    status: DEPLOYING
    message: "API service is deploying on new cluster"
\ No newline at end of file
Member

ditto

Contributor Author

Sure, thanks for the review!

@seanlaii force-pushed the rayservice-interpreter branch from 83ce045 to db198e2, January 7, 2026 02:50
@karmada-bot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from xishanyongye-chang. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels

kind/feature: Categorizes issue or PR as related to a new feature.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants