
Conversation

@chtyler chtyler commented Nov 9, 2025

…me for Tech Preview release at 3.0

Description

Adds content for the technology preview release of the IBM Spyre accelerator on IBM Z technology with the IBM Z (s390) ServingRuntime.

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

Summary by CodeRabbit

  • Documentation
    • Added IBM Z (s390x) support documentation for IBM Spyre AI Accelerators alongside existing x86 platform coverage.
    • Provided platform-specific prerequisites, configuration requirements, and deployment examples for both x86 and IBM Z architectures.

coderabbitai bot commented Nov 9, 2025

Walkthrough

The changes extend documentation across three AsciiDoc modules to include support for IBM Z (s390x) alongside existing x86 platform guidance for IBM Spyre AI Accelerators. Updates include section headers, prerequisites, runtime configurations, and deployment procedures specific to each platform.

Changes

Cohort: IBM Z (s390x) Support Expansion

File(s): modules/configuring-an-inference-service-for-spyre.adoc, modules/deploying-models-on-the-single-model-serving-platform.adoc, modules/model-serving-runtimes-for-accelerators.adoc

Summary:

  • Added platform-specific prerequisites, runtime configurations, and deployment guidance; updated section headers to reflect both x86 and IBM Z support.
  • Introduced vLLM Spyre s390x ServingRuntime entries with Technology Preview notices, continuous batching limitations, and s390x-specific runtime args (--max_model_len, --max-num-seqs=4, --tensor-parallel-size).
  • Added platform-specific toleration keys (ibm.com/spyre_pf for x86, ibm.com/spyre_vf for s390x).
  • Duplicated guidance blocks for both upstream and non-upstream contexts.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify consistency of vLLM Spyre s390x ServingRuntime naming and configurations across all three files
  • Confirm accuracy of s390x-specific runtime arguments and their correct syntax
  • Check consistency of platform-specific toleration keys throughout all modules
  • Validate that all existing x86 content remains unchanged

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed — Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed — The title directly addresses the main changes in the PR, which involve adding documentation content for IBM Z and IBM Z s390x ServingRuntime support across multiple documentation files.
  • Docstring Coverage — ✅ Passed — No functions found in the changed files to evaluate; docstring coverage check skipped.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7dc427 and 879024b.

📒 Files selected for processing (3)
  • modules/configuring-an-inference-service-for-spyre.adoc (4 hunks)
  • modules/deploying-models-on-the-single-model-serving-platform.adoc (1 hunks)
  • modules/model-serving-runtimes-for-accelerators.adoc (2 hunks)
🔇 Additional comments (8)
modules/model-serving-runtimes-for-accelerators.adoc (2)

64-72: Appropriately updated section header and Technology Preview notices for dual-platform support.

The section header and IMPORTANT blocks are consistently updated to reflect IBM Z support alongside x86. The messaging aligns across all three files in the PR.


90-96: IBM Z documentation paragraph properly structured and consistent.

The new s390x guidance mirrors the x86 paragraph structure and includes proper conditional guards for both upstream and non-upstream contexts. The Spyre Operator image reference and hardware profile documentation links match the x86 variant.

Confirm that the Spyre Operator image link (ibm-aiu/spyre-operator/688a1121575e62c686a471d4) supports both x86 and IBM Z (s390x) architectures, or if separate image variants are available for each platform.

modules/configuring-an-inference-service-for-spyre.adoc (5)

10-10: Technology Preview notices correctly updated for both platforms.

The notices in lines 10 and 13 consistently reference both x86 and IBM Z support, aligning with updates in the other two files reviewed.

Also applies to: 13-13


28-30: Platform-specific prerequisites clearly documented.

The prerequisites now explicitly differentiate between x86 and IBM Z (s390x) runtimes, making it clear which runtime users should deploy depending on their target platform. This aligns with the supported runtimes documented in modules/model-serving-runtimes-for-accelerators.adoc.


59-72: x86 YAML example is clear and complete.

The example for vLLM IBM Spyre AI Accelerator on x86 platforms correctly shows the minimal required configuration: schedulerName set to spyre-scheduler and a single toleration with key ibm.com/spyre_pf. The syntax is valid and matches standard Kubernetes toleration patterns.
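The x86 configuration described in this comment can be sketched as a minimal InferenceService fragment. This is an illustrative reconstruction, not the PR's actual example: the resource name, model format, and runtime name are placeholders, while the schedulerName and toleration key are the values the review confirms.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: spyre-example            # placeholder name
spec:
  predictor:
    model:
      runtime: vllm-spyre-runtime   # placeholder; use the vLLM IBM Spyre runtime in your cluster
      modelFormat:
        name: vLLM                  # placeholder model format
    schedulerName: spyre-scheduler  # routes pods through the Spyre scheduler (per the review)
    tolerations:
    - key: ibm.com/spyre_pf         # x86 PF-mode toleration key (per the review)
      operator: Exists
      effect: NoSchedule
```

The KServe predictor spec embeds Pod-level fields, which is why schedulerName and tolerations appear directly under predictor here.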


74-95: IBM Z YAML example includes necessary configuration details with clear inline comments.

The example for vLLM IBM Spyre s390x on IBM Z platforms properly includes:

  • Continuous batching arguments in the containers[0].args section
  • Clear comments explaining the purpose of each argument
  • Consistent schedulerName and toleration configuration with the x86 example
  • Correct toleration key ibm.com/spyre_vf for IBM Z VF mode

The YAML is syntactically correct. However, verify the following:

  1. Should the runtime argument values (e.g., --max_model_len=2048) be presented as required fixed values, or should users be guided to adjust these based on their specific models?
  2. Is --tensor-parallel-size=1 always appropriate for IBM Z, or can users configure this differently?
  3. Should the example include guidance on when/how to adjust these parameters?

These minor clarifications would help users deploy models correctly without trial-and-error configuration. Consider adding a note such as: "Adjust --max_model_len and --tensor-parallel-size according to your model size and available IBM Z resources."
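The IBM Z variant discussed above can be sketched similarly. Again a hedged reconstruction rather than the PR's actual snippet: placeholders as before, with the s390x runtime args and VF-mode toleration key taken from the review, and the suggested "adjust to your model" caveats recorded as comments.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: spyre-example-s390x           # placeholder name
spec:
  predictor:
    model:
      runtime: vllm-spyre-s390x-runtime  # placeholder; use the vLLM Spyre s390x ServingRuntime
      modelFormat:
        name: vLLM                       # placeholder model format
      args:
      - --max_model_len=2048             # adjust to your model's context length
      - --max-num-seqs=4                 # continuous batching limit (per the review)
      - --tensor-parallel-size=1         # adjust to available IBM Z resources
    schedulerName: spyre-scheduler
    tolerations:
    - key: ibm.com/spyre_vf              # s390x VF-mode toleration key (per the review)
      operator: Exists
      effect: NoSchedule
```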


44-56: Discrepancies found between documentation snippet and current product capabilities—review as potential inaccuracy.

The documentation states that IBM Z "only supports continuous batching," yet according to an October 2024 RFC, Spyre does not currently support paged attention or continuous batching: after a batch is prefilled, decoding must continue until all sequences finish. These two statements appear contradictory.

Additionally:

  • Missing environment configuration: Continuous batching is enabled via environment variable VLLM_SPYRE_USE_CB=1, not solely through runtime arguments. The snippet omits this critical step.
  • Incomplete runtime requirements: Configuration requires VLLM_SPYRE_DYNAMO_BACKEND=sendnn and other environment variables; the snippet lists CLI arguments but no mention of environment variables.
  • Toleration keys unverified: ibm.com/spyre_pf and ibm.com/spyre_vf could not be confirmed through official documentation searches.

Recommend verifying whether the continuous batching statement and specific argument defaults (lines 48–51) align with the current Spyre product release (October 2025) and consult the official IBM Spyre documentation or Red Hat OpenShift AI Kubernetes stack integration guide for accurate configuration details.
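If the reviewer's point about environment variables holds, the missing configuration could be expressed as a fragment like the following. The variable names and values come from the review comment above; the surrounding structure is an assumption and should be verified against the official vLLM Spyre documentation.

```yaml
spec:
  predictor:
    model:
      env:
      - name: VLLM_SPYRE_USE_CB           # enables continuous batching (per the review)
        value: "1"
      - name: VLLM_SPYRE_DYNAMO_BACKEND   # backend the review says is required
        value: sendnn
```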

modules/deploying-models-on-the-single-model-serving-platform.adoc (1)

114-138: Documentation links are correctly formatted and consistent.

The links in lines 133 and 136 have been verified against existing usage in the codebase. Both follow established patterns: the RHOAI link uses {rhoaidocshome}{default-format-url}/working_with_accelerators/working-with-hardware-profiles_accelerators, and the upstream link uses {odhdocshome}/working-with-accelerators/#working-with-hardware-profiles_accelerators. Identical patterns appear in other documentation modules (e.g., configuring-workload-management-with-kueue.adoc), confirming consistency and correctness.


