
Conversation

@chtyler chtyler commented Nov 9, 2025

…me for Tech Preview release at 3.0

Description

Adds content for the technology preview release of the IBM Spyre accelerator on IBM Z technology with the IBM Z (s390) ServingRuntime.

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

Summary by CodeRabbit

  • Documentation
    • Added IBM Z (s390x) support documentation for IBM Spyre AI Accelerators alongside existing x86 platform coverage.
    • Provided platform-specific prerequisites, configuration requirements, and deployment examples for both x86 and IBM Z architectures.

coderabbitai bot commented Nov 9, 2025

Walkthrough

The changes extend documentation across three AsciiDoc modules to include support for IBM Z (s390x) alongside existing x86 platform guidance for IBM Spyre AI Accelerators. Updates include section headers, prerequisites, runtime configurations, and deployment procedures specific to each platform.

Changes

Cohort: IBM Z (s390x) Support Expansion

File(s): modules/configuring-an-inference-service-for-spyre.adoc, modules/deploying-models-on-the-single-model-serving-platform.adoc, modules/model-serving-runtimes-for-accelerators.adoc

Summary:

  • Added platform-specific prerequisites, runtime configurations, and deployment guidance; updated section headers to reflect both x86 and IBM Z support.
  • Introduced vLLM Spyre s390x ServingRuntime entries with Technology Preview notices, continuous batching limitations, and s390x-specific runtime args (--max_model_len, --max-num-seqs=4, --tensor-parallel-size).
  • Added platform-specific toleration keys (ibm.com/spyre_pf for x86, ibm.com/spyre_vf for s390x).
  • Duplicated guidance blocks for both upstream and non-upstream contexts.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify consistency of vLLM Spyre s390x ServingRuntime naming and configurations across all three files
  • Confirm accuracy of s390x-specific runtime arguments and their correct syntax
  • Check consistency of platform-specific toleration keys throughout all modules
  • Validate that all existing x86 content remains unchanged

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed — Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed — The title directly addresses the main changes in the PR, which involve adding documentation content for IBM Z and IBM Z s390x ServingRuntime support across multiple documentation files.
  • Docstring Coverage — ✅ Passed — No functions found in the changed files to evaluate; docstring coverage check skipped.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7dc427 and 879024b.

📒 Files selected for processing (3)
  • modules/configuring-an-inference-service-for-spyre.adoc (4 hunks)
  • modules/deploying-models-on-the-single-model-serving-platform.adoc (1 hunks)
  • modules/model-serving-runtimes-for-accelerators.adoc (2 hunks)
🔇 Additional comments (8)
modules/model-serving-runtimes-for-accelerators.adoc (2)

64-72: Appropriately updated section header and Technology Preview notices for dual-platform support.

The section header and IMPORTANT blocks are consistently updated to reflect IBM Z support alongside x86. The messaging aligns across all three files in the PR.


90-96: IBM Z documentation paragraph properly structured and consistent.

The new s390x guidance mirrors the x86 paragraph structure and includes proper conditional guards for both upstream and non-upstream contexts. The Spyre Operator image reference and hardware profile documentation links match the x86 variant.

Confirm that the Spyre Operator image link (ibm-aiu/spyre-operator/688a1121575e62c686a471d4) supports both x86 and IBM Z (s390x) architectures, or if separate image variants are available for each platform.

modules/configuring-an-inference-service-for-spyre.adoc (5)

10-10: Technology Preview notices correctly updated for both platforms.

The notices in lines 10 and 13 consistently reference both x86 and IBM Z support, aligning with updates in the other two files reviewed.

Also applies to: 13-13


28-30: Platform-specific prerequisites clearly documented.

The prerequisites now explicitly differentiate between x86 and IBM Z (s390x) runtimes, making it clear which runtime users should deploy depending on their target platform. This aligns with the supported runtimes documented in modules/model-serving-runtimes-for-accelerators.adoc.


59-72: x86 YAML example is clear and complete.

The example for vLLM IBM Spyre AI Accelerator on x86 platforms correctly shows the minimal required configuration: schedulerName set to spyre-scheduler and a single toleration with key ibm.com/spyre_pf. The syntax is valid and matches standard Kubernetes toleration patterns.
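The x86 configuration described in this comment can be sketched as a minimal InferenceService fragment. This is an illustrative reconstruction, not the PR's actual example: the resource name, model format, and runtime name are placeholders, while the schedulerName and toleration key are the values the review confirms.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: spyre-example            # placeholder name
spec:
  predictor:
    model:
      runtime: vllm-spyre-runtime   # placeholder; use the vLLM IBM Spyre runtime in your cluster
      modelFormat:
        name: vLLM                  # placeholder model format
    schedulerName: spyre-scheduler  # routes pods through the Spyre scheduler (per the review)
    tolerations:
    - key: ibm.com/spyre_pf         # x86 PF-mode toleration key (per the review)
      operator: Exists
      effect: NoSchedule
```

The KServe predictor spec embeds Pod-level fields, which is why schedulerName and tolerations appear directly under predictor here.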


74-95: IBM Z YAML example includes necessary configuration details with clear inline comments.

The example for vLLM IBM Spyre s390x on IBM Z platforms properly includes:

  • Continuous batching arguments in the containers[0].args section
  • Clear comments explaining the purpose of each argument
  • Consistent schedulerName and toleration configuration with the x86 example
  • Correct toleration key ibm.com/spyre_vf for IBM Z VF mode

The YAML is syntactically correct. However, verify the following:

  1. Should the runtime argument values (e.g., --max_model_len=2048) be presented as required fixed values, or should users be guided to adjust these based on their specific models?
  2. Is --tensor-parallel-size=1 always appropriate for IBM Z, or can users configure this differently?
  3. Should the example include guidance on when/how to adjust these parameters?

These minor clarifications would help users deploy models correctly without trial-and-error configuration. Consider adding a note such as: "Adjust --max_model_len and --tensor-parallel-size according to your model size and available IBM Z resources."
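The IBM Z variant discussed above can be sketched similarly. Again a hedged reconstruction rather than the PR's actual snippet: placeholders as before, with the s390x runtime args and VF-mode toleration key taken from the review, and the suggested "adjust to your model" caveats recorded as comments.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: spyre-example-s390x           # placeholder name
spec:
  predictor:
    model:
      runtime: vllm-spyre-s390x-runtime  # placeholder; use the vLLM Spyre s390x ServingRuntime
      modelFormat:
        name: vLLM                       # placeholder model format
      args:
      - --max_model_len=2048             # adjust to your model's context length
      - --max-num-seqs=4                 # continuous batching limit (per the review)
      - --tensor-parallel-size=1         # adjust to available IBM Z resources
    schedulerName: spyre-scheduler
    tolerations:
    - key: ibm.com/spyre_vf              # s390x VF-mode toleration key (per the review)
      operator: Exists
      effect: NoSchedule
```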


44-56: Discrepancies found between documentation snippet and current product capabilities—review as potential inaccuracy.

The documentation states that IBM Z "only supports continuous batching," yet according to an October 2024 RFC, Spyre does not currently support paged attention or continuous batching: after a batch is prefilled, decoding must continue until all sequences finish. These two statements appear contradictory.

Additionally:

  • Missing environment configuration: Continuous batching is enabled via environment variable VLLM_SPYRE_USE_CB=1, not solely through runtime arguments. The snippet omits this critical step.
  • Incomplete runtime requirements: Configuration requires VLLM_SPYRE_DYNAMO_BACKEND=sendnn and other environment variables; the snippet lists CLI arguments but no mention of environment variables.
  • Toleration keys unverified: ibm.com/spyre_pf and ibm.com/spyre_vf could not be confirmed through official documentation searches.

Recommend verifying whether the continuous batching statement and specific argument defaults (lines 48–51) align with the current Spyre product release (October 2025) and consult the official IBM Spyre documentation or Red Hat OpenShift AI Kubernetes stack integration guide for accurate configuration details.
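If the reviewer's point about environment variables holds, the missing configuration could be expressed as a fragment like the following. The variable names and values come from the review comment above; the surrounding structure is an assumption and should be verified against the official vLLM Spyre documentation.

```yaml
spec:
  predictor:
    model:
      env:
      - name: VLLM_SPYRE_USE_CB           # enables continuous batching (per the review)
        value: "1"
      - name: VLLM_SPYRE_DYNAMO_BACKEND   # backend the review says is required
        value: sendnn
```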

modules/deploying-models-on-the-single-model-serving-platform.adoc (1)

114-138: Documentation links are correctly formatted and consistent.

The links in lines 133 and 136 have been verified against existing usage in the codebase. Both follow established patterns: the RHOAI link uses {rhoaidocshome}{default-format-url}/working_with_accelerators/working-with-hardware-profiles_accelerators, and the upstream link uses {odhdocshome}/working-with-accelerators/#working-with-hardware-profiles_accelerators. Identical patterns appear in other documentation modules (e.g., configuring-workload-management-with-kueue.adoc), confirming consistency and correctness.


