9 changes: 6 additions & 3 deletions config/llmisvcconfig/config-llm-template.yaml
@@ -12,9 +12,12 @@ spec:
   - containerPort: 8000
     protocol: TCP
   command:
-    - vllm
-    - serve
-    - /mnt/models
+    - /bin/bash
+    - -c
+    - |
+      [ -f /etc/profile.d/ibm-aiu-setup.sh ] && source /etc/profile.d/ibm-aiu-setup.sh
+      exec vllm serve /mnt/models "$@"
+    - --
   args:
     - --served-model-name
     - "{{ .Spec.Model.Name }}"
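The trailing `--` works because arguments placed after the `bash -c` script string are bound to `$0`, `$1`, … inside the script, so the pod's `args` land in `"$@"` and are forwarded to `exec vllm serve /mnt/models "$@"`. A minimal demonstration of that binding (the model-name value is illustrative):

```shell
# `bash -c '<script>' arg0 arg1 ...`: arg0 becomes $0, the rest become "$@".
# Passing `--` as $0 lets the real args flow through untouched.
bash -c 'echo "forwarded: $@"' -- --served-model-name demo-model
# prints: forwarded: --served-model-name demo-model
```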
12 changes: 12 additions & 0 deletions config/overlays/odh/accelerators/amd-rocm-config-llm-template.yaml
@@ -0,0 +1,12 @@
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template-amd-rocm
  annotations:
    openshift.io/display-name: vLLM AMD ROCm GPU LLMInferenceServiceConfig
    description: vLLM AMD ROCm GPU LLMInferenceServiceConfig for LLMInferenceService.
spec:
  template:
    containers:
      - name: main
        image: placeholder
15 changes: 15 additions & 0 deletions config/overlays/odh/accelerators/ibm-spyre-ppc64le-config-llm-template.yaml
@@ -0,0 +1,15 @@
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template-ibm-spyre-ppc64le
  annotations:
    openshift.io/display-name: vLLM IBM Spyre ppc64le LLMInferenceServiceConfig
    description: vLLM IBM Spyre ppc64le LLMInferenceServiceConfig for LLMInferenceService.
spec:
  template:
    containers:
      - name: main
        image: placeholder
        env:
          - name: HF_HOME
            value: /tmp/hf_home
Comment on lines +1 to +15
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all IBM Spyre LLM template config files
fd ".*spyre.*config.*template.*yaml" --type f

Repository: opendatahub-io/kserve

Length of output: 286


🏁 Script executed:

#!/bin/bash
# Read each variant file to compare env var configurations
echo "=== ppc64le ===" && cat config/overlays/odh/accelerators/ibm-spyre-ppc64le-config-llm-template.yaml
echo -e "\n=== s390x ===" && cat config/overlays/odh/accelerators/ibm-spyre-s390x-config-llm-template.yaml
echo -e "\n=== x86 ===" && cat config/overlays/odh/accelerators/ibm-spyre-x86-config-llm-template.yaml

Repository: opendatahub-io/kserve

Length of output: 2543


🏁 Script executed:

#!/bin/bash
# Check for any TODO/FIXME/comments about ppc64le config or IBM Power
rg -i "ppc64le|ibm.*power" --type yaml -B 2 -A 2 config/overlays/odh/accelerators/
rg -i "TODO|FIXME" config/overlays/odh/accelerators/ibm-spyre-ppc64le-config-llm-template.yaml

Repository: opendatahub-io/kserve

Length of output: 1259


🏁 Script executed:

#!/bin/bash
# Check for README or documentation about accelerators configs
fd "README|ACCELERATOR" --type f config/overlays/odh/accelerators/
# Also search for any documentation directory
fd "doc|docs" --type d config/overlays/odh/accelerators/
# Check if there are any comments or docs at top-level about accelerators
rg -i "accelerator|spyre" README.md --type markdown -A 5 -B 5 2>/dev/null || echo "No match in README"

Repository: opendatahub-io/kserve

Length of output: 83


ppc64le template lacks the Spyre-specific env vars defined in the s390x and x86 variants; confirm whether this is intentional or document it as incomplete.

The s390x variant defines 8 environment variables (HF_HOME plus FLEX_DEVICE, TOKENIZERS_PARALLELISM, DTLOG_LEVEL, TORCH_SENDNN_LOG, VLLM_SPYRE_USE_CB, VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS, and TORCH_SENDNN_CACHE_ENABLE) and x86 defines 10; ppc64le sets only HF_HOME. If this sparse configuration is intentional for ppc64le hardware, add a comment explaining why these knobs are not needed. If it is a placeholder pending input from the IBM Power team, add a `# TODO:` annotation so the gap is not silently merged as final.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/overlays/odh/accelerators/ibm-spyre-ppc64le-config-llm-template.yaml`
around lines 1 - 15, The ppc64le LLMInferenceServiceConfig template (kind
LLMInferenceServiceConfig, spec.template.containers[name: main]) currently only
sets HF_HOME while s390x/x86 variants set Spyre-specific env vars (FLEX_DEVICE,
TOKENIZERS_PARALLELISM, DTLOG_LEVEL, TORCH_SENDNN_LOG, VLLM_SPYRE_USE_CB,
VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS, TORCH_SENDNN_CACHE_ENABLE, etc.);
update this template to either include the same Spyre-specific environment
variables with appropriate ppc64le values or add a clear inline comment/TODO in
the template explaining that the reduced env set is intentional or awaiting IBM
Power team input so the omission is not merged silently (modify
spec.template.containers -> name: main to add the comment/TODO or the env
entries).
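One way to make the omission explicit, as the review suggests (a sketch only; the TODO owner and wording are hypothetical):

```yaml
spec:
  template:
    containers:
      - name: main
        image: placeholder
        env:
          - name: HF_HOME
            value: /tmp/hf_home
          # TODO(ibm-power): confirm whether the Spyre env vars used by the
          # s390x/x86 variants (FLEX_DEVICE, VLLM_SPYRE_*, TORCH_SENDNN_*)
          # are required on ppc64le, or document why they are omitted.
```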

29 changes: 29 additions & 0 deletions config/overlays/odh/accelerators/ibm-spyre-s390x-config-llm-template.yaml
@@ -0,0 +1,29 @@
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template-ibm-spyre-s390x
  annotations:
    openshift.io/display-name: vLLM IBM Spyre s390x LLMInferenceServiceConfig
    description: vLLM IBM Spyre s390x LLMInferenceServiceConfig for LLMInferenceService.
spec:
  template:
    containers:
      - name: main
        image: placeholder
        env:
          - name: HF_HOME
            value: /tmp/hf_home
          - name: FLEX_DEVICE
            value: VF
          - name: TOKENIZERS_PARALLELISM
            value: "false"
          - name: DTLOG_LEVEL
            value: error
          - name: TORCH_SENDNN_LOG
            value: CRITICAL
          - name: VLLM_SPYRE_USE_CB
            value: "1"
          - name: VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS
            value: "1"
          - name: TORCH_SENDNN_CACHE_ENABLE
            value: "1"
33 changes: 33 additions & 0 deletions config/overlays/odh/accelerators/ibm-spyre-x86-config-llm-template.yaml
@@ -0,0 +1,33 @@
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template-ibm-spyre-x86
  annotations:
    openshift.io/display-name: vLLM IBM Spyre x86 LLMInferenceServiceConfig
    description: vLLM IBM Spyre x86 LLMInferenceServiceConfig for LLMInferenceService.
spec:
  template:
    containers:
      - name: main
        image: placeholder
        env:
          - name: HF_HOME
            value: /tmp/hf_home
          - name: FLEX_COMPUTE
            value: SENTIENT
          - name: FLEX_DEVICE
            value: PF
          - name: TOKENIZERS_PARALLELISM
            value: "false"
          - name: DTLOG_LEVEL
            value: error
          - name: TORCH_SENDNN_LOG
            value: CRITICAL
          - name: VLLM_SPYRE_WARMUP_BATCH_SIZES
            value: "4"
          - name: VLLM_SPYRE_WARMUP_PROMPT_LENS
            value: "1024"
          - name: VLLM_SPYRE_WARMUP_NEW_TOKENS
            value: "256"
          - name: VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS
            value: "0"
12 changes: 12 additions & 0 deletions config/overlays/odh/accelerators/kustomization.yaml
@@ -0,0 +1,12 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

commonLabels:
  opendatahub.io/config-type: accelerator

resources:
  - nvidia-cuda-config-llm-template.yaml
  - amd-rocm-config-llm-template.yaml
  - ibm-spyre-s390x-config-llm-template.yaml
  - ibm-spyre-x86-config-llm-template.yaml
  - ibm-spyre-ppc64le-config-llm-template.yaml
12 changes: 12 additions & 0 deletions config/overlays/odh/accelerators/nvidia-cuda-config-llm-template.yaml
@@ -0,0 +1,12 @@
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template-nvidia-cuda
  annotations:
    openshift.io/display-name: vLLM NVIDIA CUDA GPU LLMInferenceServiceConfig
    description: vLLM NVIDIA CUDA GPU LLMInferenceServiceConfig for LLMInferenceService.
spec:
  template:
    containers:
      - name: main
        image: placeholder
44 changes: 44 additions & 0 deletions config/overlays/odh/kustomization.yaml
@@ -7,6 +7,7 @@ resources:
 # - ../../crd/full/localmodel
 - user-cluster-roles.yaml
 - network-policies.yaml
+- accelerators/
 
 components:
 - ../../components/kserve
@@ -55,6 +56,49 @@ replacements:
       fieldPaths:
         - spec.template.spec.containers.[name=manager].image
 
+  - source:
+      kind: ConfigMap
+      name: kserve-parameters
+      fieldPath: data.kserve-llm-d-nvidia-cuda
+    targets:
+      - select:
+          kind: LLMInferenceServiceConfig
+          name: kserve-config-llm-template-nvidia-cuda
+        fieldPaths:
+          - spec.template.containers.[name=main].image
+
+  - source:
+      kind: ConfigMap
+      name: kserve-parameters
+      fieldPath: data.kserve-llm-d-amd-rocm
+    targets:
+      - select:
+          kind: LLMInferenceServiceConfig
+          name: kserve-config-llm-template-amd-rocm
+        fieldPaths:
+          - spec.template.containers.[name=main].image
+
+  - source:
+      kind: ConfigMap
+      name: kserve-parameters
+      fieldPath: data.kserve-llm-d-ibm-spyre
+    targets:
+      - select:
+          kind: LLMInferenceServiceConfig
+          name: kserve-config-llm-template-ibm-spyre-s390x
+        fieldPaths:
+          - spec.template.containers.[name=main].image
+      - select:
+          kind: LLMInferenceServiceConfig
+          name: kserve-config-llm-template-ibm-spyre-x86
+        fieldPaths:
+          - spec.template.containers.[name=main].image
+      - select:
+          kind: LLMInferenceServiceConfig
+          name: kserve-config-llm-template-ibm-spyre-ppc64le
+        fieldPaths:
+          - spec.template.containers.[name=main].image
Comment on lines +81 to +100

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

fd -g 'params.env' --exec grep -n 'kserve-llm-d-ibm-spyre\|kserve-llm-d-nvidia-cuda\|kserve-llm-d-amd-rocm' {}

Repository: opendatahub-io/kserve

Length of output: 483


🏁 Script executed:

# First, verify the file structure and see the context around lines 81-100
fd -g 'kustomization.yaml' | grep odh | head -5

Repository: opendatahub-io/kserve

Length of output: 199


🏁 Script executed:

# Search for any README or documentation about image architecture support
fd -g 'README*' -o -g '*.md' | xargs grep -l 'spyre\|s390x\|ppc64le' 2>/dev/null | head -10

Repository: opendatahub-io/kserve

Length of output: 181


🏁 Script executed:

# Look for any other references to architecture-specific images or configs
rg 's390x|ppc64le' -t yaml -t md | head -20

Repository: opendatahub-io/kserve

Length of output: 1338


🏁 Script executed:

cat -n config/overlays/odh/kustomization.yaml | sed -n '75,105p'

Repository: opendatahub-io/kserve

Length of output: 1137


🏁 Script executed:

# Check if there's an x86 variant
fd -g '*ibm-spyre*' -o -g '*x86*' config/overlays/odh/accelerators/

Repository: opendatahub-io/kserve

Length of output: 181


🏁 Script executed:

# Also check what's in the accelerators directory
ls -la config/overlays/odh/accelerators/ | grep ibm

Repository: opendatahub-io/kserve

Length of output: 334


🏁 Script executed:

cat config/overlays/odh/accelerators/ibm-spyre-s390x-config-llm-template.yaml

Repository: opendatahub-io/kserve

Length of output: 974


🏁 Script executed:

cat config/overlays/odh/accelerators/ibm-spyre-x86-config-llm-template.yaml

Repository: opendatahub-io/kserve

Length of output: 1118


Verify that a single Spyre image is correct for all three architectures.

All three Spyre variants (s390x, x86, ppc64le) pull their container image from the same data.kserve-llm-d-ibm-spyre key, yet their LLMInferenceServiceConfig resources define different environment variables per architecture: s390x sets FLEX_DEVICE=VF, VLLM_SPYRE_USE_CB=1, and VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS=1, while x86 sets FLEX_COMPUTE=SENTIENT, FLEX_DEVICE=PF, and VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS=0. These architectures (s390x is IBM Z mainframe, ppc64le is IBM Power, x86 is Intel/AMD) require different CPU instruction sets and cannot share a single binary image.

The params.env file defines only one Spyre image key pinned to a specific digest (vllm-spyre-rhel9@sha256:...), which is a single-architecture image, not a multi-arch manifest list. This will cause deployment failures on s390x and ppc64le systems. Define separate image keys for each architecture (kserve-llm-d-ibm-spyre-s390x, kserve-llm-d-ibm-spyre-x86, kserve-llm-d-ibm-spyre-ppc64le) in params.env and update the kustomization.yaml replacement sources accordingly.
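A quick way to check this claim is to inspect the pinned reference's manifest mediaType (for example with `skopeo inspect --raw docker://<image> | jq -r .mediaType`, assuming skopeo and jq are available and the registry is reachable); only a manifest list/index can serve multiple architectures from one reference. A small helper classifying the result:

```shell
# Classify a manifest mediaType: multi-arch index/list vs single-arch manifest.
is_multi_arch() {
  case "$1" in
    application/vnd.oci.image.index.v1+json) echo yes ;;
    application/vnd.docker.distribution.manifest.list.v2+json) echo yes ;;
    *) echo no ;;
  esac
}

is_multi_arch "application/vnd.docker.distribution.manifest.list.v2+json"  # yes
is_multi_arch "application/vnd.oci.image.manifest.v1+json"                 # no
```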

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/overlays/odh/kustomization.yaml` around lines 81 - 100, The
kustomization currently replaces image for three LLMInferenceServiceConfig
resources from a single ConfigMap key (ConfigMap name kserve-parameters,
fieldpath data.kserve-llm-d-ibm-spyre), but the three targets
(LLMInferenceServiceConfig names kserve-config-llm-template-ibm-spyre-s390x,
kserve-config-llm-template-ibm-spyre-x86,
kserve-config-llm-template-ibm-spyre-ppc64le) require distinct
architecture-specific images; update params.env to define three separate keys
(kserve-llm-d-ibm-spyre-s390x, kserve-llm-d-ibm-spyre-x86,
kserve-llm-d-ibm-spyre-ppc64le) and modify this kustomization replacement block
so each target uses the corresponding ConfigMap key instead of the single
data.kserve-llm-d-ibm-spyre entry to ensure the correct arch-specific image is
injected for each LLMInferenceServiceConfig.
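A sketch of the suggested fix, shown for one architecture (key names follow the review's proposal; the digest is a placeholder until per-arch images are published):

```yaml
# params.env (one pinned image per architecture; digests are placeholders):
#   kserve-llm-d-ibm-spyre-s390x=registry.redhat.io/rhaiis/vllm-spyre-rhel9@sha256:<s390x-digest>
#   kserve-llm-d-ibm-spyre-x86=registry.redhat.io/rhaiis/vllm-spyre-rhel9@sha256:<x86-digest>
#   kserve-llm-d-ibm-spyre-ppc64le=registry.redhat.io/rhaiis/vllm-spyre-rhel9@sha256:<ppc64le-digest>

# kustomization.yaml: one replacement per architecture instead of one shared source.
- source:
    kind: ConfigMap
    name: kserve-parameters
    fieldPath: data.kserve-llm-d-ibm-spyre-s390x
  targets:
    - select:
        kind: LLMInferenceServiceConfig
        name: kserve-config-llm-template-ibm-spyre-s390x
      fieldPaths:
        - spec.template.containers.[name=main].image
```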


configMapGenerator:
  - envs:
      - params.env
3 changes: 3 additions & 0 deletions config/overlays/odh/params.env
@@ -3,5 +3,8 @@ llmisvc-controller=quay.io/opendatahub/llmisvc-controller:latest
kserve-agent=quay.io/opendatahub/kserve-agent:latest
kserve-router=quay.io/opendatahub/kserve-router:latest
kserve-storage-initializer=quay.io/opendatahub/kserve-storage-initializer:latest
kserve-llm-d-nvidia-cuda=registry.redhat.io/rhaiis/vllm-cuda-rhel9@sha256:fc68d623d1bfc36c8cb2fe4a71f19c8578cfb420ce8ce07b20a02c1ee0be0cf3
kserve-llm-d-amd-rocm=registry.redhat.io/rhaiis/vllm-rocm-rhel9@sha256:d9a48add238cc095fa43eeee17c8c4d104de60c4dc623e0bc7f8c4b53b2b2e97
kserve-llm-d-ibm-spyre=registry.redhat.io/rhaiis/vllm-spyre-rhel9@sha256:80ae3e435a5be2c1f117f36599103ab05357917dd6e37f0df6613cb3ac2c13ea
# TODO update when our changes are introduced in the official image
kube-rbac-proxy=quay.io/opendatahub/odh-kube-auth-proxy@sha256:dcb09fbabd8811f0956ef612a0c9ddd5236804b9bd6548a0647d2b531c9d01b3
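The configMapGenerator turns each params.env line into a data key on the generated kserve-parameters ConfigMap, which the replacement sources read. A rough sketch of that env-file parsing (kustomize's real implementation handles quoting and other edge cases; this is illustrative only):

```shell
# render_params: turn params.env-style lines on stdin into ConfigMap data keys,
# skipping blank lines and comments; the value keeps everything after the
# first '=', so image references containing '=' would survive intact.
render_params() {
  while IFS='=' read -r key value; do
    case "$key" in ''|'#'*) continue ;; esac
    printf 'data[%s]=%s\n' "$key" "$value"
  done
}

render_params <<'EOF'
# TODO update when our changes are introduced in the official image
kserve-llm-d-nvidia-cuda=registry.redhat.io/rhaiis/vllm-cuda-rhel9@sha256:fc68d623d1bfc36c8cb2fe4a71f19c8578cfb420ce8ce07b20a02c1ee0be0cf3
EOF
```

The comment line is dropped and the image line becomes a single `data[...]=...` entry.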