RHOAIENG-50060: Create accelerator-specific LLMInferenceServiceConfig templates with distinguishing labels #1130
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| apiVersion: serving.kserve.io/v1alpha2 | ||
| kind: LLMInferenceServiceConfig | ||
| metadata: | ||
| name: kserve-config-llm-template-amd-rocm | ||
| annotations: | ||
| openshift.io/display-name: vLLM AMD ROCm GPU LLMInferenceServiceConfig | ||
| description: vLLM AMD ROCm GPU LLMInferenceServiceConfig for LLMInferenceService. | ||
| spec: | ||
| template: | ||
| containers: | ||
| - name: main | ||
| image: placeholder |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| apiVersion: serving.kserve.io/v1alpha2 | ||
| kind: LLMInferenceServiceConfig | ||
| metadata: | ||
| name: kserve-config-llm-template-ibm-spyre-ppc64le | ||
| annotations: | ||
| openshift.io/display-name: vLLM IBM Spyre ppc64le LLMInferenceServiceConfig | ||
| description: vLLM IBM Spyre ppc64le LLMInferenceServiceConfig for LLMInferenceService. | ||
| spec: | ||
| template: | ||
| containers: | ||
| - name: main | ||
| image: placeholder | ||
| env: | ||
| - name: HF_HOME | ||
| value: /tmp/hf_home | ||
|
Comment on lines +1 to +15
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all IBM Spyre LLM template config files
fd ".*spyre.*config.*template.*yaml" --type f
Repository: opendatahub-io/kserve
Length of output: 286
🏁 Script executed:
#!/bin/bash
# Read each variant file to compare env var configurations
echo "=== ppc64le ===" && cat config/overlays/odh/accelerators/ibm-spyre-ppc64le-config-llm-template.yaml
echo -e "\n=== s390x ===" && cat config/overlays/odh/accelerators/ibm-spyre-s390x-config-llm-template.yaml
echo -e "\n=== x86 ===" && cat config/overlays/odh/accelerators/ibm-spyre-x86-config-llm-template.yaml
Repository: opendatahub-io/kserve
Length of output: 2543
🏁 Script executed:
#!/bin/bash
# Check for any TODO/FIXME/comments about ppc64le config or IBM Power
rg -i "ppc64le|ibm.*power" --type yaml -B 2 -A 2 config/overlays/odh/accelerators/
rg -i "TODO|FIXME" config/overlays/odh/accelerators/ibm-spyre-ppc64le-config-llm-template.yaml
Repository: opendatahub-io/kserve
Length of output: 1259
🏁 Script executed:
#!/bin/bash
# Check for README or documentation about accelerators configs
fd "README|ACCELERATOR" --type f config/overlays/odh/accelerators/
# Also search for any documentation directory
fd "doc|docs" --type d config/overlays/odh/accelerators/
# Check if there are any comments or docs at top-level about accelerators
rg -i "accelerator|spyre" README.md --type markdown -A 5 -B 5 2>/dev/null || echo "No match in README"
Repository: opendatahub-io/kserve
Length of output: 83
ppc64le template lacks Spyre-specific env vars defined in s390x and x86 variants. Confirm if intentional or document as incomplete. The s390x variant defines 8 environment variables ( |
||
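If the missing variables turn out to be an oversight, one conservative starting point would be the subset that the s390x and x86 templates in this PR set to identical values (`TOKENIZERS_PARALLELISM`, `DTLOG_LEVEL`, `TORCH_SENDNN_LOG`). The sketch below is only an illustration: whether these settings apply on ppc64le is an assumption that nothing in this PR confirms, annotations are omitted for brevity, and variables such as `FLEX_DEVICE` and `VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS` are left out on purpose because the two existing variants disagree on their values.

```yaml
# Hypothetical ppc64le template; NOT part of this PR.
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template-ibm-spyre-ppc64le
spec:
  template:
    containers:
      - name: main
        image: placeholder  # overwritten by the overlay's replacements entry
        env:
          - name: HF_HOME
            value: /tmp/hf_home
          # Assumption: values copied from what s390x and x86 have in common.
          - name: TOKENIZERS_PARALLELISM
            value: "false"
          - name: DTLOG_LEVEL
            value: error
          - name: TORCH_SENDNN_LOG
            value: CRITICAL
```

If the omission is intentional, a short comment at the top of the ppc64le file stating so would answer the same question without changing behavior.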
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| apiVersion: serving.kserve.io/v1alpha2 | ||
| kind: LLMInferenceServiceConfig | ||
| metadata: | ||
| name: kserve-config-llm-template-ibm-spyre-s390x | ||
| annotations: | ||
| openshift.io/display-name: vLLM IBM Spyre s390x LLMInferenceServiceConfig | ||
| description: vLLM IBM Spyre s390x LLMInferenceServiceConfig for LLMInferenceService. | ||
| spec: | ||
| template: | ||
| containers: | ||
| - name: main | ||
| image: placeholder | ||
| env: | ||
| - name: HF_HOME | ||
| value: /tmp/hf_home | ||
| - name: FLEX_DEVICE | ||
| value: VF | ||
| - name: TOKENIZERS_PARALLELISM | ||
| value: "false" | ||
| - name: DTLOG_LEVEL | ||
| value: error | ||
| - name: TORCH_SENDNN_LOG | ||
| value: CRITICAL | ||
| - name: VLLM_SPYRE_USE_CB | ||
| value: "1" | ||
| - name: VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS | ||
| value: "1" | ||
| - name: TORCH_SENDNN_CACHE_ENABLE | ||
| value: "1" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| apiVersion: serving.kserve.io/v1alpha2 | ||
| kind: LLMInferenceServiceConfig | ||
| metadata: | ||
| name: kserve-config-llm-template-ibm-spyre-x86 | ||
| annotations: | ||
| openshift.io/display-name: vLLM IBM Spyre x86 LLMInferenceServiceConfig | ||
| description: vLLM IBM Spyre x86 LLMInferenceServiceConfig for LLMInferenceService. | ||
| spec: | ||
| template: | ||
| containers: | ||
| - name: main | ||
| image: placeholder | ||
| env: | ||
| - name: HF_HOME | ||
| value: /tmp/hf_home | ||
| - name: FLEX_COMPUTE | ||
| value: SENTIENT | ||
| - name: FLEX_DEVICE | ||
| value: PF | ||
| - name: TOKENIZERS_PARALLELISM | ||
| value: "false" | ||
| - name: DTLOG_LEVEL | ||
| value: error | ||
| - name: TORCH_SENDNN_LOG | ||
| value: CRITICAL | ||
| - name: VLLM_SPYRE_WARMUP_BATCH_SIZES | ||
| value: "4" | ||
| - name: VLLM_SPYRE_WARMUP_PROMPT_LENS | ||
| value: "1024" | ||
| - name: VLLM_SPYRE_WARMUP_NEW_TOKENS | ||
| value: "256" | ||
| - name: VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS | ||
| value: "0" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| apiVersion: kustomize.config.k8s.io/v1beta1 | ||
| kind: Kustomization | ||
|
|
||
| commonLabels: | ||
| opendatahub.io/config-type: accelerator | ||
|
|
||
| resources: | ||
| - nvidia-cuda-config-llm-template.yaml | ||
| - amd-rocm-config-llm-template.yaml | ||
| - ibm-spyre-s390x-config-llm-template.yaml | ||
| - ibm-spyre-x86-config-llm-template.yaml | ||
| - ibm-spyre-ppc64le-config-llm-template.yaml |
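For reference, `commonLabels` stamps the listed pairs onto `metadata.labels` of every resource named under `resources`. A rough sketch of what building this directory on its own should emit for one of the templates is shown below. Annotations are omitted for brevity, and the image is still `placeholder` at this stage because the image replacement is defined in the parent overlay, not here.

```yaml
# Approximate rendered output for one template (sketch, not captured output).
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template-nvidia-cuda
  labels:
    opendatahub.io/config-type: accelerator  # added by commonLabels
spec:
  template:
    containers:
      - name: main
        image: placeholder
```

With the label in place, consumers can distinguish the accelerator templates from other `LLMInferenceServiceConfig` objects using a selector such as `opendatahub.io/config-type=accelerator`.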
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| apiVersion: serving.kserve.io/v1alpha2 | ||
| kind: LLMInferenceServiceConfig | ||
| metadata: | ||
| name: kserve-config-llm-template-nvidia-cuda | ||
| annotations: | ||
| openshift.io/display-name: vLLM NVIDIA CUDA GPU LLMInferenceServiceConfig | ||
| description: vLLM NVIDIA CUDA GPU LLMInferenceServiceConfig for LLMInferenceService. | ||
| spec: | ||
| template: | ||
| containers: | ||
| - name: main | ||
| image: placeholder |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,6 +7,7 @@ resources: | |
| # - ../../crd/full/localmodel | ||
| - user-cluster-roles.yaml | ||
| - network-policies.yaml | ||
| - accelerators/ | ||
|
|
||
| components: | ||
| - ../../components/kserve | ||
|
|
@@ -55,6 +56,49 @@ replacements: | |
| fieldPaths: | ||
| - spec.template.spec.containers.[name=manager].image | ||
|
|
||
| - source: | ||
| kind: ConfigMap | ||
| name: kserve-parameters | ||
| fieldpath: data.kserve-llm-d-nvidia-cuda | ||
| targets: | ||
| - select: | ||
| kind: LLMInferenceServiceConfig | ||
| name: kserve-config-llm-template-nvidia-cuda | ||
| fieldPaths: | ||
| - spec.template.containers.[name=main].image | ||
|
|
||
| - source: | ||
| kind: ConfigMap | ||
| name: kserve-parameters | ||
| fieldpath: data.kserve-llm-d-amd-rocm | ||
| targets: | ||
| - select: | ||
| kind: LLMInferenceServiceConfig | ||
| name: kserve-config-llm-template-amd-rocm | ||
| fieldPaths: | ||
| - spec.template.containers.[name=main].image | ||
|
|
||
| - source: | ||
| kind: ConfigMap | ||
| name: kserve-parameters | ||
| fieldpath: data.kserve-llm-d-ibm-spyre | ||
| targets: | ||
| - select: | ||
| kind: LLMInferenceServiceConfig | ||
| name: kserve-config-llm-template-ibm-spyre-s390x | ||
| fieldPaths: | ||
| - spec.template.containers.[name=main].image | ||
| - select: | ||
| kind: LLMInferenceServiceConfig | ||
| name: kserve-config-llm-template-ibm-spyre-x86 | ||
| fieldPaths: | ||
| - spec.template.containers.[name=main].image | ||
| - select: | ||
| kind: LLMInferenceServiceConfig | ||
| name: kserve-config-llm-template-ibm-spyre-ppc64le | ||
| fieldPaths: | ||
| - spec.template.containers.[name=main].image | ||
|
Comment on lines +81 to +100
🧩 Analysis chain
🏁 Script executed:
fd -g 'params.env' --exec grep -n 'kserve-llm-d-ibm-spyre\|kserve-llm-d-nvidia-cuda\|kserve-llm-d-amd-rocm' {}
Repository: opendatahub-io/kserve
Length of output: 483
🏁 Script executed:
# First, verify the file structure and see the context around lines 81-100
fd -g 'kustomization.yaml' | grep odh | head -5
Repository: opendatahub-io/kserve
Length of output: 199
🏁 Script executed:
# Search for any README or documentation about image architecture support
fd -g 'README*' -o -g '*.md' | xargs grep -l 'spyre\|s390x\|ppc64le' 2>/dev/null | head -10
Repository: opendatahub-io/kserve
Length of output: 181
🏁 Script executed:
# Look for any other references to architecture-specific images or configs
rg 's390x|ppc64le' -t yaml -t md | head -20
Repository: opendatahub-io/kserve
Length of output: 1338
🏁 Script executed:
cat -n config/overlays/odh/kustomization.yaml | sed -n '75,105p'
Repository: opendatahub-io/kserve
Length of output: 1137
🏁 Script executed:
# Check if there's an x86 variant
fd -g '*ibm-spyre*' -o -g '*x86*' config/overlays/odh/accelerators/
Repository: opendatahub-io/kserve
Length of output: 181
🏁 Script executed:
# Also check what's in the accelerators directory
ls -la config/overlays/odh/accelerators/ | grep ibm
Repository: opendatahub-io/kserve
Length of output: 334
🏁 Script executed:
cat config/overlays/odh/accelerators/ibm-spyre-s390x-config-llm-template.yaml
Repository: opendatahub-io/kserve
Length of output: 974
🏁 Script executed:
cat config/overlays/odh/accelerators/ibm-spyre-x86-config-llm-template.yaml
Repository: opendatahub-io/kserve
Length of output: 1118
Verify that a single Spyre image is correct for all three architectures. All three Spyre variants (s390x, x86, ppc64le) pull their container image from the same |
||
|
|
||
| configMapGenerator: | ||
| - envs: | ||
| - params.env | ||
|
|
||
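Regarding the review question on the shared Spyre image: if the image does turn out to differ per architecture, the single `kserve-llm-d-ibm-spyre` source could be split into one replacement per variant, following the same shape as the entries in the diff. The sketch below assumes a hypothetical `kserve-llm-d-ibm-spyre-s390x` key in `params.env`; that key does not exist in this PR and is named here only for illustration.

```yaml
# Hypothetical per-architecture replacement; the source key name is an assumption.
- source:
    kind: ConfigMap
    name: kserve-parameters
    fieldpath: data.kserve-llm-d-ibm-spyre-s390x
  targets:
    - select:
        kind: LLMInferenceServiceConfig
        name: kserve-config-llm-template-ibm-spyre-s390x
      fieldPaths:
        - spec.template.containers.[name=main].image
```

If instead the Spyre image is published as a multi-architecture manifest list, the shared source shown in the diff is sufficient and no split is needed.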