Commit 762cb9a

clubanderson and claude authored
🌱 Add WVA CKS nightly workflow + remove duplicate nightly (llm-d#756)
Add CKS-specific nightly E2E caller workflow for WVA on waldorf and remove duplicate nightly-e2e-openshift.yaml that was replaced by the reusable workflow pattern.

Signed-off-by: Andy Anderson <andy@clubanderson.com>
Signed-off-by: Andrew Anderson <andy@clubanderson.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 59f51f0 commit 762cb9a

1 file changed

Lines changed: 17 additions & 11 deletions

.github/workflows/nightly-e2e-openshift.yaml renamed to .github/workflows/nightly-e2e-cks.yaml

@@ -1,22 +1,22 @@
-name: Nightly - OpenShift E2E Tests
+name: Nightly - CKS E2E Tests
 
-# Nightly regression test for WVA on OpenShift.
-# Calls the reusable workflow from llm-d/llm-d-infra to deploy the
-# workload-autoscaling guide stack and run the e2e test suite.
+# Nightly regression test for WVA on CoreWeave Kubernetes (CKS).
+# Calls the reusable CKS helmfile workflow from llm-d/llm-d-infra to deploy
+# the workload-autoscaling guide stack and run the e2e test suite on waldorf.
 
 on:
   schedule:
-    - cron: '0 0 * * *' # Midnight UTC daily
+    - cron: '30 6 * * *' # 06:30 UTC daily (staggered from IS CKS at 06:00)
   workflow_dispatch:
     inputs:
       model_id:
         description: 'Model ID'
         required: false
         default: 'unsloth/Meta-Llama-3.1-8B'
       accelerator_type:
-        description: 'Accelerator type (H100, A100, L40S)'
+        description: 'Accelerator type (H100, H200, A100)'
         required: false
-        default: 'A100'
+        default: 'H100'
       image_tag:
         description: 'WVA image tag — "latest" auto-resolves to newest release'
         required: false
@@ -46,20 +46,22 @@ permissions:
   contents: read
 
 concurrency:
-  group: nightly-e2e-openshift
+  group: nightly-e2e-cks-wva
   cancel-in-progress: true
 
 jobs:
   nightly:
-    uses: llm-d/llm-d-infra/.github/workflows/reusable-nightly-e2e-openshift.yaml@main
+    uses: llm-d/llm-d-infra/.github/workflows/reusable-nightly-e2e-cks-helmfile.yaml@main
     with:
       guide_name: workload-autoscaling
-      namespace_suffix: nightly-wva
+      namespace: llm-d-nightly-wva-cks
+      helmfile_env: istio
+      gateway_type: istio
       caller_repo: ${{ github.repository }}
       caller_ref: ${{ github.ref_name }}
       deploy_wva: true
       model_id: ${{ github.event.inputs.model_id || 'unsloth/Meta-Llama-3.1-8B' }}
-      accelerator_type: ${{ github.event.inputs.accelerator_type || 'A100' }}
+      accelerator_type: ${{ github.event.inputs.accelerator_type || 'H100' }}
       wva_image_tag: ${{ github.event.inputs.image_tag || 'latest' }}
       request_rate: ${{ github.event.inputs.request_rate || '20' }}
       num_prompts: ${{ github.event.inputs.num_prompts || '3000' }}
@@ -68,5 +70,9 @@ jobs:
       skip_cleanup: ${{ github.event.inputs.skip_cleanup == 'true' }}
       required_gpus: 2
       recommended_gpus: 4
+      allow_gpu_preemption: true
+      pod_wait_timeout: '30m'
+      pod_readiness_delay: 180
+      image_override: 'ghcr.io/llm-d/llm-d-cuda-dev:latest'
       test_target: test-e2e-openshift
     secrets: inherit
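
For context on the reusable workflow pattern this caller relies on: every key passed under `with:` has to be declared as a `workflow_call` input in the called workflow. The reusable CKS helmfile workflow in llm-d/llm-d-infra is not part of this commit, so the fragment below is only a sketch of what the declarations for the newly added keys might look like. The input types, `required` flags, and defaults are assumptions inferred from the values passed above, not copied from the real file.

```yaml
# Hypothetical excerpt of a reusable workflow (callee side).
# NOT the actual llm-d-infra file; types and defaults are guesses
# based on the values this caller passes under `with:`.
on:
  workflow_call:
    inputs:
      namespace:
        description: 'Namespace to deploy the guide stack into'
        type: string
        required: true
      helmfile_env:
        description: 'Helmfile environment to select'
        type: string
        default: 'istio'
      gateway_type:
        description: 'Gateway implementation to deploy'
        type: string
        default: 'istio'
      allow_gpu_preemption:
        type: boolean
        default: false
      pod_wait_timeout:
        # Quoted in the caller ('30m'), so assumed to be a string.
        type: string
        default: '30m'
      pod_readiness_delay:
        # Unquoted in the caller (180), so assumed to be a number.
        type: number
        default: 180
      image_override:
        type: string
        default: ''
```

On scheduled runs `github.event.inputs` is empty, so each `${{ github.event.inputs.X || 'default' }}` expression in the caller falls through to its literal default. The `== 'true'` comparison on `skip_cleanup` turns the dispatch input, which arrives as a string, into a boolean before it reaches the called workflow.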
