tests/ai-conformance: Enable Cluster AutoScaler and Metrics Server #18041

Draft
ameukam wants to merge 1 commit into kubernetes:master from ameukam:ai-conformance-autoscaler-metrics-server

ameukam (Member) commented Mar 7, 2026

k8s-ci-robot (Contributor) commented

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot added the do-not-merge/work-in-progress label Mar 7, 2026
k8s-ci-robot requested review from hakman and olemarkus March 7, 2026 12:28
k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign olemarkus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/XS and cncf-cla: yes labels Mar 7, 2026
ameukam (Member, Author) commented Mar 7, 2026

/test pull-kops-ai-conformance

hakman (Member) commented Mar 7, 2026

@ameukam why do we need CAS?

ameukam force-pushed the ai-conformance-autoscaler-metrics-server branch from 34fd82b to 90d82f2 March 7, 2026 16:29
ameukam (Member, Author) commented Mar 7, 2026

> @ameukam why do we need CAS?

Cluster autoscaling and pod autoscaling are MUST requirements for the AI conformance program: https://github.com/cncf/k8s-ai-conformance/blob/c9946a17af926ef75fb58d51bd1a693c581ddc4e/docs/AIConformance-1.35.yaml#L58-L69
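
For illustration only (this snippet is not taken from the PR diff): a minimal sketch of how the three addons discussed in this thread might be switched on through kOps `--set` overrides. The field paths follow the `cluster.spec.certManager.enabled=true` form quoted later in the review; the function name `ai_conformance_addon_flags` is hypothetical, and how the flags reach the cluster-creation command in the actual test script is an assumption.

```shell
# Hypothetical helper: print one kOps --set override per line.
# clusterAutoscaler covers the cluster-autoscaling requirement,
# metricsServer supplies the metrics that pod autoscaling (HPA) relies on,
# and certManager matches the addon mentioned in the review thread.
ai_conformance_addon_flags() {
  printf '%s\n' \
    "--set=cluster.spec.clusterAutoscaler.enabled=true" \
    "--set=cluster.spec.metricsServer.enabled=true" \
    "--set=cluster.spec.certManager.enabled=true"
}
```

The output could then be appended to whatever arguments the test passes when creating the cluster; the exact wiring depends on the script and is not shown in this conversation.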

ameukam (Member, Author) commented Mar 7, 2026

/test pull-kops-ai-conformance

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
ameukam force-pushed the ai-conformance-autoscaler-metrics-server branch from 90d82f2 to 1575ef4 March 8, 2026 16:16
k8s-ci-robot added the size/S label and removed the size/XS label Mar 8, 2026
ameukam (Member, Author) commented Mar 8, 2026

/test pull-kops-ai-conformance

k8s-ci-robot (Contributor) commented

@ameukam: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kops-ai-conformance | 1575ef4 | link | false | /test pull-kops-ai-conformance |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

  # cert-manager: required for KubeRay webhooks
  echo "Installing cert-manager..."
- kubectl apply --server-side -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.2/cert-manager.yaml
+ kubectl apply --server-side --force-conflicts -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.2/cert-manager.yaml

Reviewer (Member) commented:

If we're enabling cert-manager via cluster.spec.certManager.enabled=true, should we comment out the apply?

ameukam (Member, Author) replied:

I had the same question. I think we should, but I don't know how it would affect the rest of the deployed workloads.
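
One possible middle ground between keeping and commenting out the apply, sketched here under assumptions rather than taken from the PR: skip the manifest when the cert-manager CRDs are already present (for example because kOps installed the addon via `cluster.spec.certManager.enabled=true`). The function name `install_cert_manager_if_missing` and the variable `CERT_MANAGER_MANIFEST` are hypothetical; the CRD name `certificates.cert-manager.io` is the one cert-manager registers for its `Certificate` resource.

```shell
# Manifest URL copied from the script excerpt above.
CERT_MANAGER_MANIFEST="https://github.com/cert-manager/cert-manager/releases/download/v1.19.2/cert-manager.yaml"

# Hypothetical helper: apply the upstream manifest only when the
# Certificate CRD is not already registered in the cluster.
install_cert_manager_if_missing() {
  if kubectl get crd certificates.cert-manager.io >/dev/null 2>&1; then
    echo "cert-manager CRDs already present; skipping manifest apply"
    return 0
  fi
  echo "Installing cert-manager..."
  kubectl apply --server-side --force-conflicts -f "${CERT_MANAGER_MANIFEST}"
}
```

A guard like this avoids two installations competing over the same resources, but it does not check that the existing installation is healthy or that its version matches what KubeRay expects.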

justinsb (Member) commented

The issue seems to be this from cert-manager cainjector:

E0308 16:38:36.556502       1 sources.go:106] "unable to fetch associated certificate" err="Certificate.cert-manager.io \"serving-cert\" not found" logger="cert-manager" kind="customresourcedefinition" kind="customresourcedefinition" name="rayclusters.ray.io" certificate="ray-system/serving-cert"
I0308 16:38:36.556597       1 reconciler.go:117] "could not find any ca data in data source for target" logger="cert-manager" kind="customresourcedefinition" kind="customresourcedefinition" name="rayclusters.ray.io"
E0308 16:38:37.497955       1 sources.go:106] "unable to fetch associated certificate" err="Certificate.cert-manager.io \"serving-cert\" not found" logger="cert-manager" kind="customresourcedefinition" kind="customresourcedefinition" name="rayclusters.ray.io" certificate="ray-system/serving-cert"

ameukam (Member, Author) commented Mar 12, 2026

> The issue seems to be this from cert-manager cainjector:
>
> E0308 16:38:36.556502       1 sources.go:106] "unable to fetch associated certificate" err="Certificate.cert-manager.io \"serving-cert\" not found" logger="cert-manager" kind="customresourcedefinition" kind="customresourcedefinition" name="rayclusters.ray.io" certificate="ray-system/serving-cert"
> I0308 16:38:36.556597       1 reconciler.go:117] "could not find any ca data in data source for target" logger="cert-manager" kind="customresourcedefinition" kind="customresourcedefinition" name="rayclusters.ray.io"
> E0308 16:38:37.497955       1 sources.go:106] "unable to fetch associated certificate" err="Certificate.cert-manager.io \"serving-cert\" not found" logger="cert-manager" kind="customresourcedefinition" kind="customresourcedefinition" name="rayclusters.ray.io" certificate="ray-system/serving-cert"

Looks like an RBAC issue: https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/kops/18041/pull-kops-ai-conformance/2030679170605912064/artifacts/cluster-info/kube-system/cert-manager-cainjector-775cdc5b7d-f2vlv/cert-manager-cainjector.log

k8s-ci-robot added the needs-rebase label Mar 13, 2026
k8s-ci-robot (Contributor) commented

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

- cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
- do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
- needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
- size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.

4 participants