fix(storage): update hf-xet to v1.5.0 to resolve HuggingFace download issues#1475
Conversation
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (2)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including ⚙️ Run configurationConfiguration used: Central YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR adds a single runtime dependency, Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/overlays/test-llmisvc/configmap/inferenceservice.yaml`:
- Line 9: The image field currently uses the mutable tag
"kserve/storage-initializer:latest"; change it to a pinned version or
placeholder by replacing the value of the "image" key (the "image" :
"kserve/storage-initializer:latest" entry) with a specific semantic version
(e.g., kserve/storage-initializer:vX.Y.Z) or an explicit placeholder like
{{STORAGE_INITIALIZER_VERSION}} and ensure any deployment templating (e.g., the
SET_KSERVE_VERSION wiring) substitutes that variable so the default is not
":latest".
In `@hack/setup/infra/manage.kserve-kustomize.sh`:
- Around line 191-194: The current block only replaces the literal "latest" tag,
making subsequent runs with a different SET_KSERVE_VERSION a no-op; update the
sed replacement invoked when SET_KSERVE_VERSION is set (the section that logs
via log_info and edits
config/overlays/test-llmisvc/configmap/inferenceservice.yaml) to rewrite the
image tag pattern (replace whatever follows the image name after the colon)
instead of matching the string "latest" so the image tag is idempotently updated
to the new ${SET_KSERVE_VERSION} on every run.
In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 1657-1659: The script injects raw SET_KSERVE_VERSION into a sed
replacement which allows special chars like '/' or '&' to break the
substitution; fix by validating and escaping the value before use: ensure
SET_KSERVE_VERSION is non-empty and matches an allowed pattern (e.g.,
alphanumerics, dot, dash, underscore) and if it fails, log error and exit, then
build a sanitized variable (escape sed metacharacters such as '/' and '&' or use
a different delimiter) and use that sanitized variable in the sed replacement
instead of the raw $SET_KSERVE_VERSION; also quote variable expansions and keep
the log_info call as-is (reference symbols: SET_KSERVE_VERSION, sed -i -e
"s/latest/.../g", log_info).
In
`@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`:
- Around line 1243-1246: The current sed invocation that replaces "latest" (in
the block guarded by SET_KSERVE_VERSION and logged via log_info) is too broad
and can change unrelated fields; update the sed command so it only targets image
tag occurrences (e.g., the "image:" line or a colon-separated tag like
":latest") by using a regex that matches the image field or the pattern
"/...:latest" and replace only that "latest" with ${SET_KSERVE_VERSION}; modify
the sed invocation in the same conditional (the block that calls log_info and
uses SET_KSERVE_VERSION) to use that scoped regex so only image tag values are
changed.
In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh`:
- Around line 1576-1578: The sed invocation blindly replaces every "latest" and
injects ${SET_KSERVE_VERSION} unescaped; instead escape the SET_KSERVE_VERSION
value and restrict the substitution to the kserve/storage-initializer image tag
only. Update the block around SET_KSERVE_VERSION to (1) compute an
escaped_version from SET_KSERVE_VERSION (escape slashes and ampersands), and (2)
run sed that matches the image name token (e.g.,
"kserve/storage-initializer:[^\"']*") and replaces only the tag portion with the
escaped_version in config/overlays/test-llmisvc/configmap/inferenceservice.yaml
so arbitrary "latest" occurrences are not mutated and special chars in the
version do not break the replacement.
In `@python/storage/pyproject.toml`:
- Line 17: Replace the unbounded dependency spec "hf-xet>=1.5.0" with a bounded
range to prevent unexpected upgrades; locate the dependency entry
"hf-xet>=1.5.0" in the pyproject.toml and change it to a stable spec such as
"hf-xet~=1.5.0" or "hf-xet>=1.5.0,<2.0.0" so patch/minor updates are allowed but
breaking major releases are excluded.
In `@test/scripts/gh-actions/update-test-overlays.sh`:
- Line 40: The sed replacement injects ${TAG} unescaped which allows characters
like /, &, or \ to break the command; update the script (the sed invocation on
the line with sed -i -e "s/latest/${TAG}/g" targeting
config/overlays/test-llmisvc/configmap/inferenceservice.yaml) to safely handle
arbitrary tag values by first escaping the TAG variable (escape /, &, and
backslashes) or by using a different sed delimiter and wrapping the variable in
quotes, then use the escaped/quoted variable in the sed expression so the
substitution cannot be used for injection or fail on special characters.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: dd4af887-1b83-4967-acc6-97778a36f66d
⛔ Files ignored due to path filters (9)
python/huggingfaceserver/uv.lockis excluded by!**/*.lockpython/kserve/uv.lockis excluded by!**/*.lockpython/lgbserver/uv.lockis excluded by!**/*.lockpython/paddleserver/uv.lockis excluded by!**/*.lockpython/pmmlserver/uv.lockis excluded by!**/*.lockpython/predictiveserver/uv.lockis excluded by!**/*.lockpython/sklearnserver/uv.lockis excluded by!**/*.lockpython/storage/uv.lockis excluded by!**/*.lockpython/xgbserver/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
config/overlays/test-llmisvc/configmap/inferenceservice.yamlconfig/overlays/test-llmisvc/kustomization.yamlhack/setup/infra/manage.kserve-kustomize.shhack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.shhack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.shhack/setup/quick-install/llmisvc-full-install-with-manifests.shpython/storage/pyproject.tomltest/scripts/gh-actions/update-test-overlays.sh
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/scripts/gh-actions/run-e2e-tests.sh`:
- Around line 52-53: The current JSON patch replaces /spec/containers/0/envFrom
and clobbers existing entries; change the kubectl patch to append the secretRef
instead: use a JSON patch op that adds to the array element path
"/spec/containers/0/envFrom/-" (so it appends
{"secretRef":{"name":"hf-credentials"}}) against the clusterservingruntime
kserve-huggingfaceserver, and add a fallback that creates the envFrom array if
the append fails (e.g. run the append patch and, if it returns non-zero, run a
second patch that adds the whole array at "/spec/containers/0/envFrom").
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 9de246d7-8a0d-439a-ab12-45526aba35aa
📒 Files selected for processing (2)
.github/workflows/e2e-test.ymltest/scripts/gh-actions/run-e2e-tests.sh
|
/retest |
|
/test Red Hat Konflux / kserve-agent-on-pull-request |
|
/test pr-image-mirror-kserve-agent |
|
/test "Red Hat Konflux / kserve-agent-on-pull-request" |
|
/test "kserve-agent-on-pull-request" |
|
/test pr-image-mirror-kserve-agent |
116251a to
cc36155
Compare
… issues hf-xet has upstream bugs that cause intermittent model download failures. Pinning hf-xet>=1.5.0 as an explicit dependency in kserve-storage to ensure all packages resolve to a stable version. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Andres Llausas <allausas@redhat.com>
cc36155 to
bc11c67
Compare
|
Looks good to me, we just need tests to pass. There are unresolved CI flakiness issues on master |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andresllh, jlost The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
|
/group-test |
Signed-off-by: Andres Llausas <allausas@redhat.com>
|
/group-test |
1 similar comment
|
/group-test |
|
/override "Red Hat Konflux / kserve-group-test" |
|
@andresllh: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/lgtm |
|
@andresllh: The following test has Failed: OCI Artifact Browser URLInspecting Test Artifacts ManuallyTo inspect your test artifacts manually, follow these steps:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:kserve-group-test-ftlz7 |
|
/override "Red Hat Konflux / kserve-group-test" |
|
@jlost: Overrode contexts on behalf of jlost: Red Hat Konflux / kserve-group-test DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
b123ae8
into
opendatahub-io:master
Summary
hf-xet>=1.5.0as an explicit dependency inpython/storage/pyproject.tomlhf-xetis pulled in transitively byhuggingface-hubbut was locked at v1.1.3 across all packages due to uv lockfile pinninghf-xetbugs cause intermittent model download failures (deployment never reachesMinimumReplicasAvailable), particularly visible in CI e2e testsuv.lockfiles regenerated withmake precommitUpstream PR
kserve#5510
Test plan
hf-xet==1.5.0resolves in all package environmentstest_llm_inference_servicetests to confirm download stability improves🤖 Generated with Claude Code
Summary by CodeRabbit