Skip to content

fix(storage): update hf-xet to v1.5.0 to resolve HuggingFace download issues#1475

Merged
openshift-merge-bot[bot] merged 2 commits into
opendatahub-io:masterfrom
andresllh:odh/fix/update-xet-dependency
May 19, 2026
Merged

fix(storage): update hf-xet to v1.5.0 to resolve HuggingFace download issues#1475
openshift-merge-bot[bot] merged 2 commits into
opendatahub-io:masterfrom
andresllh:odh/fix/update-xet-dependency

Conversation

@andresllh

@andresllh andresllh commented May 8, 2026

Copy link
Copy Markdown
Member

Summary

  • Pins hf-xet>=1.5.0 as an explicit dependency in python/storage/pyproject.toml
  • hf-xet is pulled in transitively by huggingface-hub but was locked at v1.1.3 across all packages due to uv lockfile pinning
  • Upstream hf-xet bugs cause intermittent model download failures (deployment never reaches MinimumReplicasAvailable), particularly visible in CI e2e tests
  • All uv.lock files regenerated with make precommit

Upstream PR

kserve#5510

Test plan

  • Verify hf-xet==1.5.0 resolves in all package environments
  • Re-run flaky e2e test_llm_inference_service tests to confirm download stability improves

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Updated project dependencies.

@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (2)
  • python/predictiveserver/uv.lock is excluded by !**/*.lock
  • python/sklearnserver/uv.lock is excluded by !**/*.lock

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5175dc22-1ff0-4d0a-94f4-190a17473871

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds a single runtime dependency, hf-xet>=1.5.0, to the storage module's pyproject.toml manifest file. No code changes, no version constraint modifications to existing dependencies, and no public API alterations.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding hf-xet>=1.5.0 dependency to resolve HuggingFace download issues.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/overlays/test-llmisvc/configmap/inferenceservice.yaml`:
- Line 9: The image field currently uses the mutable tag
"kserve/storage-initializer:latest"; change it to a pinned version or
placeholder by replacing the value of the "image" key (the "image" :
"kserve/storage-initializer:latest" entry) with a specific semantic version
(e.g., kserve/storage-initializer:vX.Y.Z) or an explicit placeholder like
{{STORAGE_INITIALIZER_VERSION}} and ensure any deployment templating (e.g., the
SET_KSERVE_VERSION wiring) substitutes that variable so the default is not
":latest".

In `@hack/setup/infra/manage.kserve-kustomize.sh`:
- Around line 191-194: The current block only replaces the literal "latest" tag,
making subsequent runs with a different SET_KSERVE_VERSION a no-op; update the
sed replacement invoked when SET_KSERVE_VERSION is set (the section that logs
via log_info and edits
config/overlays/test-llmisvc/configmap/inferenceservice.yaml) to rewrite the
image tag pattern (replace whatever follows the image name after the colon)
instead of matching the string "latest" so the image tag is idempotently updated
to the new ${SET_KSERVE_VERSION} on every run.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 1657-1659: The script injects raw SET_KSERVE_VERSION into a sed
replacement which allows special chars like '/' or '&' to break the
substitution; fix by validating and escaping the value before use: ensure
SET_KSERVE_VERSION is non-empty and matches an allowed pattern (e.g.,
alphanumerics, dot, dash, underscore) and if it fails, log error and exit, then
build a sanitized variable (escape sed metacharacters such as '/' and '&' or use
a different delimiter) and use that sanitized variable in the sed replacement
instead of the raw $SET_KSERVE_VERSION; also quote variable expansions and keep
the log_info call as-is (reference symbols: SET_KSERVE_VERSION, sed -i -e
"s/latest/.../g", log_info).

In
`@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`:
- Around line 1243-1246: The current sed invocation that replaces "latest" (in
the block guarded by SET_KSERVE_VERSION and logged via log_info) is too broad
and can change unrelated fields; update the sed command so it only targets image
tag occurrences (e.g., the "image:" line or a colon-separated tag like
":latest") by using a regex that matches the image field or the pattern
"/...:latest" and replace only that "latest" with ${SET_KSERVE_VERSION}; modify
the sed invocation in the same conditional (the block that calls log_info and
uses SET_KSERVE_VERSION) to use that scoped regex so only image tag values are
changed.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh`:
- Around line 1576-1578: The sed invocation blindly replaces every "latest" and
injects ${SET_KSERVE_VERSION} unescaped; instead escape the SET_KSERVE_VERSION
value and restrict the substitution to the kserve/storage-initializer image tag
only. Update the block around SET_KSERVE_VERSION to (1) compute an
escaped_version from SET_KSERVE_VERSION (escape slashes and ampersands), and (2)
run sed that matches the image name token (e.g.,
"kserve/storage-initializer:[^\"']*") and replaces only the tag portion with the
escaped_version in config/overlays/test-llmisvc/configmap/inferenceservice.yaml
so arbitrary "latest" occurrences are not mutated and special chars in the
version do not break the replacement.

In `@python/storage/pyproject.toml`:
- Line 17: Replace the unbounded dependency spec "hf-xet>=1.5.0" with a bounded
range to prevent unexpected upgrades; locate the dependency entry
"hf-xet>=1.5.0" in the pyproject.toml and change it to a stable spec such as
"hf-xet~=1.5.0" or "hf-xet>=1.5.0,<2.0.0" so patch/minor updates are allowed but
breaking major releases are excluded.

In `@test/scripts/gh-actions/update-test-overlays.sh`:
- Line 40: The sed replacement injects ${TAG} unescaped which allows characters
like /, &, or \ to break the command; update the script (the sed invocation on
the line with sed -i -e "s/latest/${TAG}/g" targeting
config/overlays/test-llmisvc/configmap/inferenceservice.yaml) to safely handle
arbitrary tag values by first escaping the TAG variable (escape /, &, and
backslashes) or by using a different sed delimiter and wrapping the variable in
quotes, then use the escaped/quoted variable in the sed expression so the
substitution cannot be used for injection or fail on special characters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: dd4af887-1b83-4967-acc6-97778a36f66d

📥 Commits

Reviewing files that changed from the base of the PR and between a02393f and 378e220.

⛔ Files ignored due to path filters (9)
  • python/huggingfaceserver/uv.lock is excluded by !**/*.lock
  • python/kserve/uv.lock is excluded by !**/*.lock
  • python/lgbserver/uv.lock is excluded by !**/*.lock
  • python/paddleserver/uv.lock is excluded by !**/*.lock
  • python/pmmlserver/uv.lock is excluded by !**/*.lock
  • python/predictiveserver/uv.lock is excluded by !**/*.lock
  • python/sklearnserver/uv.lock is excluded by !**/*.lock
  • python/storage/uv.lock is excluded by !**/*.lock
  • python/xgbserver/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • config/overlays/test-llmisvc/configmap/inferenceservice.yaml
  • config/overlays/test-llmisvc/kustomization.yaml
  • hack/setup/infra/manage.kserve-kustomize.sh
  • hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/llmisvc-full-install-with-manifests.sh
  • python/storage/pyproject.toml
  • test/scripts/gh-actions/update-test-overlays.sh

Comment thread config/overlays/test-llmisvc/configmap/inferenceservice.yaml Outdated
Comment thread hack/setup/infra/manage.kserve-kustomize.sh Outdated
Comment thread hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh Outdated
Comment thread hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh Outdated
Comment thread hack/setup/quick-install/llmisvc-full-install-with-manifests.sh Outdated
Comment thread python/storage/pyproject.toml
Comment thread test/scripts/gh-actions/update-test-overlays.sh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/scripts/gh-actions/run-e2e-tests.sh`:
- Around line 52-53: The current JSON patch replaces /spec/containers/0/envFrom
and clobbers existing entries; change the kubectl patch to append the secretRef
instead: use a JSON patch op that adds to the array element path
"/spec/containers/0/envFrom/-" (so it appends
{"secretRef":{"name":"hf-credentials"}}) against the clusterservingruntime
kserve-huggingfaceserver, and add a fallback that creates the envFrom array if
the append fails (e.g. run the append patch and, if it returns non-zero, run a
second patch that adds the whole array at "/spec/containers/0/envFrom").
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9de246d7-8a0d-439a-ab12-45526aba35aa

📥 Commits

Reviewing files that changed from the base of the PR and between 378e220 and 116251a.

📒 Files selected for processing (2)
  • .github/workflows/e2e-test.yml
  • test/scripts/gh-actions/run-e2e-tests.sh

Comment thread test/scripts/gh-actions/run-e2e-tests.sh Outdated
@andresllh

Copy link
Copy Markdown
Member Author

/retest

@andresllh

Copy link
Copy Markdown
Member Author

/test Red Hat Konflux / kserve-agent-on-pull-request

@andresllh

Copy link
Copy Markdown
Member Author

/test pr-image-mirror-kserve-agent

@andresllh

Copy link
Copy Markdown
Member Author

/test "Red Hat Konflux / kserve-agent-on-pull-request"

@andresllh

Copy link
Copy Markdown
Member Author

/test "kserve-agent-on-pull-request"

@andresllh

Copy link
Copy Markdown
Member Author

/test pr-image-mirror-kserve-agent

@andresllh andresllh force-pushed the odh/fix/update-xet-dependency branch from 116251a to cc36155 Compare May 13, 2026 13:26
… issues

hf-xet has upstream bugs that cause intermittent model download failures.
Pinning hf-xet>=1.5.0 as an explicit dependency in kserve-storage to
ensure all packages resolve to a stable version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andres Llausas <allausas@redhat.com>
@andresllh andresllh force-pushed the odh/fix/update-xet-dependency branch from cc36155 to bc11c67 Compare May 18, 2026 13:22
@jlost

jlost commented May 18, 2026

Copy link
Copy Markdown

Looks good to me, we just need tests to pass. There are unresolved CI flakiness issues on master

@openshift-ci

openshift-ci Bot commented May 18, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andresllh, jlost

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jlost

jlost commented May 18, 2026

Copy link
Copy Markdown

/lgtm

@andresllh

Copy link
Copy Markdown
Member Author

/group-test

Signed-off-by: Andres Llausas <allausas@redhat.com>
@openshift-ci openshift-ci Bot removed the lgtm label May 18, 2026
@andresllh

Copy link
Copy Markdown
Member Author

/group-test

1 similar comment
@andresllh

Copy link
Copy Markdown
Member Author

/group-test

@andresllh

Copy link
Copy Markdown
Member Author

/override "Red Hat Konflux / kserve-group-test"

@openshift-ci

openshift-ci Bot commented May 19, 2026

Copy link
Copy Markdown

@andresllh: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • Red Hat Konflux / kserve-group-test

Only the following failed contexts/checkruns were expected:

  • CodeRabbit
  • ci/prow/e2e-graph
  • ci/prow/e2e-llm-inference-service
  • ci/prow/e2e-predictor
  • ci/prow/e2e-raw
  • ci/prow/images
  • ci/prow/pr-image-mirror-kserve-agent
  • ci/prow/pr-image-mirror-kserve-controller
  • ci/prow/pr-image-mirror-kserve-router
  • ci/prow/pr-image-mirror-kserve-storage-initializer
  • pull-ci-opendatahub-io-kserve-master-e2e-graph
  • pull-ci-opendatahub-io-kserve-master-e2e-llm-inference-service
  • pull-ci-opendatahub-io-kserve-master-e2e-predictor
  • pull-ci-opendatahub-io-kserve-master-e2e-raw
  • pull-ci-opendatahub-io-kserve-master-images
  • pull-ci-opendatahub-io-kserve-master-pr-image-mirror-kserve-agent
  • pull-ci-opendatahub-io-kserve-master-pr-image-mirror-kserve-controller
  • pull-ci-opendatahub-io-kserve-master-pr-image-mirror-kserve-router
  • pull-ci-opendatahub-io-kserve-master-pr-image-mirror-kserve-storage-initializer
  • push
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

Details

In response to this:

/override "Red Hat Konflux / kserve-group-test"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jlost

jlost commented May 19, 2026

Copy link
Copy Markdown

/lgtm

@openshift-ci openshift-ci Bot added the lgtm label May 19, 2026
@rhods-ci-bot

Copy link
Copy Markdown

@andresllh: The following test has Failed:

OCI Artifact Browser URL

View in Artifact Browser

Inspecting Test Artifacts Manually

To inspect your test artifacts manually, follow these steps:

  1. Install ORAS (see the ORAS installation guide).
  2. Download artifacts with the following commands:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:kserve-group-test-ftlz7

@jlost

jlost commented May 19, 2026

Copy link
Copy Markdown

/override "Red Hat Konflux / kserve-group-test"

@openshift-ci

openshift-ci Bot commented May 19, 2026

Copy link
Copy Markdown

@jlost: Overrode contexts on behalf of jlost: Red Hat Konflux / kserve-group-test

Details

In response to this:

/override "Red Hat Konflux / kserve-group-test"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot Bot merged commit b123ae8 into opendatahub-io:master May 19, 2026
26 of 27 checks passed
@github-project-automation github-project-automation Bot moved this from New/Backlog to Done in ODH Model Serving Planning May 19, 2026
@andresllh andresllh deleted the odh/fix/update-xet-dependency branch May 19, 2026 21:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants