Optimize sharded safetensors metadata parsing: 3 HTTP requests → 2 per shard by mishig25 · Pull Request #1979 · huggingface/huggingface.js

mishig25 · 2026-02-16T11:49:08Z

Summary

Bypass downloadFile/fileDownloadInfo when fetching shard headers, using direct range requests instead
Reduces HTTP round-trips from 3 to 2 per shard (8 bytes for header length, then exact header content)
Rejects non-206 responses to avoid downloading full multi-GB shard bodies into memory
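The two-request flow described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `RangeResponse`, `RangeFetch`, `fetchRange`, and `fetchShardHeader` are hypothetical names, and the fetch function is injected so the sketch can run against a mock. It relies only on the safetensors layout, where the first 8 bytes are a little-endian unsigned 64-bit header length followed by that many bytes of JSON.

```typescript
// Hypothetical minimal shapes; the standard fetch function and Response satisfy them.
interface RangeResponse {
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type RangeFetch = (url: string, init: { headers: Record<string, string> }) => Promise<RangeResponse>;

// Request an inclusive byte range and insist on 206 Partial Content.
async function fetchRange(fetchFn: RangeFetch, url: string, start: number, end: number): Promise<ArrayBuffer> {
  const resp = await fetchFn(url, { headers: { Range: `bytes=${start}-${end}` } });
  if (resp.status !== 206) {
    throw new Error(`Expected 206 Partial Content, got ${resp.status}`);
  }
  return resp.arrayBuffer();
}

// Two round-trips per shard: the 8-byte length prefix, then exactly the header bytes.
async function fetchShardHeader(fetchFn: RangeFetch, url: string): Promise<Record<string, unknown>> {
  const prefix = new DataView(await fetchRange(fetchFn, url, 0, 7));
  // Combine two little-endian 32-bit words; exact for header lengths below 2^53 bytes.
  const headerLength = prefix.getUint32(4, true) * 0x100000000 + prefix.getUint32(0, true);
  const headerBytes = await fetchRange(fetchFn, url, 8, 7 + headerLength);
  return JSON.parse(new TextDecoder().decode(headerBytes));
}
```

Passing the global `fetch` as `fetchFn` should work in runtimes that provide it. Auth headers for private or gated repos are left out of this sketch.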

Benchmarks (avg of 10 runs each)

Model (shards)	Old (3 req/shard)	Optimized (2 req/shard)	Change
bloom (72)	3,078ms	4,696ms*	*outlier skew
sharded file path (4)	996ms	1,125ms	~same
sharded metadata	2,539ms	2,179ms	14% faster
gpt-oss-20b (3)	1,058ms	1,079ms	~same
Kimi-K2.5 (64)	3,602ms	2,385ms	34% faster
DeepSeek-Math-V2 (163)	5,006ms FAILED 10/10	3,759ms	Fixed
Qwen3.5-397B (94)	2,790ms (1/10 fail)	2,058ms	26% faster

* bloom optimized avg skewed by one 10.8s outlier run (min was 1,581ms vs old min 2,967ms)
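The Change column does not define "% faster". Assuming it means the relative reduction in average latency, which is consistent with the three percentages reported above, it can be recomputed with a small helper (`percentFaster` is a hypothetical name, not part of this PR):

```typescript
// Relative reduction in average latency, as a percentage of the old average.
function percentFaster(oldMs: number, newMs: number): number {
  return ((oldMs - newMs) / oldMs) * 100;
}

// Rows from the table above: [model, old average (ms), optimized average (ms)].
const rows: Array<[string, number, number]> = [
  ["sharded metadata", 2539, 2179],
  ["Kimi-K2.5", 3602, 2385],
  ["Qwen3.5-397B", 2790, 2058],
];

for (const [model, oldMs, newMs] of rows) {
  console.log(`${model}: ${Math.round(percentFaster(oldMs, newMs))}% faster`);
}
```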

Key finding: DeepSeek-Math-V2 (163 shards) fails 100% with old code — 489 HTTP requests (163 × 3) overwhelm the server. The optimized path (163 × 2 = 326 requests) handles it reliably.

Test plan

All 17 existing + new tests pass (10/10 runs)
Verify with private/gated repos (auth header passthrough)

🤖 Generated with Claude Code

Previously, each shard required 3 HTTP requests (fileDownloadInfo + header length + header content). This replaces that with 2 direct range requests (8 bytes for length, then exact header bytes), bypassing downloadFile/fileDownloadInfo entirely for sharded models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

packages/hub/src/lib/parse-safetensors-metadata.ts

…ll shard bodies

Cancel the response body and throw a clear error when the server returns 200 instead of 206, which would otherwise attempt to load multi-GB shard files into memory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
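One way to express the guard this commit describes is sketched below, under assumptions: `PartialResponse` and `assertPartialContent` are hypothetical names, not identifiers from the PR. The point it illustrates is ordering: the body stream is cancelled before the error is thrown, so a 200 response carrying the full shard is never read into memory.

```typescript
// Hypothetical minimal response shape; a standard fetch Response satisfies it.
interface PartialResponse {
  status: number;
  body: { cancel(): Promise<void> } | null;
}

// Return the response only if the server honored the Range header.
// Otherwise cancel the stream first, then fail with a descriptive error.
async function assertPartialContent<T extends PartialResponse>(resp: T, url: string): Promise<T> {
  if (resp.status === 206) {
    return resp;
  }
  await resp.body?.cancel();
  throw new Error(`Expected 206 Partial Content for ${url}, got ${resp.status}`);
}
```

A caller would wrap each range request, for example `await assertPartialContent(await fetch(url, { headers: { Range: "bytes=0-7" } }), url)`, before touching the body.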

coyotte508

note this means that we're downloading from the xet bridge, instead of downloading from xet backend directly (xet-read-token + reconstructionInfo + cas requests)

maybe faster because of the CDN in front of the bridge? or the bridge being closer to the xet backend? idk

anyway up to you (whether to merge) - cc @XciD for fiz

mishig25 requested a review from coyotte508 as a code owner February 16, 2026 11:49

cursor bot reviewed Feb 16, 2026

mishig25 requested a review from gary149 February 16, 2026 11:54

coyotte508 reviewed Feb 16, 2026

mishig25 wants to merge 2 commits into main from optimize-sharded-safetensors-fetch
