Releases: huggingface/xet-core
[git-xet v0.2.0] Better Windows support, new command and performance improvement
- Extends support to the Windows platform and to SSH remote URLs, so all three major platforms are now supported with both HTTP and SSH remote URLs.
- Adds the “git xet track” command to replace “git lfs track” and unify branding.
- Upload performance improvement: git-xet no longer recomputes the SHA256 and instead uses the value passed in from git-lfs.
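The SHA256 reuse above can be pictured with a short Python sketch. It relies on two facts from the Git LFS documentation: object IDs are SHA-256 digests of the file content, and a custom transfer agent receives an upload event carrying "oid" and "path" fields. The function name and the `recompute` flag are hypothetical, and git-xet itself is not written in Python, so treat this as an illustration rather than its actual code.

```python
import hashlib
import json


def sha256_for_upload(event_line: str, recompute: bool = False) -> str:
    """Return the SHA-256 for one upload event sent by git-lfs.

    `event_line` is one line of JSON as described in the Git LFS
    custom transfer documentation. `recompute` is a hypothetical flag
    used here only to contrast the two paths.
    """
    event = json.loads(event_line)
    if not recompute:
        # Fast path: the Git LFS object ID already is the SHA-256 of the
        # file content, so it can be reused without reading the file.
        return event["oid"]
    # Slow path: read the whole file again and hash it.
    digest = hashlib.sha256()
    with open(event["path"], "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()
```

The fast path skips one full read of every uploaded file, which is where the saving would come from for large files.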
Full Changelog: git-xet-v0.1.0...git-xet-v0.2.0
[hf-xet v1.2.0] New logging system, Free-threaded Python, Performance Improvements
✨ New Features and Improvements
- New file-based logging system to support enhanced diagnostics and debugging (by @hoytak in #502)
- SOCKS5 Proxy support (by @SuperKenVery in #474)
- Support for Free-threaded Python 3.13 and 3.14 (by @rajatarya @seanses in #524)
- Improved performance by disabling disk-based chunk cache by default (by @rajatarya in #535)
- Updated rust edition to 2024, upgrade rustc to 1.89 (by @seanses in #494)
🐛 Bug Fixes and Enhancements
- Improved documentation for some crates (by @assafvayner in #492)
- Improved user-configurable constant handling (by @hoytak in #493)
- Bug fixes to diagnostic scripts and build workflows
- Remove Xet protocol docs as they live elsewhere now (see https://huggingface.co/docs/xet/index)
What's Changed
- Adding README to few crates for documentation by @assafvayner in #492
- Rename Threadpool class name to XetRuntime to reflect usage by @hoytak in #491
- Improved user-configurable constant handling by @hoytak in #493
- upgrade rust edition to 2024; upgrade rustc to 1.89 by @seanses in #494
- MacOS diag scripts by @rajatarya in #497
- Allow Duration and byte sizes in constants for easier use. by @hoytak in #495
- Build and release git-xet by @seanses in #499
- Fix git xet release bug by @seanses in #504
- Better support "xet-write-token" API authorization model and LFS Batch Api change by @seanses in #498
- Fix git-xet release bug by @seanses in #505
- git-xet install script for Linux & macOS by @seanses in #503
- Convert status code to error for get_cas_jwt by @seanses in #509
- Cache rust build in actions by @seanses in #513
- hashing and chunking example tools by @assafvayner in #496
- Enable socks5 proxy support by @SuperKenVery in #474
- spec draft by @assafvayner in #422
- Added lazy evaluation functionality to error printer. by @hoytak in #510
- move spec to docs by @assafvayner in #515
- integrate docs debugging by @assafvayner in #516
- rm all docs files by @assafvayner in #517
- git-xet Windows installer and code signing by @seanses in #519
- Fix python 314 compat by @Qubitium in #520
- Upgrade macos-13 to macos-15-intel due to closing down by @seanses in #521
- openapi spec and Makefile for it by @assafvayner in #518
- fix: explicitly specify main branch for hub client in migration utility by @sirahd in #522
- Create README.md for Git-Xet by @seanses in #529
- Logging to directory + log file management; default to log directory for hf_xet by @hoytak in #502
- Fix breaking build changes due to git safety checks. by @hoytak in #530
- Adding python-version 3.13t and 3.14t to builds by @rajatarya in #524
- Add fallback if unable to get git commit by @rajatarya in #531
- Fix typos by @omahs in #508
- Fix clippy issues with new rust version. by @hoytak in #533
- Disable DiskCache in hf_xet, continue to use it in git_xet by @rajatarya in #535
- Improved logging for cas_client crate by @hoytak in #537
- Test suite for directory logging functionality by @hoytak in #536
- fix: small typo by @crStiv in #534
- version bump to 1.2.0 for release by @rajatarya in #539
- Update Python classifiers by @rajatarya in #540
New Contributors
- @SuperKenVery made their first contribution in #474
- @Qubitium made their first contribution in #520
- @omahs made their first contribution in #508
- @crStiv made their first contribution in #534
Full Changelog: v1.1.10...v1.2.0
[git-xet v0.1.0] Git-Xet: "git push" with Xet protocol
Git-Xet is a Git LFS custom transfer agent that implements upload and download of files using the Xet protocol. Follow your regular workflow to git lfs track ... & git add ... & git commit ... & git push and your files are uploaded to Hugging Face repos automatically using the Xet protocol. Enjoy the dedupe!
Installation
- Make sure you have git and git-lfs installed and configured correctly.
- For Linux (amd64 & aarch64) and macOS (amd64 & aarch64), run the following in your terminal to install and configure git-xet (requires curl and unzip):
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
- For Windows (amd64), either
  - download git-xet-windows-installer-x86_64.zip and run the msi file after unzipping, or
  - download git-xet-windows-x86_64.zip, place git-xet.exe under a PATH directory, and run git-xet install in a terminal.
How It Works
Git-Xet works by registering itself as a custom transfer agent to Git LFS by name "xet". On "git push", "git fetch" or "git pull", git-lfs negotiates with the remote server to determine the transfer agent to use. During this process, git-lfs sends to the server all locally registered agent names in the Batch API request, and the server replies with exactly one agent name in the response. Should "xet" be picked, git-lfs delegates the uploading or downloading operation to Git-Xet through a sequential protocol.
For more details regarding Git LFS custom transfer agent protocol, see https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md and https://github.com/git-lfs/git-lfs/blob/main/docs/custom-transfers.md.
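As a rough illustration of the negotiation described above, the sketch below builds a Batch API request body that advertises the locally registered agents and reads the agent chosen by the server. The "operation", "transfers", "objects" and "transfer" fields come from the Git LFS Batch API documentation. The helper names, the object ID and the sample reply are made up for the example.

```python
import json


def build_batch_request(operation, objects, agents):
    """Build a Batch API request body listing every registered agent name."""
    return {"operation": operation, "transfers": agents, "objects": objects}


def chosen_agent(batch_response: dict) -> str:
    """Return the single agent the server picked.

    Per the Batch API documentation, a missing "transfer" field
    means the "basic" agent.
    """
    return batch_response.get("transfer", "basic")


# Hypothetical object and server reply, for illustration only.
request = build_batch_request(
    "upload",
    [{"oid": "0" * 64, "size": 1024}],
    ["basic", "xet"],
)
reply = {"transfer": "xet", "objects": []}

print(json.dumps(request, indent=2))
if chosen_agent(reply) == "xet":
    print("git-lfs would hand the transfer to Git-Xet")
```

A server that does not know "xet" would simply answer with another agent such as "basic", and git-lfs would proceed without Git-Xet.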
[v1.1.10] Bug Fixes and diagnostic tooling
🔧 Improvements & Tools:
- Comprehensive Diagnostic Scripts - New debugging tools for Linux and Windows
- Network Reliability Enhancements - Better retry logic for I/O errors
- Simplified DNS resolution to run in Kubernetes environments
- CAS API Path Modernization - Updated to use plural nouns following REST conventions
🐛 Bug Fixes:
- Chunker Boundary Triggering Fix - Fixed deduplication consistency issues
- WASM First Chunk Dedup Handling - Improved client-side control
- Data Type Safety Enhancements - Standardized on u64 for cross-platform compatibility
What's Changed
- Add input params to Run name in GH Workflow UI by @rajatarya in #478
- Thin wasm: do not automatically set is_dedup to true for first chunk by @coyotte508 in #481
- update api paths to use plural nouns by @assafvayner in #482
- Rename xet_threadpool to xet_runtime to reflect usage by @hoytak in #484
- use u64 rather than usize in file hashing paths by @assafvayner in #485
- Git-Xet: LFS custom transfer agent with Xet protocol by @seanses in #425
- Drop "GaiResolverWithAbsolute" by @seanses in #486
- Fix wheel upload for linux for dev/alpha/beta tags by @hoytak in #379
- Adding retry for unhandled io errors when sending requests by @jgodlew in #468
- Updated chunker to eliminate spurious boundary triggering. by @hoytak in #487
- Diagnostic Scripts + README changes by @rajatarya in #489
- hf_xet 1.1.10 by @assafvayner in #490
Full Changelog: v1.1.9...v1.1.10
[v1.1.9] Bug Fixes: Parallelism optimizations, metadata updates
🚀 Performance Improvements:
• Improve parallelism in parutils by removing async_scoped
• Increase soft file limits for MacOS
🐛 Bug Fixes:
• Update hf_xet PyPI metadata
🔧 Reliability & Maintenance:
• Improved debuggability with tokio console support
• Add CI builds for MacOS
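For readers unfamiliar with soft and hard file limits, the following Python sketch shows the general idea of raising the soft open-file limit to the hard limit using the standard `resource` module (Unix only). It is not the xet-core implementation, which is written in Rust. The function name is hypothetical, and the fallback on failure is an assumption about sensible behavior.

```python
import resource


def raise_nofile_soft_limit():
    """Try to raise the soft open-file limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # The system may refuse the request (for example when the
            # hard limit is reported as unlimited); keep the current value.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit())
```

Only the soft limit changes, so no elevated privileges are needed.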
What's Changed
- parutils makeover remove async_scoped by @assafvayner in #454
- tokio console setup by @assafvayner in #458
- enforce linting on hf_xet by @assafvayner in #462
- Raise soft file handle limits to hard limits on OSX. by @hoytak in #453
- run_and_extract_custom: remove use of explicit tokio_retry without utility by @assafvayner in #460
- Use a valid SPDX identifier as license classifier by @ecederstrand in #464
- CI test on macos by @seanses in #473
- Update PyPI package metadata for hf-xet by @rajatarya in #472
- Update hf_xet/README.md for hf_xet project by @rajatarya in #475
- Bumping version to 1.1.9 by @rajatarya in #476
New Contributors
- @ecederstrand made their first contribution in #464
Full Changelog: v1.1.8...v1.1.9
v1.1.8 Bug Fixes
🚀 Performance Improvements:
• Client Caching - Reuses reqwest Client across RemoteClient objects to share connection pools
• Connection Limits - Limits idle connections to prevent resource exhaustion
🐛 Bug Fixes:
• Singleflight Fix - Critical fix preventing permanent error caching when owner tasks are dropped
• DataHash Serialization - Ensures consistent little-endian byte order across platforms
🔧 Reliability & Maintenance:
• Retry Logic Restoration - Restores retry logic accidentally removed in versions 1.1.6 and 1.1.7
What's Changed
- fix: singleflight owner task not removing Call from Group if dropped by @jgodlew in #447
- Add back retry for connection setup and sending request by @seanses in #455
- Fix DataHash hex string serde to little endian by @seanses in #445
- Clean up dependencies (no functionality change) by @seanses in #456
- Cache and reuse reqwest Client by @seanses in #457
- Limit number of idle connections by @hoytak in #459
- update version by @assafvayner in #461
Full Changelog: v1.1.7...v1.1.8
v1.1.7
[v1.1.6] Bug Fixes: Proxy support, process safety, and more
✨ New Features and Improvements
- Proxy support, easing use behind corporate networks. (#413 by @hoytak; addresses #400 - thanks @albertodepaola and @goodsonjr for the initial reports)
- Improvements to hf_xet logging, providing a facility to log events to a formatted file (#428 by @hoytak)
🐛 Bug Fixes
- Process safety: make running after os.fork() safer. (#429 by @hoytak; addresses #415 - thanks @John6666cat for the report)
- Respect XDG_CACHE_HOME and ~/ when setting cache directories. (#426 by @hoytak; addresses #417 - thanks @half-duplex for the initial report)
- Lower the default NUM_RANGE_CONCURRENT_GETS value to 64 to better respect file descriptor limits (#438 by @assafvayner; addresses #436 - thanks @djholt and @gary149 for the reports)
- JWT token handling hardened with a buffer before expiration. (#405 by @jgodlew; addresses #404)
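The JWT hardening in the last item follows a common pattern: treat a token as expired slightly before its real expiry, so a request does not start with a token that lapses in flight. A minimal Python sketch of that check is below. The function name and the 30-second buffer are assumptions for illustration, since the notes do not state the value used in hf_xet.

```python
import time
from typing import Optional

# Hypothetical buffer; the actual value used by hf_xet is not given here.
REFRESH_BUFFER_SECS = 30.0


def needs_refresh(
    expires_at: float,
    now: Optional[float] = None,
    buffer_secs: float = REFRESH_BUFFER_SECS,
) -> bool:
    """Return True if the token expires within `buffer_secs` of `now`.

    `expires_at` and `now` are Unix timestamps in seconds.
    """
    if now is None:
        now = time.time()
    return now + buffer_secs >= expires_at
```

With a 30-second buffer, a token that expires at t=100 is refreshed from t=70 onward instead of being used until the last second.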
What's Changed
- Streaming shard interface updates by @assafvayner in #392
- WASM poc by @seanses in #272
- Generic retry wrapper to consolidate and streamline retry logic. by @hoytak in #397
- Fix for retry failure due to non-clonability by @hoytak in #402
- Adding buffer to JWT token expiration check by @jgodlew in #405
- Updating chunk and shard cache default sizes by @jsulz in #406
- Simplified Client interface. by @hoytak in #408
- Add correctness tests for aggregate hash functions. by @hoytak in #412
- Enabling proxy support for reqwest by @hoytak in #413
- Thin wasm by @assafvayner in #411
- Move MDB v1 to reference test code; add standalone hash functions by @hoytak in #414
- Add verification hash and file hash functions by @assafvayner in #416
- Use v1 api paths by @assafvayner in #421
- Set shard size limit as max, not target min by @assafvayner in #420
- Remove footer from upload shard payload by @assafvayner in #419
- Errors on shard reading are now logged and ignored. by @hoytak in #424
- Add whether chunk should be checked against global dedup by @coyotte508 in #423
- Logging improvements by @hoytak in #428
- Export hmac function in thin wasm by @coyotte508 in #427
- Make hf_xet fork-exec safe by @hoytak in #429
- Revert use of v1 api paths by @assafvayner in #432
- Limit number of async worker threads on large CPUs by @hoytak in #431
- Respect XDG_CACHE_HOME and ~/ when setting cache directory. by @hoytak in #426
- Associate static semaphores with runtime by @hoytak in #433
- Remove logging from wasm lib by @coyotte508 in #434
Full Changelog: v1.1.5...v1.1.6
[v1.1.5] Bug Fixes: Cert issue fixes & optimizations
This release includes a fix for certificate issues in certain network environments and loading optimizations for dedup lookups.
🧱 Improvements
- Background shard loading (#384): Loads shard lookup tables in the background to reduce upload_files startup time. Author: Hoyt Koepke
🐛 Bug Fixes
- Certificate loading (#393): Switched to load_native_certs() for efficiency. Author: Hoyt Koepke
What's Changed
- Shard interface updates by @assafvayner in #382
- Background loading for shards by @hoytak in #384
- fix MDBFileInfo::deserialize_async in case of no verification entries by @assafvayner in #388
- Switch cert loading to use load_native_certs(); by @hoytak in #393
- Cargo.toml+lock version update by @rajatarya in #395
Full Changelog: v1.1.4...v1.1.5
[v1.1.4] Bug Fixes: Network Resilience and Performance Optimizations
📶 DNS Resolution & Network Connectivity
- Fixed DNS resolution issues: Implemented custom DNS resolver to force absolute DNS name resolution, addressing issues where DNS resolvers struggled with CAS server addresses and fell back to local search domains
- Enhanced TLS configuration: Updated reqwest to use rustls-tls by default with configurable TLS backends (native-tls, native-tls-vendored options available)
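One general way to force absolute name resolution is to add a trailing dot to the hostname, which marks it as fully qualified so the resolver does not try local search domains. The Python sketch below shows that idea only. The notes do not describe how the custom resolver in this release works internally, and the function name and example hostname are hypothetical.

```python
import ipaddress


def to_absolute_hostname(host: str) -> str:
    """Return `host` as an absolute (fully qualified) DNS name.

    IP address literals are returned unchanged, since they are not
    looked up through DNS.
    """
    try:
        ipaddress.ip_address(host)
        return host
    except ValueError:
        pass
    # A trailing dot tells the resolver the name is already complete.
    return host if host.endswith(".") else host + "."


print(to_absolute_hostname("cas.example.com"))
```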
🚀 Performance Optimizations
- Global download concurrency control: Changed the download concurrency limit from per-file to global (default: 128 simultaneous connections) to prevent file handle exhaustion on macOS
- Optimized chunking operations: Converted core Chunk data type from Arc<[u8]> to bytes::Bytes for better memory efficiency and reduced copying. Separated boundary calculation logic from chunk building for future optimization work
- Updated shard cache size: Increased default shard cache limit to 16GB, effectively allowing deduplication against 16TB of data
- Streamlined upload payload: Removed footer serialization from upload xorb payload in remote_client for improved efficiency
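The difference between a per-file and a global concurrency limit can be sketched with a single semaphore shared by every download. The example below uses Python's asyncio rather than the Rust runtime used by xet-core, and all names are hypothetical. With a per-file limit, N files could each open up to the limit at once. With one shared gate, the total stays bounded regardless of the number of files.

```python
import asyncio


async def download_all(files: dict, limit: int = 128) -> int:
    """Simulate range downloads for several files behind one shared gate.

    `files` maps a file name to its number of ranges. Returns the peak
    number of simultaneous "connections" observed.
    """
    gate = asyncio.Semaphore(limit)  # one gate for all files, not one per file
    in_flight = 0
    peak = 0

    async def fetch_range(name: str, index: int) -> None:
        nonlocal in_flight, peak
        async with gate:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for the network request
            in_flight -= 1

    await asyncio.gather(
        *(fetch_range(name, i) for name, count in files.items() for i in range(count))
    )
    return peak


if __name__ == "__main__":
    observed = asyncio.run(download_all({"a.bin": 10, "b.bin": 10}, limit=4))
    print("peak simultaneous connections:", observed)
```

Because every task acquires the same semaphore, the peak never exceeds `limit`, which is what keeps the number of open file handles predictable.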
🤗 Developer Experience
- Issue templates: Added comprehensive GitHub issue templates including bug report forms, feature request forms, and helpful links for better community engagement
What's Changed
- Update chunker to separate out calculation of next boundary by @hoytak in #368
- remove footer serialized from upload xorb payload on remote_client by @assafvayner in #372
- Adding issue templates to repo by @jsulz in #374
- Small optimizations for chunking / upload path by @hoytak in #371
- Switch reqwest to rustls-tls from default; use hickory-dns for dns resolution. by @hoytak in #378
- add ci steps to check cargo.lock is up to date by @assafvayner in #377
- Update shard cache default size. by @hoytak in #381
- Remove hickory-dns and use system dns provider by @hoytak in #380
- Fix/dns resolution by @Hugoch in #383
- Change download currency limit from local to global. by @hoytak in #385
- hf_xet Cargo.toml 1.1.4 by @assafvayner in #387
New Contributors
Full Changelog: v1.1.3...v1.1.4