2 changes: 2 additions & 0 deletions docs/hub/mlx.md
@@ -69,6 +69,8 @@ response = generate(model, tokenizer, prompt="hello", verbose=True)

MLX-LM supports popular LLM architectures including LLaMA, Phi-2, Mistral, and Qwen. Models other than supported ones can easily be downloaded as follows:

Setting `HF_XET_HIGH_PERFORMANCE=1` enables higher concurrency bounds and larger buffers for faster downloads on capable hardware:

```bash
pip install -U huggingface_hub

```
5 changes: 3 additions & 2 deletions docs/hub/models-downloading.md
@@ -51,8 +51,9 @@ Add your SSH public key to [your user settings](https://huggingface.co/settings/

## Faster downloads

If you are running on a machine with high bandwidth,
you can speed up downloads by allowing `hf_xet` to run on all CPU cores. `hf_xet` is a Rust-based package leveraging the new [Xet storage backend](https://huggingface.co/docs/hub/en/xet/index) to optimize file transfers with chunk-based deduplication. `hf_xet` is enabled by default, but with reduced performance so that it does not saturate available CPU and bandwidth, which could degrade the user experience.
`hf_xet` is a Rust-based package leveraging the [Xet storage backend](https://huggingface.co/docs/hub/en/xet/index) to optimize file transfers with chunk-based deduplication. By default, `hf_xet` uses **adaptive concurrency** — it automatically tunes the number of parallel transfer streams based on real-time network conditions, starting conservatively (1 stream) and scaling up to 64 concurrent streams as bandwidth permits.

If you are running on a machine with high bandwidth, set `HF_XET_HIGH_PERFORMANCE=1` to raise the concurrency bounds: transfers start at 16 streams instead of 1, can scale up to 124 concurrent streams instead of 64, and use larger download buffers. This setting is intended for high-bandwidth machines such as data center hosts.

```bash
pip install -U huggingface_hub
```
69 changes: 68 additions & 1 deletion docs/hub/xet/using-xet-storage.md
@@ -123,14 +123,81 @@ Xet integrates seamlessly with all of the Hub's workflows. However, there are a
When uploading or downloading with Python:

- **Make sure `hf_xet` is installed**: While Xet remains backward compatible with legacy clients optimized for Git LFS, the `hf_xet` integration with `huggingface_hub` delivers optimal chunk-based performance and faster iteration on large files.
- **Utilize `hf_xet` environment variables**: The default installation of `hf_xet` is designed to support the broadest range of hardware. To take advantage of setups with more network bandwidth or processing power read up on `hf_xet`'s [environment variables](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#xet) to optimize downloads and uploads.
- **Adaptive concurrency is on by default**: `hf_xet` automatically adjusts the number of parallel transfer streams based on real-time network conditions — no configuration required. For high-bandwidth machines, set `HF_XET_HIGH_PERFORMANCE=1` to raise the concurrency bounds and buffer sizes for maximum throughput.
- **Advanced tuning**: For fine-grained control, `HF_XET_FIXED_DOWNLOAD_CONCURRENCY` and `HF_XET_FIXED_UPLOAD_CONCURRENCY` let you pin concurrency to a fixed value, bypassing the adaptive controller. See `hf_xet`'s [environment variables](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#xet) for the full list of options.
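
As a rough illustration of the two bullets above, the sketch below builds the relevant variables and applies them to the current process. The helper name `xet_env` is hypothetical, and the repository id in the commented-out usage is a placeholder. It assumes the variables should be set before `huggingface_hub` is imported, which is the conservative choice since the point at which they are read is not stated here.

```python
import os


def xet_env(high_performance=False, fixed_download=None, fixed_upload=None):
    """Return hf_xet environment variables for the requested mode.

    Hypothetical helper: only the variable names come from the docs.
    """
    env = {}
    if high_performance:
        # Raises the adaptive concurrency bounds and buffer sizes.
        env["HF_XET_HIGH_PERFORMANCE"] = "1"
    if fixed_download is not None:
        # Pins download concurrency, bypassing the adaptive controller.
        env["HF_XET_FIXED_DOWNLOAD_CONCURRENCY"] = str(fixed_download)
    if fixed_upload is not None:
        # Pins upload concurrency, bypassing the adaptive controller.
        env["HF_XET_FIXED_UPLOAD_CONCURRENCY"] = str(fixed_upload)
    return env


# Assumption: set the variables before importing huggingface_hub.
os.environ.update(xet_env(high_performance=True))

# from huggingface_hub import snapshot_download
# snapshot_download("some-org/some-model")  # placeholder repo id
```

Keeping the variables in one small helper makes it easy to switch between the adaptive default, high-performance mode, and a pinned value when comparing transfer speeds.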

When uploading or downloading in Git or Python:

- **Leverage frequent, incremental commits**: Xet's chunk-level deduplication means you can safely make incremental updates to models or datasets. Only changed chunks are uploaded, so frequent commits are both fast and storage-efficient.
- **Be specific in `.gitattributes`**: When defining patterns for Xet or LFS, use precise file extensions (e.g., `*.safetensors`, `*.bin`) to avoid unnecessarily routing smaller files through large-file storage.
- **Prioritize community access**: Xet substantially increases the efficiency and scale of large file transfers. Instead of structuring your repository to reduce its total size (or the size of individual files), organize it for collaborators and community users so they may easily navigate and retrieve the content they need.

## Environment Variables

Both `hf_xet` and Git Xet are powered by `xet-core`, which can be configured via environment variables. The most common variable is `HF_XET_HIGH_PERFORMANCE=1`, which adjusts several settings at once for high-bandwidth machines. The tables below list the individual variables for fine-grained control.

### Adaptive Concurrency

By default, `xet-core` uses adaptive concurrency — dynamically adjusting parallelism based on real-time network conditions. These variables control the adaptive controller's behavior:

| Environment Variable | Default | Description |
|---|---|---|
| `HF_XET_CLIENT_ENABLE_ADAPTIVE_CONCURRENCY` | `true` | Enable or disable adaptive concurrency control. When disabled, concurrency stays at the initial value. |
| `HF_XET_CLIENT_AC_INITIAL_UPLOAD_CONCURRENCY` | `1` | Starting number of concurrent upload streams. HP mode: `16`. |
| `HF_XET_CLIENT_AC_INITIAL_DOWNLOAD_CONCURRENCY` | `1` | Starting number of concurrent download streams. HP mode: `16`. |
| `HF_XET_CLIENT_AC_MIN_UPLOAD_CONCURRENCY` | `1` | Lower bound for upload concurrency. HP mode: `4`. |
| `HF_XET_CLIENT_AC_MIN_DOWNLOAD_CONCURRENCY` | `1` | Lower bound for download concurrency. HP mode: `4`. |
| `HF_XET_CLIENT_AC_MAX_UPLOAD_CONCURRENCY` | `64` | Upper bound for upload concurrency. HP mode: `124`. |
| `HF_XET_CLIENT_AC_MAX_DOWNLOAD_CONCURRENCY` | `64` | Upper bound for download concurrency. HP mode: `124`. |
| `HF_XET_CLIENT_AC_TARGET_RTT` | `60s` | Target round-trip time. Concurrency increases when RTT is below this value. |
| `HF_XET_CLIENT_AC_HEALTHY_SUCCESS_RATIO_THRESHOLD` | `0.8` | Success ratio above which the controller increases concurrency. |
| `HF_XET_CLIENT_AC_UNHEALTHY_SUCCESS_RATIO_THRESHOLD` | `0.5` | Success ratio below which the controller decreases concurrency. |
| `HF_XET_CLIENT_AC_LOGGING_INTERVAL_MS` | `10000` | Interval (in ms) at which concurrency status is logged. |

> [!TIP]
> To pin concurrency to a fixed value (bypassing the adaptive controller), use the convenience aliases `HF_XET_FIXED_UPLOAD_CONCURRENCY` and `HF_XET_FIXED_DOWNLOAD_CONCURRENCY`. These set the initial, minimum, and maximum concurrency to the same value.
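
The tip above says a fixed-concurrency alias sets the initial, minimum, and maximum to the same value. The sketch below spells out that equivalence using the variable names from the table. The function `expand_fixed_concurrency` is hypothetical and only illustrates the described effect. It is not how `xet-core` resolves the alias internally.

```python
def expand_fixed_concurrency(direction: str, value: int) -> dict[str, str]:
    """Return the adaptive-concurrency variables a fixed alias is described as setting.

    Hypothetical illustration, not xet-core code.
    """
    if direction not in ("UPLOAD", "DOWNLOAD"):
        raise ValueError("direction must be 'UPLOAD' or 'DOWNLOAD'")
    if value < 1:
        raise ValueError("concurrency must be at least 1")
    # Initial, minimum, and maximum all receive the same value, so the
    # controller has no room to move away from it.
    return {
        f"HF_XET_CLIENT_AC_{bound}_{direction}_CONCURRENCY": str(value)
        for bound in ("INITIAL", "MIN", "MAX")
    }


pinned = expand_fixed_concurrency("DOWNLOAD", 8)
```

Under this reading, `HF_XET_FIXED_DOWNLOAD_CONCURRENCY=8` has the same effect as setting the three `HF_XET_CLIENT_AC_*_DOWNLOAD_CONCURRENCY` variables to `8` individually.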

### Network and Retry

| Environment Variable | Default | Description |
|---|---|---|
| `HF_XET_CLIENT_RETRY_MAX_ATTEMPTS` | `5` | Maximum number of retry attempts for failed requests. |
| `HF_XET_CLIENT_RETRY_BASE_DELAY` | `3000ms` | Base delay between retries (with exponential backoff). |
| `HF_XET_CLIENT_RETRY_MAX_DURATION` | `360s` | Maximum total time to spend retrying a request. |
| `HF_XET_CLIENT_CONNECT_TIMEOUT` | `60s` | TCP connection timeout. |
| `HF_XET_CLIENT_READ_TIMEOUT` | `120s` | Read timeout for HTTP responses. |
| `HF_XET_CLIENT_IDLE_CONNECTION_TIMEOUT` | `60s` | Timeout before idle connections are closed. |
| `HF_XET_CLIENT_MAX_IDLE_CONNECTIONS` | `16` | Maximum number of idle connections in the pool. |
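
To get a feel for how the retry defaults interact, the sketch below computes a backoff schedule from the base delay, attempt limit, and total duration listed in the table. It assumes the delay doubles on each attempt, applies no jitter, and treats the attempt limit as the number of retries. The exact formula used by `xet-core` is not specified here, so the function `retry_delays` is illustrative only.

```python
def retry_delays(base_delay_s=3.0, max_attempts=5, max_duration_s=360.0):
    """Return the wait (in seconds) before each retry.

    Assumes plain doubling with no jitter; stops early once the total
    wait would exceed max_duration_s.
    """
    delays = []
    elapsed = 0.0
    for attempt in range(max_attempts):
        delay = base_delay_s * (2 ** attempt)
        if elapsed + delay > max_duration_s:
            break
        delays.append(delay)
        elapsed += delay
    return delays


schedule = retry_delays()
print(schedule)  # [3.0, 6.0, 12.0, 24.0, 48.0]
```

With the defaults, the five waits sum to 93 seconds, so under these assumptions the attempt limit is reached well before the 360-second budget.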

### Data Transfer

| Environment Variable | Default | Description |
|---|---|---|
| `HF_XET_DATA_MAX_CONCURRENT_FILE_INGESTION` | `8` | Maximum number of files processed concurrently during upload. HP mode: `100`. |
| `HF_XET_DATA_MAX_CONCURRENT_FILE_DOWNLOADS` | `8` | Maximum number of files downloaded concurrently. |
| `HF_XET_DATA_INGESTION_BLOCK_SIZE` | `8mb` | Size of blocks read during file ingestion. |
| `HF_XET_DATA_PROGRESS_UPDATE_INTERVAL` | `200ms` | How often progress bars are updated. |

### Download Buffers

These control memory usage during downloads. `HF_XET_HIGH_PERFORMANCE=1` raises these significantly.

| Environment Variable | Default | HP Mode | Description |
|---|---|---|---|
| `HF_XET_RECONSTRUCTION_MIN_RECONSTRUCTION_FETCH_SIZE` | `256mb` | `1gb` | Minimum fetch size for reconstruction requests. |
| `HF_XET_RECONSTRUCTION_MAX_RECONSTRUCTION_FETCH_SIZE` | `8gb` | `16gb` | Maximum fetch size for reconstruction requests. |
| `HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_SIZE` | `2gb` | `16gb` | Total download buffer size. |
| `HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_PERFILE_SIZE` | `512mb` | `2gb` | Per-file download buffer size. |
| `HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_LIMIT` | `8gb` | `64gb` | Hard limit on total download buffer memory. |

### Logging

| Environment Variable | Default | Description |
|---|---|---|
| `HF_XET_LOG_DEST` | (none) | Log destination (e.g. a file path). When unset, logs go to stderr. |
| `HF_XET_LOG_FORMAT` | (none) | Log format. |
| `HF_XET_LOG_PREFIX` | `xet` | Prefix for log messages. |

## Current Limitations

While Xet brings fine-grained deduplication and enhanced performance to Git-based storage, some features and platform compatibilities are still in development. As a result, keep the following constraints in mind when working with a Xet-enabled repository:
1 change: 1 addition & 0 deletions docs/xet/download-protocol.md
@@ -244,6 +244,7 @@ Not specifying this header will result in an authorization failure.
Consider downloading such content only once and reusing the data.
- **Parallel downloads**: Terms can be downloaded in parallel, but MUST be assembled in order
  - On file systems with fast seeking, it MAY be advantageous to open the output file in different threads and write contents at different offsets
  - The reference implementation (`xet-core`) uses adaptive concurrency to dynamically adjust the number of concurrent download streams based on network health, scaling up when bandwidth permits and backing off under congestion
- **Caching**: Clients SHOULD consider caching downloaded xorb ranges to avoid redundant requests
- **Retry logic**: Implement exponential backoff for transient failures
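
The parallel-download guidance above can be sketched as follows: terms are fetched concurrently, but the output is assembled strictly in term order. The function `fetch_term` is a hypothetical stand-in for the actual ranged HTTP request, and the `delay` field only simulates network latency. Neither is part of the protocol.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_term(term):
    """Hypothetical stand-in for fetching one term's bytes over HTTP."""
    time.sleep(term["delay"])  # simulate variable network latency
    return term["data"]


def download_in_order(terms, max_workers=4):
    """Fetch terms in parallel and join the results in term order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even when later
        # terms finish downloading before earlier ones.
        parts = pool.map(fetch_term, terms)
        return b"".join(parts)


terms = [
    {"data": b"alpha-", "delay": 0.03},
    {"data": b"beta-", "delay": 0.02},
    {"data": b"gamma", "delay": 0.01},
]
result = download_in_order(terms)
```

Here the last term finishes first, yet `result` is still `b"alpha-beta-gamma"`. A client that writes each term at its own file offset would need the term lengths up front, which this sketch does not model.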

2 changes: 2 additions & 0 deletions docs/xet/upload-protocol.md
@@ -135,6 +135,8 @@ However there is one additional enforced requirement about ordering: **all xorbs
If any xorb referenced by a shard is not already uploaded when the shard upload API is called, the server will reject the request.
All xorbs whose hash is used as an entry in the cas info section and in data entries of the file info section are considered "referenced" by a shard.

The reference implementation (`xet-core`) uses adaptive concurrency to dynamically adjust the number of concurrent xorb uploads based on network conditions. It starts conservatively and scales up when bandwidth permits, automatically backing off under congestion. This allows uploads to achieve high throughput on fast connections while remaining well-behaved on constrained networks.
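
The behavior described above (start low, scale up while the network is healthy, back off under congestion) can be illustrated with a toy controller. This is a sketch under stated assumptions, not the `xet-core` algorithm: the class `AdaptiveConcurrency`, the add-one increase, and the halving decrease are all invented for illustration. The default values mirror the defaults documented for the `HF_XET_CLIENT_AC_*` environment variables.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveConcurrency:
    """Toy controller; the real xet-core update rule may differ."""

    current: int = 1              # initial concurrency
    minimum: int = 1              # lower bound
    maximum: int = 64             # upper bound
    target_rtt_s: float = 60.0    # target round-trip time
    healthy_ratio: float = 0.8    # increase above this success ratio
    unhealthy_ratio: float = 0.5  # decrease below this success ratio

    def update(self, success_ratio: float, rtt_s: float) -> int:
        """Adjust concurrency from the latest observations and return it."""
        if success_ratio < self.unhealthy_ratio:
            # Assumed back-off rule: halve, but never drop below the minimum.
            self.current = max(self.minimum, self.current // 2)
        elif success_ratio > self.healthy_ratio and rtt_s < self.target_rtt_s:
            # Assumed growth rule: add one stream, capped at the maximum.
            self.current = min(self.maximum, self.current + 1)
        return self.current


controller = AdaptiveConcurrency()
for _ in range(3):
    controller.update(success_ratio=1.0, rtt_s=5.0)  # healthy: 2, 3, 4
after_growth = controller.current
after_backoff = controller.update(success_ratio=0.3, rtt_s=5.0)
```

In this sketch, observations between the two thresholds leave the concurrency unchanged, which keeps the controller from oscillating on borderline measurements.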

## Integrity and Idempotency

- Hashing of chunks, xorbs, and shards ensures integrity and enables deduplication across local and global scopes. See: [hashing](./hashing).