Merged
Changes from 7 commits

1 change: 1 addition & 0 deletions docs/hub/_redirects.yml
@@ -20,3 +20,4 @@ security-two-fa: security-2fa
repositories-recommendations: storage-limits
xet: xet/index
storage-backends: xet/index
git-xet: xet/using-xet-storage#git-xet
3 changes: 2 additions & 1 deletion docs/hub/datasets-downloading.md
@@ -39,9 +39,10 @@ dataset = pd.read_csv(

## Using Git

Since all datasets on the Hub are Git repositories, you can clone the datasets locally by running:
Since all datasets on the Hub are [Xet-backed](./xet/using-xet-storage#git-xet) Git repositories, you can clone the datasets locally by running:

```bash
git xet install
git lfs install
git clone git@hf.co:datasets/<dataset ID> # example: git clone git@hf.co:datasets/allenai/c4
```
2 changes: 1 addition & 1 deletion docs/hub/datasets-libraries.md
@@ -62,7 +62,7 @@ This guide will cover three primary ways to upload data to the Hub:
- using the `datasets` library and the `push_to_hub` method
- using `pandas` to write to the Hub
- using the `huggingface_hub` library and the `hf_hub_download` method
- directly using the API or Git LFS
- directly using the API or Git with git-xet

#### Use the `datasets` library

2 changes: 1 addition & 1 deletion docs/hub/models-adding-libraries.md
@@ -5,7 +5,7 @@ The Hugging Face Hub aims to facilitate sharing machine learning models, checkpo
Integrating the Hub with your library provides many benefits, including:

- Free model hosting for you and your users.
- Built-in file versioning - even for huge files - made possible by [Git-LFS](https://git-lfs.github.com/).
- Built-in file versioning - even for huge files - made possible by [Git-xet](./xet/using-xet-storage#git-xet).
- Community features (discussions, pull requests, likes).
- Usage metrics for all models ran with your library.

3 changes: 2 additions & 1 deletion docs/hub/models-downloading.md
@@ -37,9 +37,10 @@ model = joblib.load(

## Using Git

Since all models on the Model Hub are Git repositories, you can clone the models locally by running:
Since all models on the Model Hub are [Xet-backed](./xet/using-xet-storage#git-xet) Git repositories, you can clone the models locally by running:

```bash
git xet install
git lfs install
git clone git@hf.co:<MODEL ID> # example: git clone git@hf.co:bigscience/bloom
```
16 changes: 5 additions & 11 deletions docs/hub/repositories-getting-started.md
@@ -6,7 +6,7 @@ This beginner-friendly guide will help you get the basic skills you need to crea

This document shows how to handle repositories through the web interface as well as through the terminal. There are no requirements if working with the UI. If you want to work with the terminal, please follow these installation instructions.

If you do not have `git` available as a CLI command yet, you will need to [install Git](https://git-scm.com/downloads) for your platform. You will also need to [install Git LFS](https://git-lfs.github.com/), which will be used to handle large files such as images and model weights.
If you do not have `git` available as a CLI command yet, you will need to [install Git](https://git-scm.com/downloads) for your platform. You will also need to [install Git-xet](./xet/using-xet-storage#git-xet), which will be used to handle large files such as images and model weights.
Contributor

"You will also need to install Git LFS and to install Git-xet, which will be used to handle large files such as images and model weights."

We still need git-lfs installed along with git-xet.
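Because the Git Xet page lists Git LFS as a prerequisite alongside Git, a quick way to confirm both are present before running the install commands is a check like the one below. This is a minimal sketch assuming a POSIX shell; `check_git_xet_prereqs` is a hypothetical helper name, and it only looks for the `git-lfs` and `git-xet` executables on `PATH` rather than verifying versions or whether their hooks are configured.

```shell
#!/bin/sh
# Hypothetical helper: report whether each prerequisite executable is on PATH.
# Prints one line per tool, either "<tool>: found" or "<tool>: missing".
check_git_xet_prereqs() {
  for tool in git-lfs git-xet; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

check_git_xet_prereqs
```

If either line reports `missing`, install that tool first, then run `git lfs install` and `git xet install`.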


> [!TIP]
> For improved upload and download speeds when working with large files and Git, install the [Git Xet](xet/using-xet-storage#git) extension.
@@ -110,19 +110,13 @@ You'll need to add your SSH public key to [your user settings](https://huggingfa

Now's the time, you can add any files you want to the repository! 🔥

Do you have files larger than 10MB? Those files should be tracked with `git-lfs`, which you can initialize with:
Do you have files larger than 10MB? Those files should be tracked with [`git-xet`](./xet/using-xet-storage#git-xet), which you can initialize with:

```bash
git lfs install
git xet install
```

Note that if your files are larger than **5GB** you'll also need to run:

```bash
hf lfs-enable-largefiles .
```

When you use Hugging Face to create a repository, Hugging Face automatically provides a list of common file extensions for common Machine Learning large files in the `.gitattributes` file, which `git-lfs` uses to efficiently track changes to your large files. However, you might need to add new extensions if your file types are not already handled. You can do so with `git lfs track "*.your_extension"`.
When you use Hugging Face to create a repository, Hugging Face automatically provides a list of common file extensions for common Machine Learning large files in the `.gitattributes` file, which `git-xet` uses to efficiently track changes to your large files. However, you might need to add new extensions if your file types are not already handled. You can do so with `git xet track "*.your_extension"`.

### Pushing files

@@ -135,7 +129,7 @@ git commit -m "First model version" # You can choose any descriptive message
git push
```

And you're done! You can check your repository on Hugging Face with all the recently added files. For example, in the screenshot below the user added a number of files. Note that some files in this example have a size of `1.04 GB`, so the repo uses Git LFS to track it.
And you're done! You can check your repository on Hugging Face with all the recently added files. For example, in the screenshot below the user added a number of files. Note that some files in this example have a size of `1.04 GB`, so the repo uses Xet to track it.

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/repo_with_files.png"/>
5 changes: 5 additions & 0 deletions docs/hub/xet/index.md
@@ -11,6 +11,11 @@ Storing these files directly in a pure Git repository is impractical. Not only a

Instead, on the Hub, these large files are tracked using "pointer files" and identified through a `.gitattributes` file (both discussed in more detail below), which remain in the Git repository while the actual data is stored in remote storage (like [Amazon S3](https://aws.amazon.com/s3/)). As a result, the repository stays small and typical Git workflows remain efficient.

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/xet-speed.gif"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/xet-speed-dark.gif"/>
</div>

Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. While Git LFS remains supported (see [Backwards Compatibility & Legacy](./legacy-git-lfs)), the Hub has adopted Xet, a modern custom storage system built specifically for AI/ML development. It enables chunk-level deduplication, smaller uploads, and faster downloads than Git LFS.

## Open Source Xet Protocol
52 changes: 27 additions & 25 deletions docs/hub/xet/using-xet-storage.md
@@ -27,7 +27,7 @@ To see more detailed usage docs, refer to the `huggingface_hub` docs for:
- [Managing the `hf_xet` cache](https://huggingface.co/docs/huggingface_hub/guides/manage-cache#chunk-based-caching-xet)

## Git

<a id="git-xet"></a>
Git users can access the benefits of Xet by downloading and installing the Git Xet extension. Once installed, simply use the [standard workflows for managing Hub repositories with Git](../repositories-getting-started) - no additional changes necessary.

### Prerequisites
@@ -36,21 +36,23 @@ Install [Git](https://git-scm.com/) and [Git LFS](https://git-lfs.com/).

### Install on macOS or Linux (amd64 or aarch64)

Install using an installation script with the following command in your terminal (requires `curl` and `unzip`):
```
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
```
Or, install using [Homebrew](https://brew.sh/), with the following [tap](https://docs.brew.sh/Taps) (direct `brew install` coming soon):
```
brew tap huggingface/tap
brew install git-xet
git-xet install
```

To verify the installation, run:
```
git-xet --version
```
Install using an installation script with the following command in your terminal (requires `curl` and `unzip`):
```
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
```

Or, install using [Homebrew](https://brew.sh/), with the following [tap](https://docs.brew.sh/Taps) (direct `brew install` coming soon):
```
brew tap huggingface/tap
brew install git-xet
git xet install
```

To verify the installation, run:
```
git-xet --version
```

### Windows (amd64)

Using an installer:
@@ -60,7 +62,7 @@ Using an installer:
Manual installation:
- Download `git-xet-windows-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-x86_64.zip)) and unzip.
- Place the extracted `git-xet.exe` under a `PATH` directory.
- Run `git-xet install` in a terminal.
- Run `git xet install` in a terminal.

To verify the installation, run:
```
@@ -84,15 +86,15 @@ Under the hood, the [Xet protocol](https://huggingface.co/docs/xet/index) is inv
### Uninstall on macOS or Linux

Using Homebrew:
```
git-xet uninstall
brew uninstall git-xet
```
```bash
git-xet uninstall
brew uninstall git-xet
```
If you used the installation script (for MacOS or Linux), run the following in your terminal:
```
git-xet uninstall
sudo rm $(which git-xet)
```
```bash
git-xet uninstall
sudo rm $(which git-xet)
```
### Uninstall on Windows

If you used the installer:
@@ -172,7 +172,7 @@ Create your own `model.tar.gz` from a model from the 🤗 Hub:
1. Download a model:

```bash
git lfs install
git xet install
git clone git@hf.co:{repository}
```
