diff --git a/docs/hub/_redirects.yml b/docs/hub/_redirects.yml
index 583c0d6c98..3e8a728e29 100644
--- a/docs/hub/_redirects.yml
+++ b/docs/hub/_redirects.yml
@@ -18,6 +18,7 @@ api-webhook: webhooks
 adapter-transformers: adapters
 security-two-fa: security-2fa
 repositories-recommendations: storage-limits
+datasets-viewer: data-studio
 xet: xet/index
 storage-backends: xet/index
-datasets-viewer: data-studio
+git-xet: xet/using-xet-storage#git-xet
diff --git a/docs/hub/datasets-downloading.md b/docs/hub/datasets-downloading.md
index 17989d1dec..48177c6a9e 100644
--- a/docs/hub/datasets-downloading.md
+++ b/docs/hub/datasets-downloading.md
@@ -39,9 +39,10 @@ dataset = pd.read_csv(
 
 ## Using Git
 
-Since all datasets on the Hub are Git repositories, you can clone the datasets locally by running:
+Since all datasets on the Hub are Xet-backed Git repositories, you can clone the datasets locally by [installing git-xet](./xet/using-xet-storage#git-xet) and running:
 
 ```bash
+git xet install
 git lfs install
 git clone git@hf.co:datasets/ # example: git clone git@hf.co:datasets/allenai/c4
 ```
diff --git a/docs/hub/datasets-libraries.md b/docs/hub/datasets-libraries.md
index 8610aa05d4..2032dc15b8 100644
--- a/docs/hub/datasets-libraries.md
+++ b/docs/hub/datasets-libraries.md
@@ -62,7 +62,7 @@ This guide will cover three primary ways to upload data to the Hub:
 - using the `datasets` library and the `push_to_hub` method
 - using `pandas` to write to the Hub
 - using the `huggingface_hub` library and the `hf_hub_download` method
-- directly using the API or Git LFS
+- directly using the API or Git with git-xet
 
 #### Use the `datasets` library
 
diff --git a/docs/hub/models-adding-libraries.md b/docs/hub/models-adding-libraries.md
index e99bcded6b..ad410a2fca 100644
--- a/docs/hub/models-adding-libraries.md
+++ b/docs/hub/models-adding-libraries.md
@@ -5,7 +5,7 @@ The Hugging Face Hub aims to facilitate sharing machine learning models, checkpo
 Integrating the Hub with your library provides many benefits, including:
 
 - Free model hosting for you and your users.
-- Built-in file versioning - even for huge files - made possible by [Git-LFS](https://git-lfs.github.com/).
+- Built-in file versioning - even for huge files - made possible by [Git-Xet](./xet/using-xet-storage#git-xet).
 - Community features (discussions, pull requests, likes).
 - Usage metrics for all models ran with your library.
 
diff --git a/docs/hub/models-downloading.md b/docs/hub/models-downloading.md
index b798d1b8a3..603ba628fc 100644
--- a/docs/hub/models-downloading.md
+++ b/docs/hub/models-downloading.md
@@ -37,9 +37,10 @@ model = joblib.load(
 
 ## Using Git
 
-Since all models on the Model Hub are Git repositories, you can clone the models locally by running:
+Since all models on the Model Hub are Xet-backed Git repositories, you can clone the models locally by [installing git-xet](./xet/using-xet-storage#git-xet) and running:
 
 ```bash
+git xet install
 git lfs install
 git clone git@hf.co: # example: git clone git@hf.co:bigscience/bloom
 ```
diff --git a/docs/hub/repositories-getting-started.md b/docs/hub/repositories-getting-started.md
index 735b7a67eb..131588c29f 100644
--- a/docs/hub/repositories-getting-started.md
+++ b/docs/hub/repositories-getting-started.md
@@ -6,10 +6,10 @@ This beginner-friendly guide will help you get the basic skills you need to crea
 
 This document shows how to handle repositories through the web interface as well as through the terminal. There are no requirements if working with the UI. If you want to work with the terminal, please follow these installation instructions.
 
-If you do not have `git` available as a CLI command yet, you will need to [install Git](https://git-scm.com/downloads) for your platform. You will also need to [install Git LFS](https://git-lfs.github.com/), which will be used to handle large files such as images and model weights.
+If you do not have `git` available as a CLI command yet, you will need to [install Git](https://git-scm.com/downloads) for your platform. You will also need to [install Git-Xet](./xet/using-xet-storage#git-xet), which will be used to handle large files such as images and model weights.
 
 > [!TIP]
-> For improved upload and download speeds when working with large files and Git, install the [Git Xet](xet/using-xet-storage#git) extension.
+> To be able to download and upload large files from Git, you need to install the [Git Xet](./xet/using-xet-storage#git) extension.
 
 To be able to push your code to the Hub, you'll need to authenticate somehow. The easiest way to do this is by installing the [`huggingface_hub` CLI](https://huggingface.co/docs/huggingface_hub/index) and running the login command:
 
@@ -110,19 +110,13 @@ You'll need to add your SSH public key to [your user settings](https://huggingfa
 
 Now's the time, you can add any files you want to the repository! 🔥
 
-Do you have files larger than 10MB? Those files should be tracked with `git-lfs`, which you can initialize with:
+Do you have files larger than 10MB? Those files should be tracked with [`git-xet`](./xet/using-xet-storage#git-xet), which you can initialize with:
 
 ```bash
-git lfs install
+git xet install
 ```
 
-Note that if your files are larger than **5GB** you'll also need to run:
-
-```bash
-hf lfs-enable-largefiles .
-```
-
-When you use Hugging Face to create a repository, Hugging Face automatically provides a list of common file extensions for common Machine Learning large files in the `.gitattributes` file, which `git-lfs` uses to efficiently track changes to your large files. However, you might need to add new extensions if your file types are not already handled. You can do so with `git lfs track "*.your_extension"`.
+When you use Hugging Face to create a repository, Hugging Face automatically provides a list of common file extensions for common Machine Learning large files in the `.gitattributes` file, which `git-xet` uses to efficiently track changes to your large files. However, you might need to add new extensions if your file types are not already handled. You can do so with `git xet track "*.your_extension"`.
 
 ### Pushing files
 
@@ -135,7 +129,7 @@ git commit -m "First model version" # You can choose any descriptive message
 git push
 ```
 
-And you're done! You can check your repository on Hugging Face with all the recently added files. For example, in the screenshot below the user added a number of files. Note that some files in this example have a size of `1.04 GB`, so the repo uses Git LFS to track it.
+And you're done! You can check your repository on Hugging Face with all the recently added files. For example, in the screenshot below the user added a number of files. Note that some files in this example have a size of `1.04 GB`, so the repo uses Xet to track it.
diff --git a/docs/hub/xet/index.md b/docs/hub/xet/index.md
index 5dc4aac021..2bb6937ff3 100644
--- a/docs/hub/xet/index.md
+++ b/docs/hub/xet/index.md
@@ -11,6 +11,11 @@ Storing these files directly in a pure Git repository is impractical. Not only a
 
 Instead, on the Hub, these large files are tracked using "pointer files" and identified through a `.gitattributes` file (both discussed in more detail below), which remain in the Git repository while the actual data is stored in remote storage (like [Amazon S3](https://aws.amazon.com/s3/)). As a result, the repository stays small and typical Git workflows remain efficient.
 
+
+
+
+
+
 Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. While Git LFS remains supported (see [Backwards Compatibility & Legacy](./legacy-git-lfs)), the Hub has adopted Xet, a modern custom storage system built specifically for AI/ML development. It enables chunk-level deduplication, smaller uploads, and faster downloads than Git LFS.
 
 ## Open Source Xet Protocol
diff --git a/docs/hub/xet/using-xet-storage.md b/docs/hub/xet/using-xet-storage.md
index a839b7f6b5..80e559af3b 100644
--- a/docs/hub/xet/using-xet-storage.md
+++ b/docs/hub/xet/using-xet-storage.md
@@ -27,7 +27,7 @@ To see more detailed usage docs, refer to the `huggingface_hub` docs for:
 - [Managing the `hf_xet` cache](https://huggingface.co/docs/huggingface_hub/guides/manage-cache#chunk-based-caching-xet)
 
 ## Git
-
+
 Git users can access the benefits of Xet by downloading and installing the Git Xet extension. Once installed, simply use the [standard workflows for managing Hub repositories with Git](../repositories-getting-started) - no additional changes necessary.
 
 ### Prerequisites
@@ -36,21 +36,23 @@ Install [Git](https://git-scm.com/) and [Git LFS](https://git-lfs.com/).
 
 ### Install on macOS or Linux (amd64 or aarch64)
 
- Install using an installation script with the following command in your terminal (requires `curl` and `unzip`):
- ```
- curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
- ```
- Or, install using [Homebrew](https://brew.sh/), with the following [tap](https://docs.brew.sh/Taps) (direct `brew install` coming soon):
- ```
- brew tap huggingface/tap
- brew install git-xet
- git-xet install
- ```
-
- To verify the installation, run:
- ```
- git-xet --version
- ```
+Install using an installation script with the following command in your terminal (requires `curl` and `unzip`):
+```
+curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
+```
+
+Or, install using [Homebrew](https://brew.sh/), with the following [tap](https://docs.brew.sh/Taps) (direct `brew install` coming soon):
+```
+brew tap huggingface/tap
+brew install git-xet
+git xet install
+```
+
+To verify the installation, run:
+```
+git-xet --version
+```
+
 ### Windows (amd64)
 
 Using an installer:
@@ -60,7 +62,7 @@ Using an installer:
 Manual installation:
  - Download `git-xet-windows-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-x86_64.zip)) and unzip.
  - Place the extracted `git-xet.exe` under a `PATH` directory.
- - Run `git-xet install` in a terminal.
+ - Run `git xet install` in a terminal.
 
 To verify the installation, run:
 ```
@@ -84,15 +86,15 @@ Under the hood, the [Xet protocol](https://huggingface.co/docs/xet/index) is inv
 
 ### Uninstall on macOS or Linux
 Using Homebrew:
- ```
- git-xet uninstall
- brew uninstall git-xet
- ```
+```bash
+git-xet uninstall
+brew uninstall git-xet
+```
 If you used the installation script (for MacOS or Linux), run the following in your terminal:
- ```
- git-xet uninstall
- sudo rm $(which git-xet)
- ```
+```bash
+git-xet uninstall
+sudo rm $(which git-xet)
+```
 
 ### Uninstall on Windows
 If you used the installer:
diff --git a/docs/sagemaker/source/tutorials/sagemaker-sdk/deploy-sagemaker-sdk.md b/docs/sagemaker/source/tutorials/sagemaker-sdk/deploy-sagemaker-sdk.md
index 85670adfe8..7380265cf0 100644
--- a/docs/sagemaker/source/tutorials/sagemaker-sdk/deploy-sagemaker-sdk.md
+++ b/docs/sagemaker/source/tutorials/sagemaker-sdk/deploy-sagemaker-sdk.md
@@ -172,7 +172,7 @@ Create your own `model.tar.gz` from a model from the 🤗 Hub:
 1. Download a model:
 
 ```bash
-git lfs install
+git xet install
 git clone git@hf.co:{repository}
 ```
 
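
Reviewer sketch, not part of the patch: the clone instructions changed above now assume that Git, Git LFS and the Git Xet extension are all installed before `git clone` is run. A minimal POSIX-shell check along these lines could confirm that first. `missing_tools` is a hypothetical helper name, and the sketch assumes the three binaries are called `git`, `git-lfs` and `git-xet` on `PATH`; it does not call any git-xet subcommand.

```bash
# Hypothetical helper (not part of git-xet or the Hub docs): print every
# command name from the argument list that cannot be resolved on PATH.
missing_tools() {
  for _tool in "$@"; do
    # `command -v` exits non-zero when the command cannot be found.
    command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
  done
}

# Prerequisites named in the docs: Git, Git LFS and the Git Xet extension.
# Binary names are an assumption of this sketch.
missing="$(missing_tools git git-lfs git-xet)"
if [ -n "$missing" ]; then
  printf 'Install before cloning:\n%s\n' "$missing"
else
  echo "git, git-lfs and git-xet are all on PATH"
fi
```

The check only looks the names up on `PATH`, so it stays valid regardless of which install route (script, Homebrew, or Windows installer) was used.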