diff --git a/core/README.md b/core/README.md
index 07b255f8523f..e51bfdb6811a 100644
--- a/core/README.md
+++ b/core/README.md
@@ -29,7 +29,7 @@ OpenDAL supports the following storage [services](https://docs.rs/opendal/latest
 | Type | Services |
 |--------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
 | Standard Storage Protocols | ftp http [sftp] [webdav] |
-| Object Storage Services | [azblob] [cos] [gcs] [obs] [oss] [s3]
[b2] [openstack_swift] [upyun] [vercel-blob] |
+| Object Storage Services | [azblob] [cos] [gcs] [obs] [oss] [s3]
[b2] [huggingface] [openstack_swift] [upyun] [vercel-blob] |
 | File Storage Services | fs [alluxio] [azdls] [azfile] [compfs]
[dbfs] [gridfs] [hdfs] [hdfs-native] [ipfs] [webhdfs] |
 | Consumer Cloud Storage Service | [aliyun-drive] [gdrive] [onedrive] [dropbox] [koofr]
[pcloud] [seafile] [yandex-disk] |
 | Key-Value Storage Services | [cacache] [cloudflare-kv] [dashmap] memory [etcd]
[foundationdb] [persy] [redis] [rocksdb] [sled]
[redb] [tikv] |
diff --git a/core/services/hf/src/docs.md b/core/services/hf/src/docs.md
index f7620f0cd241..2ebed581897b 100644
--- a/core/services/hf/src/docs.md
+++ b/core/services/hf/src/docs.md
@@ -2,7 +2,12 @@ This service will visit the [Hugging Face API](https://huggingface.co/docs/huggi
 
 Hugging Face doesn't host official HTTP API docs. Detailed HTTP request API information can be found on the [`huggingface_hub` Source Code](https://github.com/huggingface/huggingface_hub).
 
-Both `hf://` and `huggingface://` URI schemes are supported.
+## Storage Backends
+
+This service supports two storage backends:
+
+- **Git-based repositories** (`model`, `dataset`, `space`): Files are versioned in a Git repository. Large files are stored via [Xet](https://huggingface.co/docs/hub/xet/index), Hugging Face's chunk-deduplicated storage backend; writes create new commits. Supports `revision` for branch/commit targeting.
+- **Object store buckets** (`bucket`): Files are stored in a Hugging Face Bucket (not git-backed). No revisions or commits; all reads and writes use the [Xet](https://huggingface.co/docs/hub/xet/index) protocol directly.
 
 ## Capabilities
 
@@ -16,21 +21,22 @@ This service can be used to:
 - [x] list
 - [ ] copy
 - [ ] rename
-- [ ] ~~presign~~
+- [ ] presign
 
 ## Configurations
 
-- `repo_type`: The type of the repository (model, dataset, or space).
+- `repo_type`: The type of the repository. One of `model`, `dataset`, `space`, or `bucket`.
 - `repo_id`: The id of the repository.
-- `revision`: The revision of the repository.
+- `revision`: The revision of the repository. Only applicable for git-based repo types (`model`, `dataset`, `space`).
 - `root`: Set the work directory for backend.
 - `token`: The token for accessing the repository. Required for write operations.
+- `endpoint`: The Hub base URL. Default is `https://huggingface.co`. Can also be set via the `HF_ENDPOINT` environment variable.
 
 Refer to [`HfBuilder`]'s public API docs for more information.
 
 ## Examples
 
-### Via Builder
+### Via Builder (Git-based dataset)
 
 ```rust,no_run
 use opendal_core::Operator;
@@ -39,17 +45,11 @@ use opendal_service_hf::Hf;
 
 #[tokio::main]
 async fn main() -> Result<()> {
-    // Create Hugging Face backend builder
-    let mut builder = Hf::default()
-        // set the type of Hugging Face repository
+    let builder = Hf::default()
         .repo_type("dataset")
-        // set the id of Hugging Face repository
-        .repo_id("databricks/databricks-dolly-15k")
-        // set the revision of Hugging Face repository
+        .repo_id("username/my-dataset")
         .revision("main")
-        // set the root, all operations will happen under this root
         .root("/path/to/dir")
-        // set the token for accessing the repository
         .token("access_token");
 
     let op: Operator = Operator::new(builder)?.finish();
@@ -57,3 +57,47 @@ async fn main() -> Result<()> {
     Ok(())
 }
 ```
+
+### Via Builder (Object store bucket)
+
+```rust,no_run
+use opendal_core::Operator;
+use opendal_core::Result;
+use opendal_service_hf::Hf;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    let builder = Hf::default()
+        .repo_type("bucket")
+        .repo_id("username/my-bucket")
+        .token("access_token");
+
+    let op: Operator = Operator::new(builder)?.finish();
+
+    Ok(())
+}
+```
+
+### Via URI
+
+```rust,no_run
+use opendal_core::Operator;
+use opendal_core::Result;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    // Git-based dataset
+    let op = Operator::from_uri((
+        "hf://datasets/username/my-dataset@main",
+        vec![("token", "access_token")],
+    ))?;
+
+    // Object store bucket
+    let op = Operator::from_uri((
+        "hf://buckets/username/my-bucket",
+        vec![("token", "access_token")],
+    ))?;
+
+    Ok(())
+}
+```
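
Reviewer note: the two URI forms in the new "Via URI" example imply a mapping from the URI onto the `repo_type`, `repo_id`, and `revision` options. The following is a minimal, std-only sketch of that mapping, not the parser this patch ships. `HfTarget` and `parse_hf_uri` are hypothetical names, the plural-segment mapping (`datasets` to `dataset`, `buckets` to `bucket`) is an assumption drawn only from the two URIs shown, and any path inside the repository is not handled.

```rust
/// Hypothetical parsed form of an `hf://` URI; not part of OpenDAL's API.
#[derive(Debug, PartialEq)]
struct HfTarget {
    repo_type: String,
    repo_id: String,
    revision: Option<String>,
}

/// Parse `hf://<kind>/<owner>/<name>[@<revision>]`.
/// Only the two kinds that appear in the docs examples are recognized.
fn parse_hf_uri(uri: &str) -> Option<HfTarget> {
    let rest = uri.strip_prefix("hf://")?;
    let (kind, path) = rest.split_once('/')?;

    // Assumed mapping from the plural URI segment to the `repo_type` value.
    let repo_type = match kind {
        "datasets" => "dataset",
        "buckets" => "bucket",
        _ => return None,
    };

    // An optional `@revision` suffix follows the repo id.
    let (repo_id, revision) = match path.split_once('@') {
        Some((_, "")) => return None, // reject an empty revision
        Some((id, rev)) => (id, Some(rev)),
        None => (path, None),
    };

    // Expect exactly `owner/name` with both parts non-empty.
    let mut parts = repo_id.split('/');
    let (owner, name) = (parts.next()?, parts.next()?);
    if owner.is_empty() || name.is_empty() || parts.next().is_some() {
        return None;
    }

    // The docs state that buckets have no revisions, so reject one here.
    if repo_type == "bucket" && revision.is_some() {
        return None;
    }

    Some(HfTarget {
        repo_type: repo_type.to_string(),
        repo_id: repo_id.to_string(),
        revision: revision.map(str::to_string),
    })
}

fn main() {
    for uri in [
        "hf://datasets/username/my-dataset@main",
        "hf://buckets/username/my-bucket",
    ] {
        println!("{uri} -> {:?}", parse_hf_uri(uri));
    }
}
```

Rejecting a revision on a bucket URI mirrors the documented rule that `revision` only applies to git-based repo types; the real service may instead ignore it or report a configuration error.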