xet-data

Data processing pipeline for chunking, deduplication, and file reconstruction. It is intended to be used through the API of the hf-xet package.

Overview

  • Content-defined chunking — Gear-hash based chunking for deduplication
  • Deduplication — Probe and register chunks against metadata shards
  • File reconstruction — Reassemble files from deduplicated chunk references
  • Progress tracking — Hooks for upload/download progress reporting
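To illustrate the idea behind Gear-hash content-defined chunking, here is a minimal, self-contained sketch. It is not this crate's implementation or API. The function name `gear_chunk_boundaries`, the SplitMix64-generated table, and the `min_size`/`max_size`/`mask_bits` parameters are hypothetical choices for illustration. Each byte updates a rolling hash as `hash = (hash << 1) + GEAR[byte]`. A boundary is declared when the masked hash is zero, subject to minimum and maximum chunk sizes. Because boundaries depend on local content rather than fixed offsets, an insertion early in a file tends to shift only nearby boundaries, which is what makes chunk-level deduplication effective.

```rust
/// Build a 256-entry Gear table from a fixed seed using SplitMix64.
/// Hypothetical: a real chunker ships its own fixed table of constants.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        // One SplitMix64 step per table entry.
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Split `data` into content-defined chunks and return the end offset of each chunk.
/// A boundary is cut when the top `mask_bits` bits of the Gear hash are zero
/// (once the chunk reaches `min_size`), or unconditionally at `max_size`.
fn gear_chunk_boundaries(data: &[u8], min_size: usize, max_size: usize, mask_bits: u32) -> Vec<usize> {
    assert!(mask_bits >= 1 && mask_bits <= 64, "mask_bits must be in 1..=64");
    assert!(min_size <= max_size, "min_size must not exceed max_size");
    let table = gear_table();
    // Test high-order bits: the low bits of a Gear hash depend only on the last few bytes.
    let mask = u64::MAX << (64 - mask_bits);
    let mut boundaries = Vec::new();
    let mut start = 0;
    let mut hash: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // Gear rolling hash: shift left one bit and add this byte's table entry.
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        let len = i + 1 - start;
        if (len >= min_size && (hash & mask) == 0) || len >= max_size {
            boundaries.push(i + 1);
            start = i + 1;
            hash = 0;
        }
    }
    // Whatever remains after the last cut becomes the final (possibly short) chunk.
    if start < data.len() {
        boundaries.push(data.len());
    }
    boundaries
}
```

With `mask_bits = 12`, a cut is expected roughly every 2^12 bytes past `min_size` on random-looking input; the size limits bound the worst case on repetitive data.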
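The deduplication, reconstruction, and progress-tracking steps can be sketched together under heavy simplifying assumptions. A single in-memory `ChunkIndex` stands in for the metadata shards, chunks are keyed by their full contents instead of a cryptographic hash, and progress is reported through a plain callback. All names here (`ChunkIndex`, `probe_or_register`, `reconstruct`, `dedup_file`, `on_progress`) are hypothetical and are not the crate's API. A file is stored as an ordered list of chunk references. Probing returns the existing id for a known chunk, and only unseen chunks are registered and counted as new bytes.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for metadata shards: maps chunk contents to chunk ids.
#[derive(Default)]
struct ChunkIndex {
    ids: HashMap<Vec<u8>, usize>,
    chunks: Vec<Vec<u8>>,
}

impl ChunkIndex {
    /// Probe for an existing chunk and register it if unseen. Returns (id, was_new).
    fn probe_or_register(&mut self, chunk: &[u8]) -> (usize, bool) {
        if let Some(&id) = self.ids.get(chunk) {
            return (id, false);
        }
        let id = self.chunks.len();
        self.chunks.push(chunk.to_vec());
        self.ids.insert(chunk.to_vec(), id);
        (id, true)
    }

    /// Reassemble a file by concatenating its chunks in reference order.
    fn reconstruct(&self, refs: &[usize]) -> Vec<u8> {
        let mut out = Vec::new();
        for &id in refs {
            out.extend_from_slice(&self.chunks[id]);
        }
        out
    }
}

/// Deduplicate one file, given the end offset of each chunk.
/// `on_progress(done_bytes, total_bytes)` is called after every chunk.
/// Returns the file's chunk references and the number of newly stored bytes.
fn dedup_file(
    index: &mut ChunkIndex,
    data: &[u8],
    boundaries: &[usize],
    on_progress: &mut dyn FnMut(usize, usize),
) -> (Vec<usize>, usize) {
    let mut refs = Vec::with_capacity(boundaries.len());
    let mut new_bytes = 0;
    let mut start = 0;
    for &end in boundaries {
        let chunk = &data[start..end];
        let (id, was_new) = index.probe_or_register(chunk);
        if was_new {
            new_bytes += chunk.len();
        }
        refs.push(id);
        start = end;
        on_progress(end, data.len());
    }
    (refs, new_bytes)
}
```

For example, storing `b"abcdabcdxyz"` with chunk ends `[4, 8, 11]` registers only `"abcd"` and `"xyz"` (7 new bytes). A second file `b"xyzabcd"` with ends `[3, 7]` then adds no new bytes, because both of its chunks are already indexed.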

This crate is part of xet-core.

License

Apache-2.0