Kimari MicroCompress

Kimari MicroCompress

Reversible lossless compression for AI model files

Safetensors · GGUF · LoRA/PEFT · Training Checkpoints · Hugging Face Models

⚠️ Important Limitations

Please read before using KMC:

KMC does NOT perform compressed inference. It is designed for storage, transfer, and verification — not runtime memory optimization. Partial access decompresses data before returning it.
KMC does NOT modify model weights. Compression is lossless and reversible; every byte is preserved exactly.
Partial tensor loading returns bytes. The read_tensor method and --tensor flag return raw bytes. To convert to native tensor objects (PyTorch, NumPy), the experimental safetensors loader is required and depends on optional tensor libraries being installed.
Tensor extraction depends on tensor metadata. Archives must be created with --tensor-aware mode for tensor-level partial access. Older archives support file-level partial access but not tensor-level access.
GGUF-aware compression is experimental. The --gguf-aware flag adjusts codec selection for quantized GGUF tensors but does not yet implement block-level GGUF-specific compression strategies.
No fixed compression ratios should be assumed. Results vary significantly by model format, data type, and content. Synthetic benchmarks do not represent real-world ratios.
KMC is not a replacement for quantization. If you need smaller models for inference, use quantization (GGUF Q4_K, GPTQ, AWQ, etc.). KMC is complementary: it compresses already-quantized files for storage/transfer.
No pickle is used. KMC never deserializes pickle-based files. Only presence, size, and hash are recorded.
KMC is lossless only. There is no lossy mode and no weight modification of any kind.
Safetensors loader is experimental. The load_tensor() function may change without notice between versions.

Overview

Kimari MicroCompress (KMC) is an experimental tool for lossless, reversible compression of AI model files. It focuses on storage, transfer, verification, and packaging without modifying the original weights. The approach is grounded in the observation that AI model files — particularly safetensors and quantized formats — contain significant redundancy that general-purpose compression tools don't exploit optimally.

Key principle: Every byte that goes in must come out identically. KMC provides byte-exact roundtrip integrity verified via SHA-256 hashes at both the file and block level.

Why KMC?

Problem	KMC Solution
AI model files are large and expensive to store	Lossless compression with tensor-aware codecs (BytePlane, FloatPlane)
General-purpose tools ignore tensor structure	dtype-aware codec selection per block (FP32, BF16, FP16, quantized)
No integrity guarantees after compression	SHA-256 verification at file and block level
Mixed artifacts (model + LoRA + checkpoints)	Artifact-type detection and specialized workflows
GGUF quantized data doesn't compress well	Experimental `--gguf-aware` mode that adapts codec selection
No visibility into what's inside an archive	`kmc inspect` with format-specific metadata and tensor details

Features

Feature	Status
`kmc pack` — Compress files/directories	✅ Working
`kmc pack --tensor-aware` — Tensor-aware block alignment	✅ Working
`kmc pack --gguf-aware` — Experimental GGUF-aware compression	🧪 Experimental
`kmc pack-lora` — LoRA adapter workflow	✅ Working
`kmc pack-checkpoint` — Training checkpoint workflow	✅ Working
`kmc unpack` — Decompress archives (path-safe)	✅ Working
`kmc unpack --only PATTERN` — Selective file extraction	✅ Working
`kmc unpack --tensor NAME` — Selective tensor extraction	✅ Working
`kmc unpack --list` — List archive contents before extracting	✅ Working
`kmc verify` — Full verification report	✅ Working
`kmc inspect` — AI model inspection with tensor metadata	✅ Working
`kmc inspect` — Partial access info display	✅ Working
`kmc list` — List archive files and tensors	✅ Working
`kmc bench --partial-access` — Partial access benchmarks	✅ Working
KMCReader Python API — Partial access without full decompression	✅ Working
Experimental safetensors tensor loader	🧪 Experimental
Artifact auto-detection (HuggingFace, GGUF, LoRA, checkpoint)	✅ Working
GGUF tensor metadata parser (names, shapes, types, offsets, sizes)	✅ Working
GGUF quantization summary (Q4_K, Q5_1, F32, etc.)	✅ Working
Manifest v6 with index metadata and archive_offset	✅ Working
SHA-256 per-file and per-block hashing	✅ Working
256 KiB micro-blocks (configurable)	✅ Working
zstd / zlib / raw / byteplane / floatplane codec selection	✅ Working
Automatic codec selector (dtype-based)	✅ Working
safetensors real tensor metadata (names, shapes, dtypes, offsets)	✅ Working
LoRA/PEFT adapter detection with rank and target modules	✅ Working
Path traversal protection in unpack	✅ Working
Backward compatible with .kmc v0.2/v0.3/v0.4/v0.5/v0.6	✅ Working
GGUF block-level compression	🔬 Research
Runtime compressed loading (keeping blocks compressed in memory)	🔬 Research

Installation

# Clone and install in development mode
git clone https://github.com/smouj/kimari-microcompress.git
cd kimari-microcompress
pip install -e ".[dev]"

# With safetensors optional dependency (enhanced header parsing)
pip install -e ".[safetensors]"

# With ZipNN optional dependency (for benchmark comparison)
pip install -e ".[zipnn]"

# All optional dependencies
pip install -e ".[all]"

Requirements

Dependency	Required	Purpose
Python 3.10+	Yes	Runtime
`zstandard`	Yes	Best compression codec
`zlib`	Yes (built-in)	Fallback compression
`safetensors`	No (optional)	Enhanced safetensors header parsing
`zipnn`	No (optional)	Benchmark comparison

Quick Start

# Pack a model directory
kmc pack ./my-model ./my-model.kmc

# Pack with tensor-aware mode (recommended for safetensors)
kmc pack ./my-model ./my-model.kmc --tensor-aware

# Pack with GGUF-aware mode (experimental, for GGUF files)
kmc pack ./my-model ./my-model.kmc --gguf-aware

# Pack a LoRA adapter
kmc pack-lora ./my-lora-adapter ./my-lora.kmc

# Pack a training checkpoint
kmc pack-checkpoint ./checkpoint-1000 ./checkpoint-1000.kmc

# Verify integrity (full report)
kmc verify ./my-model.kmc

# Inspect archive manifest
kmc inspect ./my-model.kmc

# Inspect AI model directory (detects formats, reads tensor metadata)
kmc inspect ./my-model/ --tensors

# Inspect as LoRA adapter
kmc inspect ./my-lora/ --lora

# Inspect as training checkpoint
kmc inspect ./checkpoint-1000/ --checkpoint

# Inspect GGUF file with tensor details
kmc inspect ./model.gguf --gguf

# Inspect with JSON output
kmc inspect ./my-model/ --json

# Unpack to a directory
kmc unpack ./my-model.kmc ./restored-model/

# Run benchmark with codec comparison
kmc bench ./my-model ./my-model-bench.kmc --compare-codecs

# Benchmark with ZipNN comparison
kmc bench ./my-model ./my-model-bench.kmc --compare-zipnn --json --output report.json

CLI Reference

Core Commands

Command	Description
`kmc pack SOURCE OUTPUT`	Compress a directory/file into a `.kmc` archive
`kmc pack-lora SOURCE OUTPUT`	Compress a LoRA adapter directory
`kmc pack-checkpoint SOURCE OUTPUT`	Compress a training checkpoint directory
`kmc unpack ARCHIVE OUTPUT`	Decompress a `.kmc` archive
`kmc verify ARCHIVE`	Full integrity verification report
`kmc inspect TARGET`	Inspect archive or AI model directory
`kmc list ARCHIVE`	List files and tensors in an archive
`kmc bench SOURCE OUTPUT`	Benchmark compression performance

Key Flags

Flag	Command	Description
`--tensor-aware`	pack	Align blocks to tensor boundaries for safetensors files
`--gguf-aware`	pack	Adjust codec selection for quantized GGUF tensors
`--codec`	pack, bench	Codec: `auto`, `byteplane`, `floatplane`, `zstd`, `zlib`, `raw`
`--only PATTERN`	unpack	Extract only files matching a glob pattern
`--tensor NAME`	unpack	Extract a specific tensor by name
`--list`	unpack	List available files/tensors without extracting
`--lora`	inspect	Inspect as LoRA adapter
`--checkpoint`	inspect	Inspect as training checkpoint
`--gguf`	inspect	Inspect as GGUF model with tensor details
`--tensors`	inspect, list	Show detailed tensor information
`--compression`	inspect	Show compression summary with codec usage
`--partial-access`	bench	Benchmark partial access performance
`--json`	inspect, bench, list, unpack	Output as JSON
`--compare-codecs`	bench	Compare all available codecs
`--compare-zipnn`	bench	Compare with ZipNN (if installed)

Codecs

KMC v0.7 supports six codecs, selected per-block for optimal results:

Codec	Type	Best For	Description
`auto`	Selector	General use	Tries candidates per dtype, picks smallest result
`floatplane`	Tensor-aware	FP32/BF16/FP16	Sign/exponent/mantissa bit-level separation
`byteplane`	Tensor-aware	FP32/BF16/FP16	Byte-plane separation by position within element
`zstd`	General	Mixed data	High-ratio general-purpose compression
`zlib`	General	Fallback	Always available, decent compression
`raw`	Passthrough	Incompressible	No compression, used when compression expands data

Automatic Codec Selection

When --codec auto (default), the selector chooses per-block based on tensor dtype:

Tensor dtype	Candidate chain
FP32, BF16, FP16	`floatplane → byteplane → zstd → zlib → raw`
Quantized (Q4_K, Q8_0, etc.)	`zstd → zlib → raw`
Unknown / non-float	`zstd → zlib → raw`

With --gguf-aware, quantized GGUF tensors skip float-aware transforms automatically.

Archive Format

The .kmc format is designed for verifiable, block-oriented storage:

┌─────────────────────────────────────────┐
│  Magic: "KMC\x00\x01\x00\x00\x00"  8B │
├─────────────────────────────────────────┤
│  Manifest length: uint64 BE         8B │
├─────────────────────────────────────────┤
│  Manifest: JSON (UTF-8)        Variable│
│   - version, tool info                 │
│   - file entries with paths & hashes   │
│   - block entries with codecs          │
│   - per-block codec_metadata (v3+)     │
│   - tensor entries (v2+, optional)     │
│   - artifact_type (v4+)                │
│   - artifact_metadata (v4+)            │
│   - format_metadata (v4+)              │
├─────────────────────────────────────────┤
│  Block data: concatenated       Variable│
│   - Each block independently compressed │
│   - SHA-256 verified per block         │
└─────────────────────────────────────────┘

See FORMAT_SPEC.md for the complete specification.

Architecture

src/kmc/
├── archive.py              # Core pack/unpack/verify with security checks
├── benchmark.py            # Performance benchmarking with codec comparison
├── cli.py                  # Command-line interface
├── hashing.py              # SHA-256 integrity hashing
├── inspector.py            # AI model format detection with metadata
├── manifest.py             # KMC manifest (v6: index, archive_offset)
├── reader.py               # KMCReader partial-access API (v0.7+)
├── gguf.py                 # Legacy GGUF module (see formats/gguf.py)
├── tensor_inspector.py     # Legacy safetensors metadata (see formats/)
├── codecs/
│   ├── __init__.py         # Public codec API
│   ├── base.py             # Codec protocol, CodecContext, CodecResult
│   ├── byteplane.py        # BytePlane codec (byte-plane separation)
│   ├── floatplane.py       # FloatPlane codec (sign/exp/mantissa separation)
│   ├── registry.py         # Codec registry (discover/instantiate by name)
│   ├── selector.py         # Automatic codec selector (dtype-based candidates)
│   ├── legacy.py           # Legacy CodecId/compress_block API (v0.2/v0.3 compat)
│   ├── raw.py              # Raw passthrough codec
│   ├── zlib_codec.py       # zlib codec
│   └── zstd_codec.py       # zstd codec
├── formats/
│   ├── __init__.py         # Format module registry
│   ├── safetensors.py      # Safetensors metadata, shards, LoRA detection
│   └── gguf.py             # GGUF header + tensor metadata parsing (v0.5+)
├── index/
│   ├── __init__.py         # Index module exports
│   ├── block_index.py      # BlockIndex: block ID -> BlockLocation
│   ├── file_index.py       # FileIndex: file path -> FileLocation
│   └── tensor_index.py     # TensorIndex: tensor name -> TensorLocation
├── loaders/
│   ├── __init__.py         # Loader module exports
│   └── safetensors_loader.py  # Experimental tensor-byte loader (v0.7+)
├── workflows/
│   ├── __init__.py         # Workflow module registry
│   ├── lora.py             # LoRA/PEFT adapter detection and packing
│   └── checkpoint.py       # Training checkpoint detection and packing
└── integrations/
    └── kimari.py           # Kimari CLI integration adapters

See ARCHITECTURE.md for detailed design decisions.

Partial Access

KMC v0.7 introduces partial access features that allow reading specific files and tensors from .kmc archives without full decompression. This is powered by the KMCReader Python API and the --only/--tensor/--list CLI flags.

Python API

from kmc.reader import KMCReader

with KMCReader("model.kmc") as reader:
    # List contents
    files = reader.list_files()
    tensors = reader.list_tensors()

    # Read specific files without full decompression
    config = reader.read_file("config.json")
    weight_bytes = reader.read_tensor("model.layers.0.mlp.down_proj.weight")

    # Extract to disk
    reader.extract_file("config.json", "./output/")

CLI Selective Extraction

# List archive contents
kmc list model.kmc

# Extract only JSON files
kmc unpack model.kmc ./output --only "*.json"

# Extract a specific tensor
kmc unpack model.kmc ./output --tensor "transformer.h.0.attn.weight"

# List before extracting
kmc unpack model.kmc ./output --list

Important: Partial access decompresses the requested data before returning it. It does NOT reduce VRAM during inference. Tensor extraction requires archives created with --tensor-aware mode.

See PARTIAL_ACCESS.md and KMC_READER_API.md for details.

Security

KMC takes extraction security seriously:

Path traversal protection — All file paths validated before extraction; .., absolute paths, null bytes, and control characters are rejected
Symlink protection — Refuses to overwrite existing symlinks during unpack
Duplicate path detection — Manifests with duplicate file paths are rejected
Manifest size limits — Oversized manifests rejected to prevent DoS
Block hash verification — Every block verified against SHA-256 hash
File hash verification — Reconstructed files verified against SHA-256 hash
No pickle deserialization — Pickle-based files detected and compressed as raw bytes only

See SECURITY_MODEL.md for the complete security model.

Documentation

Document	Description
Architecture	Design decisions and module structure
Format Specification	Complete `.kmc` format spec (v6)
Security Model	Threat model and mitigations
Partial Access	Partial access features and architecture
KMCReader API	Python API reference for partial access
Selective Extraction	CLI selective extraction guide
Experimental Loaders	Safetensors tensor loader documentation
GGUF Support	GGUF parsing and `--gguf-aware` mode
LoRA Workflow	LoRA adapter compression and inspection
Checkpoint Workflow	Training checkpoint compression and inspection
Hugging Face Workflow	Working with Hugging Face models
Real Model Benchmark	Running benchmarks with HuggingFace models
Kimari Integration	Integration with Kimari CLI
Benchmark Plan	Performance testing strategy
Research Notes	Technical references and codec design rationale
Roadmap	Development priorities
Changelog	Version history

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest -q

# Lint
ruff check .

# Format check
ruff format --check .

# CLI help
kmc --help
kmc pack-lora --help
kmc pack-checkpoint --help

# Create demo model and test
python scripts/create_demo_model.py

See CONTRIBUTING.md for contribution guidelines.

Technical Foundation

KMC's approach is informed by research and industry practice:

ZipNN (IBM Research) — Demonstrates that lossless compression specific to AI models can save ~1/3 of size on popular models, and >50% in some cases, without changing weights.
safetensors (Hugging Face) — Treated as the priority format because it's secure, fast, and avoids pickle vulnerabilities.
GGUF (llama.cpp) — The standard binary format for quantized models. KMC v0.5 adds full tensor metadata parsing and experimental GGUF-aware compression.
NetZIP (IBM Research) — Explores lossless compression for gradients and activations in distributed training — a research direction documented for KMC's roadmap.

License

MIT License — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.github/workflows		.github/workflows
configs		configs
docs		docs
examples		examples
prompts		prompts
scripts		scripts
src/kmc		src/kmc
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
README_ES.md		README_ES.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kimari MicroCompress

⚠️ Important Limitations

Table of Contents

Overview

Why KMC?

Features

Installation

Requirements

Quick Start

CLI Reference

Core Commands

Key Flags

Codecs

Automatic Codec Selection

Archive Format

Architecture

Partial Access

Python API

CLI Selective Extraction

Security

Documentation

Development

Technical Foundation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Kimari MicroCompress

⚠️ Important Limitations

Table of Contents

Overview

Why KMC?

Features

Installation

Requirements

Quick Start

CLI Reference

Core Commands

Key Flags

Codecs

Automatic Codec Selection

Archive Format

Architecture

Partial Access

Python API

CLI Selective Extraction

Security

Documentation

Development

Technical Foundation

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages