
Commit 09ffa9d

aeromomo and claude committed
feat: v7.0 FusionEngine + SemanticDedup + StructuralCollapse + full rebrand
- FusionEngine: 14-stage unified pipeline, 53.9% compression (vs 9.2% old path)
- SemanticDedup: simhash fingerprint cross-message dedup
- StructuralCollapse: import/assertion/pattern folding
- README rewrite: benchmark tables, pipeline ASCII art, API examples
- ARCHITECTURE.md: 560-line deep design doc
- High-tech module branding (Rewind, Cortex, Neurosyntax, Ionizer, etc.)
- Remove internal planning docs
- 1663 tests passed, 0 failed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 7f24aee commit 09ffa9d

20 files changed

Lines changed: 5859 additions & 1181 deletions

ARCHITECTURE.md

Lines changed: 560 additions & 0 deletions
Large diffs are not rendered by default.

CHANGELOG.md

Lines changed: 41 additions & 1 deletion
@@ -5,6 +5,45 @@ All notable changes to Claw Compactor will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [7.0.0] - 2026-03-17

### Architecture
- **14-stage Fusion Pipeline** replacing the legacy 5-layer sequential approach
- **Immutable data flow** — all pipeline state carried via frozen `FusionContext` / `FusionResult` dataclasses
- **Stage gate mechanism** — `should_apply()` lets each stage skip at zero cost when content type doesn't match
- **FusionEngine** — unified entry point with `compress()` and `compress_messages()` API
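
A minimal sketch of how frozen context objects and a `should_apply()` gate could fit together, assuming each stage exposes `order`, `should_apply()`, and `apply()`. The fields on `FusionContext`, the `BlankLineFolder` stage, and `run_pipeline()` are hypothetical illustrations and are not taken from the project's code.

```python
from dataclasses import dataclass, replace
from typing import Protocol


@dataclass(frozen=True)
class FusionContext:
    """Immutable pipeline state. These two fields are assumed for illustration."""
    text: str
    content_type: str = "text"


class Stage(Protocol):
    """Assumed stage interface: an order plus a gate and a transform."""
    name: str
    order: int

    def should_apply(self, ctx: FusionContext) -> bool: ...
    def apply(self, ctx: FusionContext) -> FusionContext: ...


@dataclass(frozen=True)
class BlankLineFolder:
    """Hypothetical stage: collapses runs of blank lines in plain text."""
    name: str = "BlankLineFolder"
    order: int = 10

    def should_apply(self, ctx: FusionContext) -> bool:
        # Gate: skip anything that is not plain text.
        return ctx.content_type == "text"

    def apply(self, ctx: FusionContext) -> FusionContext:
        out = []
        for line in ctx.text.split("\n"):
            blank = line.strip() == ""
            if blank and out and out[-1] == "":
                continue  # drop a repeated blank line
            out.append("" if blank else line)
        # replace() builds a new frozen object instead of mutating ctx.
        return replace(ctx, text="\n".join(out))


def run_pipeline(stages, ctx):
    """Run stages in ascending order, skipping those whose gate says no."""
    applied = []
    for stage in sorted(stages, key=lambda s: s.order):
        if not stage.should_apply(ctx):
            continue
        ctx = stage.apply(ctx)
        applied.append(stage.name)
    return ctx, applied


ctx = FusionContext(text="a\n\n\n\nb")
result, applied = run_pipeline([BlankLineFolder()], ctx)
```

Because the context is frozen, the original `ctx` is unchanged after the run, and a stage whose gate returns `False` costs only that one check.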

### New Compression Stages
- **QuantumLock** (order=3) — KV-cache alignment: isolates dynamic content in system prompts to maximize cache hit rate
- **Cortex** (order=5) — intelligent content router auto-detecting 8 content types and 16 programming languages
- **Photon** (order=8) — base64 image detection and compression
- **SemanticDedup** (order=12) — SimHash fingerprint near-duplicate block elimination (intra + cross-message)
- **Ionizer** (order=15) — JSON array statistical sampling with schema discovery and error preservation
- **LogCrunch** (order=16) — build/test log line folding with occurrence counts
- **SearchCrunch** (order=17) — search/grep result deduplication
- **DiffCrunch** (order=18) — git diff context line folding
- **StructuralCollapse** (order=20) — import merging, assertion collapse, repeated pattern compression
- **Neurosyntax** (order=25) — AST-aware code compression via tree-sitter (safe regex fallback). Never shortens identifiers.
- **Nexus** (order=35) — ML token-level compressor with stopword removal fallback
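
To illustrate the SimHash idea behind SemanticDedup, here is a generic sketch of fingerprint-based near-duplicate removal. The tokenization, the 64-bit width, the BLAKE2b token hash, and the `max_distance` threshold are all assumptions chosen for the example; the changelog does not say how the stage is actually implemented.

```python
import hashlib
import re

BITS = 64  # fingerprint width, assumed for this sketch


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint built from lowercase word tokens."""
    counts = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are net positive.
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def dedup_blocks(blocks, max_distance=3):
    """Keep the first block of each group of near-duplicates."""
    kept, fingerprints = [], []
    for block in blocks:
        fp = simhash(block)
        if any(hamming(fp, seen) <= max_distance for seen in fingerprints):
            continue  # close enough to an earlier block: drop it
        kept.append(block)
        fingerprints.append(fp)
    return kept
```

Blocks that differ only in case, spacing, or punctuation yield the same token set and therefore the same fingerprint, so only the first copy survives.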

### Rewind (Reversible Compression)
- Hash-addressed LRU store for original text retrieval
- Marker embedding in compressed output; the LLM retrieves originals via tool calls
- Integrated with Ionizer for JSON array reversal
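
The Rewind bullets above can be sketched as a small hash-addressed LRU store plus a marker that points back into it. This is an illustration under stated assumptions: the `RewindStore` class, the `[[rewind:...]]` marker format, the 12-character SHA-256 key, and the helper names are hypothetical and not the project's API.

```python
import hashlib
import re
from collections import OrderedDict


class RewindStore:
    """Hash-addressed LRU store for original text (hypothetical sketch)."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._items = OrderedDict()

    def put(self, original: str) -> str:
        # The key is derived from the content, so identical text shares a slot.
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._items[key] = original
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
        return key

    def get(self, key: str):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as recently used
        return self._items[key]


# Assumed marker syntax; a tool call would carry the captured key.
MARKER = re.compile(r"\[\[rewind:([0-9a-f]{12})\]\]")


def compress_with_marker(store: RewindStore, original: str, summary: str) -> str:
    """Store the original and append a retrieval marker to the summary."""
    return f"{summary} [[rewind:{store.put(original)}]]"


def retrieve(store: RewindStore, compressed: str):
    """Resolve the first marker in the compressed text, if any."""
    match = MARKER.search(compressed)
    return store.get(match.group(1)) if match else None
```

With an LRU policy an evicted original can no longer be retrieved, so a real deployment would need to size the store against how long markers stay live in the conversation.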

### Performance
- **5.9x improvement** over legacy regex path (weighted average)
- **53.9% average compression** across 6 content types
- **81.9% peak** on JSON arrays (Ionizer)
- **25.0%** on Python source (Neurosyntax + StructuralCollapse)
- **1,676 tests** (up from 848), 0 failures
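
As a quick check on the headline figures, the 5.9x improvement appears consistent with dividing the 53.9% average by the 9.2% legacy figure quoted in the commit message. The sketch below assumes "compression" means the percentage of tokens removed; the changelog does not define the metric or the weighting, and `compression_pct` is a hypothetical helper.

```python
def compression_pct(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed (assumed definition of 'compression')."""
    return 100.0 * (tokens_before - tokens_after) / tokens_before


legacy_pct = 9.2   # old path, from the commit message
fusion_pct = 53.9  # FusionEngine average, from this changelog

improvement = fusion_pct / legacy_pct
print(f"{improvement:.1f}x")  # prints 5.9x
```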

### Benchmark vs Competitors
- SWE-bench tasks: **12-19% compression** vs Headroom's **0%**
- ROUGE-L fidelity maintained at 0.653 @ rate=0.3
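
For readers unfamiliar with the fidelity metric, ROUGE-L scores a candidate text against a reference by their longest common subsequence (LCS) of tokens. The sketch below uses whitespace tokenization and a balanced F1 score, both of which are assumptions; the changelog does not state which ROUGE-L variant or tooling produced 0.653, nor what `rate=0.3` refers to.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as a balanced F1 over whitespace tokens (assumed variant)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Dropping one word from a six-word reference keeps a five-token LCS.
score = rouge_l_f1("the cat sat on the mat", "the cat on the mat")
```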

---

## [1.0.0] - 2026-03-09

### Added
@@ -29,4 +68,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Up to **97% token reduction** on session transcripts
- **50–70% token savings** on first run across unoptimized workspaces

-[1.0.0]: https://github.com/aeromomo/claw-compactor/releases/tag/v1.0.0
+[7.0.0]: https://github.com/open-compress/claw-compactor/releases/tag/v7.0.0
+[1.0.0]: https://github.com/open-compress/claw-compactor/releases/tag/v1.0.0

0 commit comments
