All PRs must pass CI on Python 3.9–3.12. The test suite has 1600+ tests — don't be alarmed, they run fast.

### 5. Submit a PR

1. Push your branch to your fork
2. Open a Pull Request against `main`
3. Fill in the PR template with a clear description
4. Link any related issues

## Code Guidelines

### Architecture

Claw Compactor is built around a 14-stage Fusion Pipeline. Each stage is a self-contained compressor inheriting from `FusionStage`. See [ARCHITECTURE.md](ARCHITECTURE.md) for the full design.
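
As a rough illustration of how ordered, self-contained stages can be chained, here is a minimal sketch. `Stage`, `run_pipeline`, and the two toy stages are hypothetical stand-ins invented for this example, not the project's real `FusionStage` API; ARCHITECTURE.md remains the reference for the actual design.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Stage:
    """Hypothetical stand-in for a pipeline stage (not the real FusionStage)."""
    name: str
    order: int
    transform: Callable[[str], str]


def run_pipeline(text: str, stages: Sequence[Stage]) -> str:
    """Run stages in ascending `order`, feeding each output to the next stage."""
    for stage in sorted(stages, key=lambda s: s.order):
        text = stage.transform(text)
    return text


# Two toy stages, purely for illustration.
strip_trailing = Stage(
    "strip_trailing", 10,
    lambda t: "\n".join(line.rstrip() for line in t.splitlines()),
)
collapse_blank = Stage(
    "collapse_blank", 20,
    lambda t: "\n".join(line for line in t.splitlines() if line),
)

# Registration order does not matter; `order` decides execution order.
result = run_pipeline("a  \n\n\nb ", [collapse_blank, strip_trailing])
```

In this sketch the pipeline never inspects what a stage does; it only relies on each stage exposing an `order` and a transform, which is what keeps stages self-contained.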

### Key Principles

- **Immutability**: `FusionContext` is frozen. Every stage produces a new `FusionResult`. Never mutate inputs.
- **Gate-before-compress**: Each stage has `should_apply()`. If a stage doesn't apply to the content type, it should be a no-op at zero cost.
- **Zero required dependencies**: The core pipeline runs without any external packages. Optional dependencies (tiktoken, tree-sitter) are runtime-detected.
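
The three principles above can be sketched in a few lines. This is an illustration under assumptions, not the project's code: `Context`, `Result`, `JsonMinifyStage`, and `count_tokens` are hypothetical names, and the real `FusionContext` and `FusionResult` fields may differ.

```python
import importlib.util
import json
from dataclasses import dataclass


# Hypothetical stand-ins for FusionContext / FusionResult.
@dataclass(frozen=True)
class Context:
    content: str
    content_type: str


@dataclass(frozen=True)
class Result:
    content: str
    applied: bool


class JsonMinifyStage:
    """Toy stage: compacts JSON and is a no-op for any other content type."""

    def should_apply(self, ctx: Context) -> bool:
        # Gate: a cheap check that runs before any real work.
        return ctx.content_type == "json"

    def apply(self, ctx: Context) -> Result:
        if not self.should_apply(ctx):
            return Result(content=ctx.content, applied=False)
        compact = json.dumps(json.loads(ctx.content), separators=(",", ":"))
        # A new result is built; the frozen input context is left untouched.
        return Result(content=compact, applied=True)


# Runtime detection of an optional dependency, without importing it eagerly.
HAS_TIKTOKEN = importlib.util.find_spec("tiktoken") is not None


def count_tokens(text: str) -> int:
    """Use tiktoken when it is installed; otherwise fall back to a word count."""
    if HAS_TIKTOKEN:
        import tiktoken
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    return len(text.split())
```

Because both dataclasses are frozen, an accidental `ctx.content = ...` inside a stage raises `dataclasses.FrozenInstanceError` instead of silently changing the input seen by later stages.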

### Adding a New Fusion Stage

1. Create a new file in `scripts/lib/fusion/stages/`
2. Inherit from `FusionStage`
3. Implement `should_apply()` and `apply()`
4. Register it in the stage registry
5. Add tests covering happy path, edge cases, and the gate condition

```python
from scripts.lib.fusion.base import FusionStage, FusionContext, FusionResult


class MyStage(FusionStage):
    name = "my_stage"
    order = 22  # controls execution order in the pipeline
```

Claw Compactor is an open-source **LLM token compression engine** built around a 14-stage **Fusion Pipeline**. Each stage is a specialized compressor — from AST-aware code analysis to JSON statistical sampling to simhash-based deduplication — chained through an immutable data flow architecture where each stage's output feeds the next.
**Why Claw Compactor wins:** LLMLingua-2 drops tokens by perplexity score — effective for natural language, but destroys code identifiers, JSON keys, and log patterns. Claw Compactor uses content-type-aware stages that understand the structure of what they're compressing.
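
To make the simhash-based deduplication mentioned above more concrete, here is a generic sketch of the technique. It is not Claw Compactor's implementation: the function names, the whitespace tokenizer, the SHA-256 token hash, and the distance threshold are all assumptions chosen for illustration.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Compute a simhash fingerprint over whitespace-separated tokens.

    `bits` must be a multiple of 8 and at most 256 (the SHA-256 digest size).
    """
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when the votes for it are positive.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def dedupe(chunks: list[str], max_distance: int = 3) -> list[str]:
    """Keep a chunk only if no previously kept chunk is within `max_distance` bits."""
    kept: list[str] = []
    fingerprints: list[int] = []
    for chunk in chunks:
        fp = simhash(chunk)
        if all(hamming(fp, seen) > max_distance for seen in fingerprints):
            kept.append(chunk)
            fingerprints.append(fp)
    return kept
```

Texts that share most of their tokens tend to produce fingerprints that differ in only a few bits, so a small Hamming-distance threshold catches near-duplicates as well as exact repeats. Note that this token-level variant ignores word order.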

---

```
Input
|
```
@@ -286,7 +342,7 @@ See [ARCHITECTURE.md](ARCHITECTURE.md) for the full technical deep-dive: