Compress raw log lines into structural patterns with statistics, anomalies, and correlations.
Turn millions of noisy log lines into a handful of actionable patterns — with typed variables, quantile stats, anomaly flags, and severity scoring. Runs as a CLI, in the browser via WASM, or as a Rust library.
```
$ cat server.log | ctrlb-decompose
┌───────────────────────────────────────────────────────────────────┐
│ ctrlb-decompose: 1,247,831 lines -> 43 patterns (99.9% reduction) │
└───────────────────────────────────────────────────────────────────┘

#1 [ERROR] ██████████████████████ 18,402 (1.5%)
   <TS> ERROR [<*>] Connection to <ip> timed out after <duration>
   ip         IPv4       unique=12     top: 10.0.1.15 (34%), 10.0.1.22 (28%)
   duration   Duration   p50=120ms     p99=4.8s

#2 [INFO]  ████████████████████ 904,221 (72.5%)
   <TS> INFO [<*>] Request from <ip> completed in <duration> status=<status>
   ip         IPv4       unique=1,847  top: 10.0.1.15 (12%), 10.0.1.22 (8%)
   duration   Duration   p50=23ms      p99=312ms
   status     Enum       unique=3      values: 200 (91%), 404 (6%), 500 (3%)
```
Website coming soon.
ctrlb-decompose uses a two-stage normalization and clustering pipeline (CLP encoding, then Drain3 clustering) that processes logs in a single streaming pass with a bounded memory footprint.
```
┌──────────────────────────────────────────────┐
│           ctrlb-decompose pipeline           │
└──────────────────────────────────────────────┘

 Raw Log Lines
       │
       ▼
┌──────────────┐  Strip & parse timestamps (ISO 8601, Apache,
│ Timestamp    │  syslog, Unix epoch, etc.) into normalized
│ Extraction   │  <TS> markers with DateTime values.
└──────┬───────┘
       │
       ▼
┌──────────────┐  Replace integers, floats, IPs, and strings
│ CLP          │  with compact placeholder bytes. Structurally
│ Encoding     │  identical lines now produce the same "logtype."
└──────┬───────┘
       │
       ▼
┌──────────────┐  Tree-based similarity clustering (Drain3) groups
│ Drain3       │  logtypes into patterns. Differing tokens become
│ Clustering   │  <*> wildcards. Incremental: no second pass needed.
└──────┬───────┘
       │
       ▼
┌──────────────┐  Merge CLP-decoded values with Drain3 wildcard
│ Variable     │  positions. Classify each variable into semantic
│ Extraction   │  types: IPv4, UUID, Duration, Enum, Integer, etc.
│ & Typing     │
└──────┬───────┘
       │
       ▼
┌──────────────┐  DDSketch quantiles (p50/p99), HyperLogLog
│ Statistics   │  cardinality estimation, top-k values, temporal
│ Accumulation │  bucketing, and reservoir-sampled example lines.
└──────┬───────┘
       │
       ▼
┌──────────────┐  Frequency spikes, error cascades, low-cardinality
│ Anomaly      │  flags, bimodal distributions, and clustered
│ Detection    │  numeric detection.
└──────┬───────┘
       │
       ▼
┌──────────────┐  Keyword-based severity (ERROR > WARN > INFO > DEBUG),
│ Scoring      │  temporal co-occurrence, shared variable correlation,
│ & Correlation│  and error cascade detection across patterns.
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ Output       │──── Human (ANSI terminal) / LLM (compact markdown) / JSON
└──────────────┘
```
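The Scoring stage ranks patterns by keyword severity (ERROR > WARN > INFO > DEBUG). One plausible ordering, consistent with the sample output at the top where the ERROR pattern is listed as #1 ahead of the far more frequent INFO pattern, is to sort by severity first and by count second. This is a minimal sketch under that assumption; the `Severity` enum, the `severity_of` helper, and the case-insensitive whole-word keyword match are hypothetical, not the tool's actual scoring code.

```rust
/// Hypothetical severity levels; derive(Ord) orders them by declaration,
/// so Unknown < Debug < Info < Warn < Error.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Unknown,
    Debug,
    Info,
    Warn,
    Error,
}

/// Highest-severity keyword among the template's alphabetic words (case-insensitive).
fn severity_of(template: &str) -> Severity {
    template
        .split(|c: char| !c.is_ascii_alphabetic())
        .map(|word| match word.to_ascii_uppercase().as_str() {
            "ERROR" => Severity::Error,
            "WARN" => Severity::Warn,
            "INFO" => Severity::Info,
            "DEBUG" => Severity::Debug,
            _ => Severity::Unknown,
        })
        .max()
        .unwrap_or(Severity::Unknown)
}

fn main() {
    // (template, match count) pairs taken from the sample output.
    let mut patterns = vec![
        ("<TS> INFO [<*>] Request from <ip> completed in <duration> status=<status>", 904_221u64),
        ("<TS> ERROR [<*>] Connection to <ip> timed out after <duration>", 18_402),
    ];
    // Severity descending first, then frequency descending.
    patterns.sort_by(|a, b| severity_of(b.0).cmp(&severity_of(a.0)).then(b.1.cmp(&a.1)));
    for (rank, (template, count)) in patterns.iter().enumerate() {
        println!("#{} [{:?}] {} {}", rank + 1, severity_of(template), count, template);
    }
}
```

Sorting on a `(severity, count)` key means a rare error pattern always surfaces above common informational noise, which suits an investigation-first view.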
CLP (Compact Log Pattern) encoding normalizes variable tokens into typed placeholders, so structurally identical lines produce identical logtypes regardless of the actual values:
```
Input:   "Request from 10.0.1.15 completed in 45ms status=200"
Logtype: "Request from <dict> completed in <float>ms status=<int>"
```
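A minimal sketch of CLP-style encoding, under a few simplifying assumptions: tokens are split on whitespace, `=`, `,`, and brackets; placeholders are readable strings (`<int>`, `<float>`, `<dict>`) instead of compact bytes; and a trailing alphabetic unit stays in the logtype as static text. The `encode` and `encode_token` helpers are hypothetical. This sketch encodes `45ms` as `<int>ms`; the tool's own handling of numbers with units may differ.

```rust
/// Delimiters are copied into the logtype verbatim (an assumption of this sketch).
fn is_delim(c: char) -> bool {
    c.is_whitespace() || matches!(c, '=' | ',' | '[' | ']')
}

/// Append one token to the logtype, replacing variable values with placeholders.
fn encode_token(tok: &str, logtype: &mut String, vars: &mut Vec<String>) {
    if tok.is_empty() {
        return;
    }
    // Split off a trailing alphabetic unit, e.g. "45ms" -> ("45", "ms").
    let (num, unit) = tok.split_at(tok.trim_end_matches(|c: char| c.is_ascii_alphabetic()).len());
    if num.parse::<i64>().is_ok() {
        logtype.push_str("<int>");
        logtype.push_str(unit);
        vars.push(num.to_string());
    } else if num.contains('.') && num.parse::<f64>().is_ok() {
        logtype.push_str("<float>");
        logtype.push_str(unit);
        vars.push(num.to_string());
    } else if tok.chars().any(|c| c.is_ascii_digit()) {
        // Any other token containing a digit (IPs, IDs, ...) is a dictionary variable.
        logtype.push_str("<dict>");
        vars.push(tok.to_string());
    } else {
        // Plain words are static text.
        logtype.push_str(tok);
    }
}

/// Returns the logtype plus the extracted variable values, in order.
fn encode(line: &str) -> (String, Vec<String>) {
    let mut logtype = String::new();
    let mut vars = Vec::new();
    let mut start = 0;
    for (i, c) in line.char_indices() {
        if is_delim(c) {
            encode_token(&line[start..i], &mut logtype, &mut vars);
            logtype.push(c);
            start = i + c.len_utf8();
        }
    }
    encode_token(&line[start..], &mut logtype, &mut vars);
    (logtype, vars)
}

fn main() {
    let (a, vars_a) = encode("Request from 10.0.1.15 completed in 45ms status=200");
    let (b, _) = encode("Request from 10.0.1.22 completed in 312ms status=404");
    println!("{a}");
    println!("{vars_a:?}");
    // Different values, same structure: the logtypes are identical.
    assert_eq!(a, b);
}
```

Keeping the values in a side vector is what lets a later stage decode them back into typed variables for statistics.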
The Drain algorithm builds a prefix tree over logtypes and groups them by token similarity (configurable threshold, default 0.4). Where tokens diverge, the template gains a <*> wildcard. It runs incrementally: each line is processed once, with no second pass.
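A simplified sketch of the Drain idea, assuming whitespace-separated logtype tokens: lines are first bucketed by token count (Drain's first tree level), then compared position by position against existing templates and merged into the most similar one when the similarity reaches the threshold. The `Drain` and `Cluster` types are hypothetical; the sketch replaces the fixed-depth prefix tree with a linear scan inside each bucket and omits LRU eviction. Wildcard positions are not counted as matches, roughly as Drain3 does by default.

```rust
use std::collections::HashMap;

const WILDCARD: &str = "<*>";

/// One pattern: its current template tokens and how many lines matched it.
struct Cluster {
    template: Vec<String>,
    count: u64,
}

/// Hypothetical, simplified clusterer: token-count buckets plus a linear scan.
struct Drain {
    sim_threshold: f64,
    buckets: HashMap<usize, Vec<Cluster>>,
}

impl Drain {
    fn new(sim_threshold: f64) -> Self {
        Drain { sim_threshold, buckets: HashMap::new() }
    }

    /// Share of positions where the template token equals the incoming token.
    /// Wildcard positions never count as matches.
    fn similarity(template: &[String], tokens: &[&str]) -> f64 {
        if template.is_empty() {
            return 1.0;
        }
        let same = template
            .iter()
            .zip(tokens)
            .filter(|(t, tok)| t.as_str() != WILDCARD && t.as_str() == **tok)
            .count();
        same as f64 / template.len() as f64
    }

    /// Process one line (single pass) and return the template it now belongs to.
    fn add(&mut self, line: &str) -> String {
        let threshold = self.sim_threshold;
        let tokens: Vec<&str> = line.split_whitespace().collect();
        let bucket = self.buckets.entry(tokens.len()).or_default();

        // Most similar existing cluster at or above the threshold, if any.
        let best = bucket
            .iter()
            .enumerate()
            .map(|(i, c)| (i, Self::similarity(&c.template, &tokens)))
            .filter(|&(_, s)| s >= threshold)
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(i, _)| i);

        match best {
            Some(i) => {
                let cluster = &mut bucket[i];
                // Positions where the tokens diverge become wildcards.
                for (t, tok) in cluster.template.iter_mut().zip(&tokens) {
                    if t.as_str() != *tok {
                        *t = WILDCARD.to_string();
                    }
                }
                cluster.count += 1;
                cluster.template.join(" ")
            }
            None => {
                let template: Vec<String> = tokens.iter().map(|s| s.to_string()).collect();
                let joined = template.join(" ");
                bucket.push(Cluster { template, count: 1 });
                joined
            }
        }
    }
}

fn main() {
    // Inputs are CLP-style logtypes; 0.4 is the default threshold quoted above.
    let mut drain = Drain::new(0.4);
    for logtype in [
        "User <dict> logged in from <dict>",
        "User <dict> logged out from <dict>",
        "Request from <dict> completed in <int>ms status=<int>",
    ] {
        println!("{}", drain.add(logtype));
    }
    for cluster in drain.buckets.values().flatten() {
        println!("{:>4}  {}", cluster.count, cluster.template.join(" "));
    }
}
```

Here the second logtype matches the first on 5 of 6 positions (about 0.83, above 0.4), so the two merge into `User <dict> logged <*> from <dict>`, while the 7-token request logtype lands in a different bucket.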
Extracted variables are classified into semantic types for richer analysis:
| Type | Example | Detection |
|---|---|---|
| IPv4 / IPv6 | `10.0.1.15` | CIDR pattern match |
| UUID | `550e8400-e29b-...` | 8-4-4-4-12 hex format |
| Duration | `45ms`, `3.2s` | Numeric value + time-unit suffix |
| HexID | `0x1a2b3c` | 4+ hex digits |
| Integer | `200` | Parses as `i64` |
| Float | `3.14` | Contains `.` and parses as `f64` |
| Enum | `ERROR` | Low cardinality (≤ 20 unique values, top 3 values ≥ 80% of occurrences) |
| Timestamp | `2024-01-15T14:22:01Z` | RFC 3339 pattern |
| String | anything else | Fallback |
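The table reads naturally as an ordered list of checks, tried from most to least specific. A minimal sketch under several assumptions made here, not taken from ctrlb-decompose: the check order, the duration unit list, requiring at least one decimal digit in a HexID (so words such as `added` stay strings), and a coarse shape test instead of a full RFC 3339 parser. It also swaps the CIDR pattern match for Rust's standard `IpAddr` parser. `Enum` is decided per variable from its value counts, not per value.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

#[derive(Debug, PartialEq)]
enum VarType { Timestamp, Uuid, Ip, Duration, HexId, Integer, Float, Str }

/// Coarse RFC 3339 shape check: "YYYY-MM-DDTHH:MM:SS" then 'Z', an offset, or a fraction.
fn is_timestamp(v: &str) -> bool {
    let b = v.as_bytes();
    if b.len() < 20 {
        return false;
    }
    let head_ok = b"dddd-dd-ddTdd:dd:dd".iter().zip(b).all(|(&s, &c)| match s {
        b'd' => c.is_ascii_digit(),
        b'T' => c == b'T' || c == b't',
        _ => c == s,
    });
    head_ok && matches!(b[19], b'Z' | b'z' | b'+' | b'-' | b'.')
}

/// 8-4-4-4-12 hex layout.
fn is_uuid(v: &str) -> bool {
    let b = v.as_bytes();
    b.len() == 36
        && b.iter().enumerate().all(|(i, &c)| match i {
            8 | 13 | 18 | 23 => c == b'-',
            _ => c.is_ascii_hexdigit(),
        })
}

/// A number followed by a time-unit suffix, e.g. "45ms" or "3.2s".
fn is_duration(v: &str) -> bool {
    ["ns", "us", "µs", "ms", "s", "m", "h"].iter().any(|&unit| {
        v.strip_suffix(unit).map_or(false, |n| {
            n.starts_with(|c: char| c.is_ascii_digit()) && n.parse::<f64>().is_ok()
        })
    })
}

/// Optional 0x prefix, then 4+ hex digits including at least one decimal digit.
fn is_hex_id(v: &str) -> bool {
    let h = v.strip_prefix("0x").or_else(|| v.strip_prefix("0X")).unwrap_or(v);
    h.len() >= 4 && h.chars().all(|c| c.is_ascii_hexdigit()) && h.chars().any(|c| c.is_ascii_digit())
}

fn classify(v: &str) -> VarType {
    if is_timestamp(v) { VarType::Timestamp }
    else if is_uuid(v) { VarType::Uuid }
    else if v.parse::<IpAddr>().is_ok() { VarType::Ip }
    else if is_duration(v) { VarType::Duration }
    else if v.parse::<i64>().is_ok() { VarType::Integer }
    else if v.contains('.') && v.parse::<f64>().is_ok() { VarType::Float }
    else if is_hex_id(v) { VarType::HexId }
    else { VarType::Str }
}

/// Enum rule from the table: at most 20 distinct values, and the three most
/// frequent values cover at least 80% of all occurrences.
fn is_enum(counts: &HashMap<&str, u64>) -> bool {
    let total: u64 = counts.values().sum();
    if total == 0 || counts.len() > 20 {
        return false;
    }
    let mut freq: Vec<u64> = counts.values().copied().collect();
    freq.sort_unstable_by(|a, b| b.cmp(a));
    let top3: u64 = freq.iter().take(3).sum();
    top3 * 5 >= total * 4 // top3 / total >= 0.8, in integer arithmetic
}

fn main() {
    for v in ["10.0.1.15", "550e8400-e29b-41d4-a716-446655440000", "45ms", "0x1a2b3c",
              "200", "3.14", "2024-01-15T14:22:01Z", "hello"] {
        println!("{v:<40} {:?}", classify(v));
    }
    let status: HashMap<&str, u64> = [("200", 91), ("404", 6), ("500", 3)].into_iter().collect();
    println!("status is enum: {}", is_enum(&status));
}
```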
Memory stays bounded per component:

- Drain3 clusters: O(k) in the number of clusters k, with LRU eviction (default max 10k clusters)
- Quantiles: DDSketch, a fixed ~200 bytes per numeric slot with no raw values stored
- Cardinality: HyperLogLog++, ~200 bytes per high-cardinality variable
- Examples: reservoir sampling into a bounded buffer per pattern
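The quantile bullet relies on DDSketch, which keeps counts in logarithmically sized buckets instead of raw values. The block below is a minimal, independent illustration of that idea; the `DdSketch` type, its fields, and the zero-bucket handling for non-positive values are hypothetical, not ctrlb-decompose's code. With relative accuracy α it uses γ = (1+α)/(1−α), maps a value x > 0 to bucket i = ⌈log_γ x⌉ (covering (γ^(i−1), γ^i]), and answers a quantile with the estimate 2γ^i/(γ+1), which is within α of every value in that bucket.

```rust
use std::collections::BTreeMap;

struct DdSketch {
    gamma: f64,
    ln_gamma: f64,
    buckets: BTreeMap<i32, u64>, // bucket index -> count, iterated in ascending order
    zero: u64,                   // values <= 0 (e.g. zero durations)
    count: u64,
}

impl DdSketch {
    /// `alpha` is the target relative accuracy, e.g. 0.01 for 1%.
    fn new(alpha: f64) -> Self {
        let gamma = (1.0 + alpha) / (1.0 - alpha);
        DdSketch { gamma, ln_gamma: gamma.ln(), buckets: BTreeMap::new(), zero: 0, count: 0 }
    }

    fn add(&mut self, x: f64) {
        self.count += 1;
        if x <= 0.0 {
            self.zero += 1;
        } else {
            // Bucket i covers (gamma^(i-1), gamma^i].
            let i = (x.ln() / self.ln_gamma).ceil() as i32;
            *self.buckets.entry(i).or_insert(0) += 1;
        }
    }

    /// Estimate the q-quantile (0.0..=1.0); None if nothing was added.
    fn quantile(&self, q: f64) -> Option<f64> {
        if self.count == 0 {
            return None;
        }
        let rank = q.clamp(0.0, 1.0) * (self.count - 1) as f64;
        let mut seen = self.zero;
        if seen as f64 > rank {
            return Some(0.0);
        }
        // Walk buckets in ascending order until the cumulative count passes the rank.
        for (&i, &n) in &self.buckets {
            seen += n;
            if seen as f64 > rank {
                return Some(2.0 * self.gamma.powi(i) / (self.gamma + 1.0));
            }
        }
        None // not reached when counts are consistent
    }
}

fn main() {
    let mut sketch = DdSketch::new(0.01);
    for ms in 1..=1000 {
        sketch.add(ms as f64);
    }
    // Estimates land within about 1% of the exact p50 (500) and p99 (990).
    println!("p50 ~ {:.1}", sketch.quantile(0.50).unwrap());
    println!("p99 ~ {:.1}", sketch.quantile(0.99).unwrap());
    println!("buckets in use: {}", sketch.buckets.len());
}
```

Memory grows with the number of occupied buckets, roughly the log of the value range divided by ln γ, rather than with the number of values added.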
```bash
# Homebrew
brew tap ctrlb-hq/tap
brew install ctrlb-decompose
```

```bash
# Debian/Ubuntu (amd64 .deb)
curl -LO https://github.com/ctrlb-hq/ctrlb-decompose/releases/download/v0.1.0/ctrlb-decompose_0.1.0-1_amd64.deb
sudo dpkg -i ctrlb-decompose_0.1.0-1_amd64.deb
```

```bash
# From source (requires a Rust toolchain)
git clone https://github.com/ctrlb-hq/ctrlb-decompose.git
cd ctrlb-decompose
cargo build --release
# Binary at target/release/ctrlb-decompose
```

```bash
# Pipe from stdin
cat /var/log/syslog | ctrlb-decompose

# Read from file
ctrlb-decompose server.log

# LLM-optimized output (compact, token-efficient)
ctrlb-decompose --llm app.log

# JSON output
ctrlb-decompose --json app.log

# Top 10 patterns with 3 example lines each
ctrlb-decompose --top 10 --context 3 app.log
```

```
ctrlb-decompose [OPTIONS] [FILE]

Arguments:
  [FILE]  Log file path (reads stdin if omitted or "-")

Options:
      --human          Human-readable output with colors (default)
      --llm            LLM-optimized compact markdown
      --json           Structured JSON output
      --top <N>        Show top N patterns (default: 20)
      --context <N>    Example lines per pattern (default: 0)
      --no-color       Disable ANSI colors
      --no-banner      Suppress header/footer
  -q, --quiet          Suppress progress messages
  -h, --help           Show help
  -V, --version        Show version
```
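The `[FILE]` argument falls back to stdin when it is omitted or `-`. A minimal Rust sketch of that behavior, reading one line at a time in keeping with the single streaming pass; `open_input` is a hypothetical helper for illustration, not part of ctrlb-decompose's library API.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Open a file path, or stdin when no path (or "-") is given.
fn open_input(path: Option<&str>) -> io::Result<Box<dyn BufRead>> {
    match path {
        None | Some("-") => Ok(Box::new(BufReader::new(io::stdin()))),
        Some(p) => Ok(Box::new(BufReader::new(File::open(p)?))),
    }
}

fn main() -> io::Result<()> {
    let path = std::env::args().nth(1);
    let reader = open_input(path.as_deref())?;
    // Each line is read once and dropped; nothing is buffered beyond the current line.
    let mut lines = 0u64;
    for line in reader.lines() {
        let _line = line?;
        lines += 1;
    }
    eprintln!("read {lines} lines");
    Ok(())
}
```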
| Format | Flag | Best for |
|---|---|---|
| Human | `--human` (default) | Terminal investigation: colored, visual bars |
| LLM | `--llm` | Feeding into LLMs: compact, token-efficient markdown |
| JSON | `--json` | Programmatic consumption: structured, machine-readable |
Use ctrlb-decompose directly from Claude Code — no CLI knowledge needed. The plugin installs ctrlb-decompose automatically and lets you analyze logs just by asking.
```
/plugin marketplace add ctrlb-hq/ctrlb-decompose
/plugin install ctrlb-decompose@ctrlb-hq
```
Just describe what you want in plain language:
- "Analyze the errors in /var/log/app.log"
- "What are the most common patterns in this log file?"
- "Summarize these logs and highlight anomalies"
Claude will check if ctrlb-decompose is installed (and walk you through installation if not), run the analysis, and explain the results — surfacing errors first, calling out anomalies, and suggesting what to investigate next.
See plugin/README.md for full details.