
Commit cf6e990

Implement command-aware content classification heuristics (#51)
## PR Auto Describe

## Summary

This PR overhauls omni's content processing pipeline: command-aware content classification, repair of historical data via `omni doctor`, a pipeline refactor, and CLI tooling improvements, all aimed at better detection accuracy and user experience.

---

## Key Changes

- **Command-Aware Content Classification**: Rewrote the core `classify` function to use command context for more accurate content type detection, with built-in heuristics for git, system, build, web, and cloud commands.
- **Historical Stats Repair**: Added `omni doctor --fix` to reclassify outdated, mislabeled historical distillation records in the local SQLite store.
- **Pipeline Refactor**: Simplified hook and pipeline logic, replaced the legacy `compose` function with dedicated distillation and auto-learn evaluation code, and added output truncation safety.
- **CLI & Tooling Updates**: Refined `omni stats` with better filtering and clearer messaging, updated `omni learn` to include command context in generated filters, and reduced the update check cache TTL to 4 hours.

---

## Detailed Breakdown

### Core Classification System

Rewrote `pipeline/classifier.rs` to add an optional `command_name` parameter to `classify()`, with pre-stage heuristics that extract the base command (e.g. `git` from `git status`) and route output to the correct content type for git subcommands, system utilities, build tools, web/JS tools, and cloud tools. Migrated the existing tests to the new API, added test cases for command-specific classification, and made benchmarks and tests pass `None` as the context where none is available.

### Pipeline & Hook Logic

Refactored `hooks/pipe.rs` and `hooks/post_tool.rs` to pass command context to `classify()`, replaced legacy `compose` calls with direct distiller usage, added auto-learn evaluation via `evaluate_learning()`, and added output truncation safety checks.
Moved the auto-learn trigger logic out of `composer.rs`'s `compose` function into a dedicated `evaluate_learning()` function, removed the unused `colored` import, and deleted the legacy `compose` function entirely. Added formatted RewindStore notifications to both hook files.

### Data Store & Doctor Command

Added `reclassify_historical_data()` and `has_upgradable_history()` to `sqlite.rs`; they update old, misclassified distillation records using each record's stored command context. Updated `cli/doctor.rs` with an Intelligence Consistency check section: by default it prompts users to run with `--fix` to upgrade historical stats, and with `--fix` it runs the repair directly.

Updated `stats.rs` to prompt users to run `omni doctor --fix` when upgradable history exists, added the `--today`, `--week`, `--month`, and `--all-commands` filter flags (which implicitly switch to detail mode), improved the message about hidden zero-savings commands, and cleaned up the labels in the default stats view.

### CLI Learn & MCP Server

Updated `cli/learn.rs` to pass command context to `generate_toml()` and `apply_to_config()`, so auto-generated filters include the triggering command as a `match_command` rule and get clearer descriptions. Updated `mcp/server.rs` to call the new `classify()` API with `None` as the context and to match the new filter generation signatures.

### Minor Updates

Reduced the update check cache TTL in `guard/update.rs` from 24 hours to 4 hours. Updated the test files (`savings_assertions.rs`, `security_tests.rs`) to use the new `classify()` API and to replace legacy `compose` calls with direct distiller usage. Updated all benchmark calls in `benches/pipeline.rs` to pass the new command context parameter.

---

## Notes

- Command-aware classification improves accuracy for tool-specific output (e.g. git diffs, cargo build logs, kubectl output) by using the originally executed command as context.
- Historical reclassification fixes long-standing mislabeling of old stats records.
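A minimal sketch of the historical repair idea behind `reclassify_historical_data()` and `has_upgradable_history()`, assuming records are plain in-memory structs rather than SQLite rows. The `Record` type, its fields, and both function names are hypothetical, and the command-aware classifier is injected as a closure so the sketch stays self-contained. Only records with stored command context are re-run, and the return value counts the labels that actually changed.

```rust
/// Hypothetical stored distillation record; the real schema in `sqlite.rs` is not shown here.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    command: String,
    content_type: String,
}

/// Re-runs a command-aware classifier over records that carry command context
/// and rewrites labels that changed. Returns the number of upgraded records.
fn reclassify_records<F>(records: &mut [Record], classify: F) -> usize
where
    F: Fn(&str) -> Option<String>,
{
    let mut upgraded = 0;
    for record in records.iter_mut() {
        if record.command.is_empty() {
            continue; // no stored command: nothing to reclassify against
        }
        if let Some(new_type) = classify(record.command.as_str()) {
            if new_type != record.content_type {
                record.content_type = new_type;
                upgraded += 1;
            }
        }
    }
    upgraded
}

/// Read-only check in the spirit of `has_upgradable_history()`:
/// true if at least one record would be relabeled.
fn has_upgradable<F>(records: &[Record], classify: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    records.iter().any(|r| {
        !r.command.is_empty()
            && classify(r.command.as_str()).is_some_and(|t| t != r.content_type)
    })
}
```

Keeping the check read-only lets `omni stats` decide whether to print its `omni doctor --fix` hint without modifying any data.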
---

## Breaking Changes

1. **`classifier::classify` API break**: `classify` now takes a second, optional `command_name` argument. Every existing call must be updated; pass `None` when no command context is available (e.g. `classifier::classify(input, None)`).
2. **Removed `composer::compose`**: calls to this legacy function no longer compile. Use the distiller directly, together with `evaluate_learning()` for auto-learn logic.
3. **Updated `generate_toml` and `apply_to_config` signatures**: both now take an additional optional `command` argument; callers must supply it (pass `None` if unused).
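For illustration, the command-aware pre-stage described under Core Classification System could look roughly like this. It is not the code from `pipeline/classifier.rs`: the `Category` enum, the helper names, and the per-category command lists are assumptions. It shows the two steps the PR describes, extracting the base command (`git` from `git status`) and routing it to a category, with `None` deferring to ordinary content sniffing.

```rust
/// Hypothetical categories; omni's real `ContentType` enum is not shown in this diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Git,
    System,
    Build,
    Web,
    Cloud,
}

/// Extracts the base command: the first whitespace-separated token,
/// with any leading directory path stripped ("/usr/bin/git" -> "git").
fn base_command(command: &str) -> Option<&str> {
    let first = command.split_whitespace().next()?;
    first.rsplit('/').next()
}

/// Pre-stage heuristic: route by command name before content sniffing.
/// The command lists are illustrative assumptions, not omni's actual tables.
fn classify_by_command(command_name: Option<&str>) -> Option<Category> {
    let base = base_command(command_name?)?;
    let category = match base {
        "git" => Category::Git,
        "ls" | "ps" | "df" | "du" | "find" => Category::System,
        "cargo" | "make" | "cmake" | "go" | "mvn" | "gradle" => Category::Build,
        "npm" | "npx" | "yarn" | "pnpm" | "node" => Category::Web,
        "kubectl" | "docker" | "aws" | "gcloud" | "terraform" => Category::Cloud,
        _ => return None, // unknown command: fall back to content-based classification
    };
    Some(category)
}
```

Returning `Option<Category>` keeps the heuristic a pure pre-stage: any command it does not recognize, and any call made with `None`, falls through to the existing content-based classifier unchanged.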
2 parents dbded1f + 5fde93d commit cf6e990

14 files changed

Lines changed: 633 additions & 392 deletions


benches/pipeline.rs

Lines changed: 3 additions & 3 deletions
```diff
@@ -25,7 +25,7 @@ fn bench_classify(c: &mut Criterion) {
 
     for (name, input) in &fixtures {
         c.bench_with_input(BenchmarkId::new("classify", name), input, |b, i| {
-            b.iter(|| classifier::classify(i))
+            b.iter(|| classifier::classify(i, None))
         });
     }
 }
@@ -35,7 +35,7 @@ fn bench_full_pipeline(c: &mut Criterion) {
 
     c.bench_function("full_pipeline_cargo_build", |b| {
         b.iter(|| {
-            let ctype = classifier::classify(input);
+            let ctype = classifier::classify(input, None);
             let segments = scorer::score_segments(input, &ctype, None);
             let distiller = distillers::get_distiller(&ctype);
             distiller.distill(&segments, input, None)
@@ -49,7 +49,7 @@ fn bench_hook_roundtrip(c: &mut Criterion) {
 
     c.bench_function("hook_roundtrip_50kb", |b| {
         b.iter(|| {
-            let ctype = classifier::classify(&large_input);
+            let ctype = classifier::classify(&large_input, None);
             let segments = scorer::score_segments(&large_input, &ctype, None);
             let distiller = distillers::get_distiller(&ctype);
             distiller.distill(&segments, &large_input, None)
```

src/cli/doctor.rs

Lines changed: 41 additions & 0 deletions
```diff
@@ -430,6 +430,47 @@ pub fn run(args: &[String]) -> anyhow::Result<()> {
         println!(" {:<15} none", "Project:".bright_black());
     }
 
+    // 10. Intelligence Consistency
+    println!("\n {}", "Intelligence:".bold().bright_white());
+    if let Ok(store) = Store::open() {
+        if fix_mode {
+            match store.reclassify_historical_data() {
+                Ok(count) => {
+                    if count > 0 {
+                        println!(
+                            " {:<15} {} records upgraded to new categories {}",
+                            "Upgrade:".bright_black(),
+                            count.to_string().yellow().bold(),
+                            "[FIXED]".green().bold()
+                        );
+                    } else {
+                        println!(
+                            " {:<15} historical statistics are up to date {}",
+                            "Status:".bright_black(),
+                            "[OK]".green().bold()
+                        );
+                    }
+                }
+                Err(e) => {
+                    println!(
+                        " {:<15} upgrade failed: {} {}",
+                        "Upgrade:".bright_black(),
+                        e,
+                        "[ERROR]".red().bold()
+                    );
+                }
+            }
+        } else {
+            // Diagnostic mode: check how many could be upgraded
+            // For now, simpler to just say 'Run with --fix to upgrade'
+            println!(
+                " {:<15} Run with {} to upgrade historical stats",
+                "Status:".bright_black(),
+                "--fix".cyan()
+            );
+        }
+    }
+
     if let Some(latest) = crate::guard::update::check() {
         crate::guard::update::print_notification(&latest);
     }
```
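The branching in the doctor hunk can be pulled into a pure function that returns the label and message instead of printing them, which makes each path testable. This is a hypothetical refactor sketch: `intelligence_status` does not exist in the codebase, colors are omitted, the error type is assumed to be `String`, and the repair is passed as a closure standing in for `store.reclassify_historical_data()`.

```rust
/// Returns (label, message) for the Intelligence section without printing.
/// `repair` is only invoked in fix mode, so diagnostic mode has no side effects.
fn intelligence_status<F>(fix_mode: bool, repair: F) -> (&'static str, String)
where
    F: FnOnce() -> Result<usize, String>,
{
    if !fix_mode {
        // Diagnostic mode only suggests the flag, as in the hunk above.
        return ("Status:", "Run with --fix to upgrade historical stats".to_string());
    }
    match repair() {
        // Zero upgraded records corresponds to the `count > 0` else-branch.
        Ok(0) => ("Status:", "historical statistics are up to date [OK]".to_string()),
        Ok(count) => ("Upgrade:", format!("{count} records upgraded to new categories [FIXED]")),
        Err(e) => ("Upgrade:", format!("upgrade failed: {e} [ERROR]")),
    }
}
```

Taking the repair as an `FnOnce` closure mirrors the original control flow: the store is only touched when `--fix` is present.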

src/cli/learn.rs

Lines changed: 5 additions & 2 deletions
```diff
@@ -236,8 +236,11 @@ pub fn run_learn(args: &[String]) -> Result<()> {
 
     let filter_name = format!("learned_{}", Utc::now().timestamp());
 
+    let command_hint = executions.first().map(|e| e.command.as_str());
+
     if dry_run {
-        let generated = crate::session::learn::generate_toml(&candidates, &filter_name);
+        let generated =
+            crate::session::learn::generate_toml(&candidates, &filter_name, command_hint);
         println!(
             "\n{}",
             "─────────────────────────────────────────"
@@ -264,7 +267,7 @@ pub fn run_learn(args: &[String]) -> Result<()> {
     } else if apply {
         let path = crate::paths::learned_filters_path();
         let _ = crate::paths::ensure_omni_home();
-        let added = apply_to_config(&candidates, &filter_name, &path)?;
+        let added = apply_to_config(&candidates, &filter_name, &path, command_hint)?;
         if added > 0 {
             println!(
                 "\n{}",
```

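A hypothetical sketch of how the optional command hint could flow into a generated filter. Only the `match_command` key is named in this PR; the `[filters.<name>]` table layout, the `description` and `drop_patterns` keys, and the function names are assumptions, and the real `generate_toml` takes `&candidates`, whose type is not shown in this diff.

```rust
/// Escapes a value for a TOML basic (double-quoted) string.
/// Handles only backslash, double quote, and newline, which suffices for this sketch.
fn toml_escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// Emits one filter table; `match_command` is written only when a command is known.
fn generate_filter_toml(name: &str, patterns: &[&str], command: Option<&str>) -> String {
    let mut out = format!("[filters.{name}]\n");
    let description = match command {
        Some(cmd) => format!("Learned from `{cmd}` output"),
        None => "Learned from repeated output".to_string(),
    };
    out.push_str(&format!("description = \"{}\"\n", toml_escape(&description)));
    if let Some(cmd) = command {
        out.push_str(&format!("match_command = \"{}\"\n", toml_escape(cmd)));
    }
    let quoted: Vec<String> = patterns
        .iter()
        .map(|p| format!("\"{}\"", toml_escape(p)))
        .collect();
    out.push_str(&format!("drop_patterns = [{}]\n", quoted.join(", ")));
    out
}
```

With `command_hint` taken from the first execution, as in the hunk, a filter learned from `cargo build` output would carry `match_command = "cargo build"`, while passing `None` omits the key entirely.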
src/cli/stats.rs

Lines changed: 65 additions & 23 deletions
```diff
@@ -137,13 +137,21 @@ pub fn run(args: &[String], store: &Store) -> Result<()> {
         return Ok(());
     }
 
-    // Mode detection
-    let mode = if args.iter().any(|a| a == "--detail") {
+    let detail_flag = args.iter().any(|a| a == "--detail");
+    let type_flag = args.iter().any(|a| a == "--by-type");
+    let json_flag = args.iter().any(|a| a == "--json");
+    let filter_flag = args
+        .iter()
+        .any(|a| a == "--today" || a == "--week" || a == "--month" || a == "--all-commands");
+
+    let mode = if detail_flag {
         "detail"
-    } else if args.iter().any(|a| a == "--by-type") {
+    } else if type_flag {
         "by-type"
-    } else if args.iter().any(|a| a == "--json") {
+    } else if json_flag {
         "json"
+    } else if filter_flag {
+        "detail" // Implicit detail mode for scoped queries
     } else {
         "default"
     };
@@ -227,7 +235,7 @@ fn run_default(store: &Store) -> Result<()> {
 
     if !top_types.is_empty() {
         println!("\n {}", "Top Savings by Type:".bold().bright_white());
-        for (content_type, count, pct, _) in &top_types {
+        for (content_type, count, pct, _commands) in &top_types {
             let bar = format_bar_with_empty(*pct);
             let bar_colored = if *pct > 80.0 {
                 bar.bright_green()
@@ -237,12 +245,14 @@ fn run_default(store: &Store) -> Result<()> {
                 bar.bright_red()
             };
 
+            let label_display = content_type.clone();
+
             println!(
                 " {:<13} {} {:>5.1}% ({}x)",
-                content_type.bright_cyan(),
+                label_display.bright_cyan(),
                 bar_colored,
                 pct,
-                count,
+                count
             );
         }
     }
@@ -267,6 +277,19 @@ fn run_default(store: &Store) -> Result<()> {
         " 💡 {} for content type mapping",
         "omni stats --by-type".bright_cyan()
     );
+
+    if store.has_upgradable_history() {
+        println!(
+            " 💡 Run {} to upgrade historical stats",
+            "omni doctor --fix".bright_cyan()
+        );
+    }
+
+    // Update Notification (4h cache)
+    if let Some(latest) = crate::guard::update::check() {
+        crate::guard::update::print_notification(&latest);
+    }
+
     println!();
     Ok(())
 }
@@ -368,13 +391,18 @@ fn run_detail(args: &[String], store: &Store) -> Result<()> {
         );
     }
 
-    // By Command — top 10, filter 0% savings
+    // By Command — top 10 (or all if requested), filter 0% savings
     let filters = store.filter_breakdown(since)?;
-    let display_filters: Vec<_> = filters
-        .iter()
-        .filter(|(_, _, pct)| *pct > 0.0)
-        .take(10)
-        .collect();
+    let all_flag = args.iter().any(|a| a == "--all-commands");
+    let display_filters: Vec<_> = if all_flag {
+        filters.iter().collect()
+    } else {
+        filters
+            .iter()
+            .filter(|(_, _, pct)| *pct > 0.0)
+            .take(10)
+            .collect()
+    };
 
     if !display_filters.is_empty() {
         println!("\n {}", "By Command:".bold().bright_white());
@@ -416,16 +444,30 @@ fn run_detail(args: &[String], store: &Store) -> Result<()> {
             );
         }
 
-        if filters.len() > 10 {
-            println!(
-                "\n {}",
-                format!(
-                    "Run `omni stats --detail --all-commands` for all {} commands.",
-                    filters.len()
-                )
-                .bright_black()
-                .italic()
-            );
+        if !all_flag {
+            let filtered_count = filters.iter().filter(|(_, _, pct)| *pct > 0.0).count();
+            let hidden_zero = filters.len() - filtered_count;
+
+            if filtered_count > 10 {
+                println!(
+                    "\n {}",
+                    format!(
+                        "Showing top 10 of {} commands with active savings.",
+                        filtered_count
+                    )
+                    .bright_black()
+                    .italic()
+                );
+            }
+
+            if hidden_zero > 0 {
+                println!(
+                    " {}",
+                    format!("({} noise commands with 0% savings hidden. Use --all-commands to see all).", hidden_zero)
+                        .bright_black()
+                        .italic()
+                );
+            }
         }
     }
 
```
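The two `stats.rs` behaviors above, mode precedence and the "By Command" row selection, can be sketched as pure functions. This is an illustrative sketch, not omni's code: neither function exists under these names, and the `(String, u64, f64)` row shape is an assumption (the diff only shows that the third field is the savings percentage).

```rust
/// Mode precedence from the first hunk: --detail, then --by-type, then --json,
/// then any scope flag, which implies detail mode.
fn resolve_mode(args: &[&str]) -> &'static str {
    let has = |flag: &str| args.iter().any(|a| *a == flag);
    if has("--detail") {
        "detail"
    } else if has("--by-type") {
        "by-type"
    } else if has("--json") {
        "json"
    } else if ["--today", "--week", "--month", "--all-commands"]
        .iter()
        .any(|f| has(*f))
    {
        "detail"
    } else {
        "default"
    }
}

/// Hypothetical row: (command, runs, savings percentage).
type Row = (String, u64, f64);

/// With `all`, every row is shown. Otherwise rows without positive savings are
/// dropped and at most 10 are kept. Returns (rows to display, hidden zero-savings count).
fn select_rows(rows: &[Row], all: bool) -> (Vec<&Row>, usize) {
    if all {
        return (rows.iter().collect(), 0);
    }
    let active: Vec<&Row> = rows.iter().filter(|(_, _, pct)| *pct > 0.0).collect();
    let hidden_zero = rows.len() - active.len();
    (active.into_iter().take(10).collect(), hidden_zero)
}
```

Counting `hidden_zero` before applying `take(10)` matches the hunk: the "Showing top 10" line and the "noise commands hidden" line report two independent reasons a row may be missing.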
src/guard/update.rs

Lines changed: 2 additions & 2 deletions
```diff
@@ -32,10 +32,10 @@ pub fn get_status() -> Status {
         .unwrap_or_default()
         .as_secs();
 
-    // Try to get latest version from cache or fetch it
+    // Try to get latest version from cache or fetch it (Cache: 4 hours)
     let latest = if let Ok(content) = fs::read_to_string(&cache_path)
         && let Ok(cache) = serde_json::from_str::<UpdateCache>(&content)
-        && now < cache.last_checked + 86400
+        && now < cache.last_checked + 14400
     {
         Some(cache.latest_version)
     } else {
```
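The new TTL check reduces to a small freshness predicate: 14400 s is 4 × 60 × 60, down from 86400 s (24 hours). The sketch below uses hypothetical names and assumes `last_checked` is a `u64` of Unix seconds like `now`. Unlike the plain `+` in the diff, it uses `checked_add`, so a corrupted, huge `last_checked` is treated as stale instead of overflowing (which panics in debug builds).

```rust
/// Update-check cache lifetime after this commit: 4 hours.
const UPDATE_CACHE_TTL_SECS: u64 = 4 * 60 * 60; // 14_400

/// True while a cache entry written at `last_checked` (Unix seconds) is still fresh.
/// An overflowing expiry yields `None`, which counts as stale and forces a refetch.
fn cache_is_fresh(now: u64, last_checked: u64) -> bool {
    last_checked
        .checked_add(UPDATE_CACHE_TTL_SECS)
        .is_some_and(|expires_at| now < expires_at)
}
```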

src/hooks/pipe.rs

Lines changed: 58 additions & 40 deletions
```diff
@@ -179,47 +179,65 @@ fn distill(
         }
     }
 
-    let (output, filter_name, ctype, rewind_hash, kept_count, dropped_count) =
-        if let Some(filter) = matched_toml {
-            let out = filter.apply(&input_text);
-            (out, filter.name.clone(), ContentType::Unknown, None, 0, 0)
-        } else {
-            let c = classifier::classify(&input_text);
-
-            let collapse_result = collapse::collapse(&input_text, &c);
-            let effective_input = collapse_result.collapsed_lines.join("\n");
-
-            let active_session_opt = session.as_ref().and_then(|m| m.lock().ok());
-            let scored_segments =
-                scorer::score_segments(&effective_input, &c, active_session_opt.as_deref());
-            drop(active_session_opt);
-
-            let compose_config = composer::ComposeConfig::default();
-            let decision = composer::decide_rewind(&scored_segments, &c);
-
-            let k_count = scored_segments
-                .iter()
-                .filter(|s| s.final_score() >= compose_config.threshold)
-                .count();
-            let d_count = scored_segments.len() - k_count;
-
-            let store_for_compose = if decision.should_store { store } else { None };
-
-            let (out, r_hash) = composer::compose(
-                scored_segments,
-                if decision.should_store {
-                    Some(input_text.clone())
-                } else {
-                    None
-                }, // Temporary clone for compose drops
-                &compose_config,
-                store_for_compose,
-                &input_text,
-                &c,
-            );
+    let (output, filter_name, ctype, rewind_hash, kept_count, dropped_count) = if let Some(filter) =
+        matched_toml
+    {
+        let out = filter.apply(&input_text);
+        (out, filter.name.clone(), ContentType::Unknown, None, 0, 0)
+    } else {
+        let c = classifier::classify(&input_text, command_name);
 
-            (out, format!("{:?}", c), c, r_hash, k_count, d_count)
-        };
+        let collapse_result = collapse::collapse(&input_text, &c);
+        let effective_input = collapse_result.collapsed_lines.join("\n");
+
+        let active_session_opt = session.as_ref().and_then(|m| m.lock().ok());
+        let scored_segments =
+            scorer::score_segments(&effective_input, &c, active_session_opt.as_deref());
+
+        let distiller = crate::distillers::get_distiller(&c);
+        let mut out =
+            distiller.distill(&scored_segments, &input_text, active_session_opt.as_deref());
+
+        let compose_config = composer::ComposeConfig::default();
+        let decision = composer::decide_rewind(&scored_segments, &c);
+
+        let k_count = scored_segments
+            .iter()
+            .filter(|s| s.final_score() >= compose_config.threshold)
+            .count();
+        let d_count = scored_segments.len() - k_count;
+
+        crate::pipeline::composer::evaluate_learning(
+            &c,
+            &input_text,
+            scored_segments.len(),
+            d_count,
+            command_name.unwrap_or(""),
+        );
+
+        let mut r_hash = None;
+        if decision.should_store
+            && let Some(s) = store
+        {
+            let hash = s.store_rewind(&input_text);
+            out.push_str(&format!(
+                "\n{} {} {} {} lines. The hash {} stores the full output in RewindStore for retrieval.\n",
+                "⏺".cyan(),
+                "OMNI".bold().bright_white(),
+                "distilled".bright_green(),
+                d_count,
+                hash.cyan().bold()
+            ));
+            r_hash = Some(hash);
+        }
+
+        if out.len() > compose_config.max_output_chars {
+            out.truncate(compose_config.max_output_chars);
+            out.push_str("\n[OMNI: output truncated]\n");
+        }
+
+        (out, format!("{:?}", c), c, r_hash, k_count, d_count)
+    };
 
     PipelineResult {
         session_id,
```
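One caveat with the truncation safety check in this hunk: `String::truncate` panics if the new length does not fall on a UTF-8 character boundary, and `out.len()` counts bytes rather than characters. Output containing multi-byte text, such as the `⏺` glyph in the RewindStore notice or non-ASCII tool output, can therefore trigger that panic. A hedged sketch of a boundary-safe variant; the helper name is hypothetical.

```rust
/// Truncates `s` to at most `max_bytes` bytes without splitting a UTF-8 character.
/// Returns true if anything was removed.
fn truncate_on_char_boundary(s: &mut String, max_bytes: usize) -> bool {
    if s.len() <= max_bytes {
        return false;
    }
    // Walk back to the nearest character boundary; index 0 is always one,
    // so the loop terminates.
    let mut cut = max_bytes;
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    s.truncate(cut);
    true
}
```

At the call site it would replace the bare `truncate`, e.g. `if truncate_on_char_boundary(&mut out, compose_config.max_output_chars) { out.push_str("\n[OMNI: output truncated]\n"); }`.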
