Implement command-aware content classification heuristics #51

Merged
fajarhide merged 3 commits into main from bug-fix/v0.5.5-rc1
Apr 8, 2026
@fajarhide fajarhide commented Apr 8, 2026

PR Auto Describe

Summary

This PR overhauls omni's content processing pipeline. It adds command-aware content classification, repairs historical data via omni doctor --fix, refactors the hook and pipeline code, and refines the CLI tooling, with the goal of improving detection accuracy and user experience.


Key Changes

  • Command-Aware Content Classification: Rewrote the core classify function to use the executed command as context when detecting content types, with built-in heuristics for git, system, build, web, and cloud commands.
  • Historical Stats Repair: Added omni doctor --fix to reclassify outdated, mislabeled historical distillation records in the local SQLite store.
  • Pipeline Refactor: Simplified hook and pipeline logic, replaced the legacy compose function with dedicated distillation and auto-learn evaluation code, and added output truncation safety checks.
  • CLI & Tooling Updates: Refined omni stats with better filtering and clearer messaging, updated omni learn to include command context in generated filters, and reduced the update check cache TTL from 24 hours to 4 hours.

Detailed Breakdown

Core Classification System

Rewrote pipeline/classifier.rs to add an optional command_name parameter to classify(), with pre-stage heuristics that extract the base command (e.g. git from git status) and route outputs to correct content types: git subcommands, system utilities, build tools, web/JS tools, and cloud tools. Updated all existing tests to use the new API, added new test cases for command-specific classification, and updated all benchmarks/tests to pass None as context where unavailable.
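
The pre-stage routing described above can be sketched roughly as follows. This is a minimal illustration, not the code in pipeline/classifier.rs: the ContentType variants, the base_command helper, the fallback check on the input text, and every command in the match table other than git, cargo, and kubectl are assumptions made for the example.

```rust
/// Hypothetical content categories; the real enum in pipeline/classifier.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentType {
    Git,
    System,
    Build,
    Web,
    Cloud,
    Unknown,
}

/// Extract the base command (hypothetical helper): take the first
/// whitespace-separated token and strip any leading path,
/// so "/usr/bin/git status" yields "git".
pub fn base_command(command: &str) -> Option<&str> {
    let first = command.split_whitespace().next()?;
    first.rsplit('/').next()
}

/// Classify `input`, preferring the command context when one is available.
pub fn classify(input: &str, command_name: Option<&str>) -> ContentType {
    // Pre-stage: route by base command before looking at the content itself.
    if let Some(base) = command_name.and_then(base_command) {
        match base {
            "git" => return ContentType::Git,
            "ls" | "ps" | "df" => return ContentType::System, // assumed examples
            "cargo" | "make" => return ContentType::Build,
            "npm" | "yarn" => return ContentType::Web, // assumed examples
            "kubectl" | "aws" => return ContentType::Cloud,
            _ => {} // unknown command: fall through to content heuristics
        }
    }
    // Fallback: a single content-only heuristic standing in for the real ones.
    if input.starts_with("diff --git") {
        ContentType::Git
    } else {
        ContentType::Unknown
    }
}
```

Callers without command context pass None, as in classify(input, None), and get the content-only behavior.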

Pipeline & Hook Logic

Refactored hooks/pipe.rs and hooks/post_tool.rs to pass command context to classify(), replaced legacy compose calls with direct distiller usage, added auto-learn evaluation via evaluate_learning(), and added output truncation safety checks. In composer.rs, moved the auto-learn trigger logic out of compose into a dedicated evaluate_learning() function, deleted the legacy compose function, and removed the unused colored import. Added properly formatted rewind store notifications in both hook files.
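
The PR does not spell out what the output truncation safety checks do. One plausible reading, sketched here purely as an assumption, is capping output at a byte budget without cutting a multi-byte UTF-8 character in half. The function name truncate_safely and the byte-based limit are hypothetical.

```rust
/// Truncate `output` to at most `max_bytes` bytes without splitting a
/// UTF-8 character. Hypothetical helper; the hooks in this PR may
/// implement their safety checks differently.
pub fn truncate_safely(output: &str, max_bytes: usize) -> &str {
    if output.len() <= max_bytes {
        return output;
    }
    // Walk back from the limit until we land on a character boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    &output[..end]
}
```

Slicing a &str at an arbitrary byte index panics when the index falls inside a character, which is why the boundary check matters for tool output that may contain non-ASCII text.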

Data Store & Doctor Command

Added reclassify_historical_data() and has_upgradable_history() to sqlite.rs to update old misclassified distillation records using their stored command context. Updated cli/doctor.rs to add an Intelligence Consistency check section: prompts users to run --fix to upgrade historical stats, or runs repairs automatically when the flag is present. Updated stats.rs to show a prompt to run omni doctor --fix when upgradable history exists, added filter flag support (--today/--week/--month/--all-commands) that defaults to detail mode, improved messaging for hidden zero-savings commands, and cleaned up default stats labeling.
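
The relationship between the two store functions can be illustrated with an in-memory sketch. The real functions live in sqlite.rs and operate on the SQLite store; the Record struct, the string-typed content labels, and the tiny command table below are assumptions chosen to keep the example self-contained.

```rust
/// Hypothetical in-memory stand-in for a stored distillation record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub command: Option<String>,
    pub content_type: String,
}

/// Assumed mapping from base command to content type label.
fn type_for_command(command: &str) -> Option<&'static str> {
    match command.split_whitespace().next()? {
        "git" => Some("git"),
        "cargo" => Some("build"),
        "kubectl" => Some("cloud"),
        _ => None,
    }
}

/// Returns the corrected label if the record has a stored command and
/// its current label disagrees with what that command implies.
fn upgraded_type(record: &Record) -> Option<&'static str> {
    let new_type = type_for_command(record.command.as_deref()?)?;
    (new_type != record.content_type).then_some(new_type)
}

/// True when at least one record would change under reclassification.
pub fn has_upgradable_history(records: &[Record]) -> bool {
    records.iter().any(|r| upgraded_type(r).is_some())
}

/// Relabel every upgradable record in place; returns how many changed.
pub fn reclassify_historical_data(records: &mut [Record]) -> usize {
    let mut updated = 0;
    for record in records.iter_mut() {
        if let Some(new_type) = upgraded_type(record) {
            record.content_type = new_type.to_string();
            updated += 1;
        }
    }
    updated
}
```

In this sketch, a check like the one in omni doctor would call has_upgradable_history first and only run the repair when --fix is given. Records without a stored command are left untouched because there is no context to reclassify them with.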

CLI Learn & MCP Server

Updated cli/learn.rs to pass command context to generate_toml() and apply_to_config(), so auto-generated filters include the triggering command as a match_command rule and carry better descriptions. Updated mcp/server.rs to call the new classify() API with None as context, and updated its filter generation calls to match the new signature.
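
As a rough illustration of how an optional command could flow into a generated filter, consider the sketch below. Only the match_command key and the optional command argument come from the PR description; the function signature, the [filters.<name>] table layout, and the pattern and description keys are assumptions and may not match omni's actual config schema.

```rust
/// Escape a value for use inside a TOML basic string (backslashes and quotes only).
fn toml_escape(value: &str) -> String {
    value.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Hypothetical generate_toml: builds a filter entry and, when a command
/// is known, adds it as a match_command rule and mentions it in the description.
pub fn generate_toml(name: &str, pattern: &str, command: Option<&str>) -> String {
    let description = match command {
        Some(cmd) => format!("Auto-learned filter for `{cmd}`"),
        None => "Auto-learned filter".to_string(),
    };

    let mut out = String::new();
    out.push_str(&format!("[filters.{name}]\n"));
    out.push_str(&format!("description = \"{}\"\n", toml_escape(&description)));
    out.push_str(&format!("pattern = \"{}\"\n", toml_escape(pattern)));
    if let Some(cmd) = command {
        // Scope the filter to the command that triggered learning.
        out.push_str(&format!("match_command = \"{}\"\n", toml_escape(cmd)));
    }
    out
}
```

Passing None omits the match_command line entirely, which mirrors the migration path described for callers that have no command context.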

Minor Updates

Reduced the update check cache TTL in guard/update.rs from 24 hours to 4 hours. Updated all test files (savings_assertions.rs, security_tests.rs) to use the new classify() API and replace legacy compose calls with direct distiller usage. Updated all benchmark calls in benches/pipeline.rs to pass the new command context parameter.
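
A TTL check of the kind changed here might look like the following sketch. The constant name, the function name, and the choice to treat a backwards clock as stale are assumptions; only the 4-hour value comes from the PR.

```rust
use std::time::{Duration, SystemTime};

/// Update check cache lifetime: 4 hours (previously 24 hours per the PR).
pub const UPDATE_CHECK_TTL: Duration = Duration::from_secs(4 * 60 * 60);

/// Returns true while the cached update check result is still usable.
/// `now` is passed in so the function is easy to test.
pub fn cache_is_fresh(last_checked: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(last_checked) {
        Ok(age) => age < UPDATE_CHECK_TTL,
        // The clock moved backwards: treat the cache as stale and re-check.
        Err(_) => false,
    }
}
```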


Notes

  • Command-aware classification improves accuracy for tool-specific output (e.g. git diffs, cargo build logs, kubectl outputs) by using the originally executed command as context.
  • The historical reclassification fixes long-standing issues with mislabeled old stats records.

Breaking Changes

  1. classifier::classify API Break: The function now takes an additional command_name parameter of optional type; all existing calls must be updated, passing None when no command context is available (e.g. classifier::classify(input, None)).
  2. Removed composer::compose Function: Code that calls this legacy function will no longer compile; call the distiller directly and use evaluate_learning() for auto-learn logic.
  3. Updated generate_toml and apply_to_config Signatures: These functions now accept an optional command parameter; callers must add this argument (pass None if unused).

@fajarhide fajarhide merged commit cf6e990 into main Apr 8, 2026
4 checks passed
@fajarhide fajarhide deleted the bug-fix/v0.5.5-rc1 branch May 5, 2026 23:15