Implement command-aware content classification heuristics by fajarhide · Pull Request #51 · fajarhide/omni

fajarhide · 2026-04-08T08:37:33Z

PR Auto Describe

Summary

This PR delivers a major overhaul of omni's content processing pipeline, including command-aware content classification, historical data repair via omni doctor, pipeline refactoring, and CLI tooling improvements to boost detection accuracy and user experience.

Key Changes

Command-Aware Content Classification: Rewrote the core classify function to use command context for far more accurate content type detection, with built-in heuristics for git, system, build, web, and cloud commands.
Historical Stats Repair: Added omni doctor --fix to reclassify outdated, mislabeled historical distillation records in the local SQLite store.
Pipeline Refactor: Simplified hook and pipeline logic, replaced the legacy compose function with dedicated distillation and auto-learn evaluation code, and added output truncation safety checks.
CLI & Tooling Updates: Refined omni stats with better filtering and clearer messaging, updated omni learn to include command context in generated filters, and reduced the update check cache TTL from 24 hours to 4 hours.

Detailed Breakdown

Core Classification System

Rewrote pipeline/classifier.rs to add an optional command_name parameter to classify(), with pre-stage heuristics that extract the base command (e.g. git from git status) and route the output to the matching content type: git subcommands, system utilities, build tools, web/JS tools, and cloud tools. Updated existing tests and benchmarks to the new API, passing None where no command context is available, and added new test cases for command-specific classification.
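
The pre-stage heuristic described above can be sketched roughly as follows. This is an illustration only, not the code in pipeline/classifier.rs: the ContentType enum, the base_command and classify_by_content helpers, and the command lists other than git, cargo, and kubectl are assumptions made for the example.

```rust
/// Hypothetical content categories; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentType {
    Git,
    System,
    Build,
    Web,
    Cloud,
    Unknown,
}

/// Extract the base command, e.g. "git" from "git status".
fn base_command(command: &str) -> Option<&str> {
    let first = command.split_whitespace().next()?;
    // Drop any leading path so "/usr/bin/git" is treated as "git".
    first.rsplit('/').next()
}

/// Use the command when it is known; otherwise fall back to
/// content-only detection (heavily simplified here).
pub fn classify(input: &str, command_name: Option<&str>) -> ContentType {
    if let Some(base) = command_name.and_then(base_command) {
        // Assumed command lists, for illustration only.
        match base {
            "git" => return ContentType::Git,
            "ls" | "ps" | "df" => return ContentType::System,
            "cargo" | "make" => return ContentType::Build,
            "npm" | "yarn" => return ContentType::Web,
            "kubectl" | "aws" => return ContentType::Cloud,
            _ => {}
        }
    }
    classify_by_content(input)
}

/// Stand-in for the content-based stage.
fn classify_by_content(input: &str) -> ContentType {
    if input.starts_with("diff --git") {
        ContentType::Git
    } else {
        ContentType::Unknown
    }
}
```

Callers without command context pass None, which skips the pre-stage and leaves classification to the content-based path.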

Pipeline & Hook Logic

Refactored hooks/pipe.rs and hooks/post_tool.rs to pass command context to classify(), replaced legacy compose calls with direct distiller usage, added auto-learn evaluation via evaluate_learning(), and added output truncation safety checks. In composer.rs, extracted the auto-learn trigger logic from the compose function into a dedicated evaluate_learning() function, removed the unused colored import, and deleted the legacy compose function. Added properly formatted rewind store notifications in both hook files.
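
The PR does not spell out what the truncation safety checks do. One common form of such a check, shown here purely as an assumption, is to cap output at a byte budget without cutting a multi-byte UTF-8 character in half. The function name truncate_safely and the max_bytes parameter are hypothetical.

```rust
/// Truncate `output` to at most `max_bytes` bytes without splitting
/// a UTF-8 character. Hypothetical helper, not taken from the PR.
pub fn truncate_safely(output: &str, max_bytes: usize) -> &str {
    if output.len() <= max_bytes {
        return output;
    }
    // Walk back to the nearest character boundary; index 0 is always
    // a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    &output[..end]
}
```

Slicing a &str at a non-boundary index panics in Rust, so a boundary check of this kind is what keeps truncation from crashing on non-ASCII tool output.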

Data Store & Doctor Command

Added reclassify_historical_data() and has_upgradable_history() to sqlite.rs to update old misclassified distillation records using their stored command context. Updated cli/doctor.rs to add an Intelligence Consistency check section: prompts users to run --fix to upgrade historical stats, or runs repairs automatically when the flag is present. Updated stats.rs to show a prompt to run omni doctor --fix when upgradable history exists, added filter flag support (--today/--week/--month/--all-commands) that defaults to detail mode, improved messaging for hidden zero-savings commands, and cleaned up default stats labeling.
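
To make the repair logic concrete, here is an in-memory sketch of the two store functions. The real versions live in sqlite.rs and operate on the SQLite store; the Record struct, the content type labels, the command mapping, and the signatures below are assumptions for illustration.

```rust
/// Hypothetical in-memory stand-in for a stored distillation record.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub command: Option<String>,
    pub content_type: String,
}

/// Assumed mapping from a stored command to a content type label.
fn content_type_for(command: &str) -> Option<&'static str> {
    match command.split_whitespace().next()? {
        "git" => Some("git"),
        "cargo" => Some("build"),
        "kubectl" => Some("cloud"),
        _ => None,
    }
}

/// True when at least one record would change if reclassified.
pub fn has_upgradable_history(records: &[Record]) -> bool {
    records.iter().any(|r| {
        r.command
            .as_deref()
            .and_then(content_type_for)
            .map_or(false, |ct| ct != r.content_type)
    })
}

/// Relabel records using their stored command; returns how many changed.
pub fn reclassify_historical_data(records: &mut [Record]) -> usize {
    let mut changed = 0;
    for r in records.iter_mut() {
        if let Some(ct) = r.command.as_deref().and_then(content_type_for) {
            if r.content_type != ct {
                r.content_type = ct.to_string();
                changed += 1;
            }
        }
    }
    changed
}
```

In this sketch, records without a stored command are left untouched, and a second run reports nothing left to upgrade, which matches the check-then-fix flow described for omni doctor.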

CLI Learn & MCP Server

Updated cli/learn.rs to pass command context to generate_toml() and apply_to_config(), so auto-generated filters include the triggering command as a match_command rule and better descriptions. Updated mcp/server.rs to use the new classify() API and pass None context, updated filter generation calls to match the new signature.
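
A rough sketch of how an optional command could flow into a generated filter is shown below. Only the match_command key is named in this PR; the [[filter]] table, the name and description keys, and the two-argument signature are assumptions for illustration and may not match the real generate_toml().

```rust
/// Render a learned filter as TOML text. Illustrative only: apart from
/// `match_command`, the schema here is assumed, and values are not escaped.
pub fn generate_toml(name: &str, command: Option<&str>) -> String {
    let mut out = String::from("[[filter]]\n");
    out.push_str(&format!("name = \"{}\"\n", name));
    match command {
        Some(cmd) => {
            // With command context, describe the origin and add a match rule.
            out.push_str(&format!("description = \"Learned from `{}` output\"\n", cmd));
            out.push_str(&format!("match_command = \"{}\"\n", cmd));
        }
        // Without context, emit a generic description and no match rule.
        None => out.push_str("description = \"Learned filter\"\n"),
    }
    out
}
```

Passing None, as the MCP server does for classify(), simply omits the match_command rule in this sketch.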

Minor Updates

Reduced the update check cache TTL in guard/update.rs from 24 hours to 4 hours. Updated all test files (savings_assertions.rs, security_tests.rs) to use the new classify() API and replace legacy compose calls with direct distiller usage. Updated all benchmark calls in benches/pipeline.rs to pass the new command context parameter.
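
For reference, the TTL change amounts to something like the following. The constant and function names are hypothetical and are not taken from guard/update.rs.

```rust
use std::time::Duration;

/// Cache lifetime for update checks: 4 hours (24 hours before this PR).
pub const UPDATE_CHECK_TTL: Duration = Duration::from_secs(4 * 60 * 60);

/// Returns true when a cached result of the given age should be refreshed.
pub fn cache_expired(age: Duration) -> bool {
    age > UPDATE_CHECK_TTL
}
```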

Notes

The command-aware classification improves accuracy for tool-specific output (e.g. git diffs, cargo build logs, kubectl outputs) by using the originally executed command as context instead of relying on the output text alone.
The historical reclassification fixes long-standing issues with mislabeled old stats records.

Breaking Changes

classifier::classify API Break: The function now takes an additional, optional command_name argument; all existing calls must be updated, passing None when no command context is available (e.g. classifier::classify(input, None)).
Removed composer::compose Function: Direct calls to this legacy function will fail; use the distiller directly alongside evaluate_learning() for auto-learn logic.
Updated generate_toml and apply_to_config Signatures: These functions now accept an optional command parameter; callers must add this argument (pass None if unused).

fajarhide added 3 commits April 8, 2026 15:37

feat: implement command-aware content classification heuristics for improved pipeline accuracy

0f1805b

feat: add historical data reclassification for improved content type accuracy and CLI diagnostic support

4ad33c9

feat: reduce update check cache to 4 hours and display notification in stats output

5fde93d

fajarhide merged commit cf6e990 into main Apr 8, 2026
4 checks passed

fajarhide deleted the bug-fix/v0.5.5-rc1 branch May 5, 2026 23:15

fajarhide merged 3 commits into main from bug-fix/v0.5.5-rc1
