Implement command-aware content classification heuristics #51
Merged
…mproved pipeline accuracy
…accuracy and CLI diagnostic support
PR Auto Describe
Summary
This PR delivers a major overhaul of omni's content processing pipeline: command-aware content classification, historical data repair via `omni doctor`, pipeline refactoring, and CLI tooling improvements that boost detection accuracy and user experience.
Key Changes
- Updated the `classify` function to use command context for more accurate content type detection, with built-in heuristics for git, system, build, web, and cloud commands.
- Added `omni doctor --fix` to reclassify outdated, mislabeled historical distillation records in the local SQLite store.
- Replaced the legacy `compose` function with dedicated distillation and auto-learn evaluation code, and added output truncation safety.
- Improved `omni stats` with better filtering and clearer messaging, updated `omni learn` to include command context in generated filters, and reduced the update check cache TTL to 4 hours.

Detailed Breakdown
Core Classification System
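The pre-stage heuristic described in this section can be sketched as below. This is a minimal, self-contained illustration, not the code in `pipeline/classifier.rs`: the `ContentType` variants, the per-category command lists, and the `classify_by_content` fallback are all assumptions. It shows the two ideas the PR describes: extracting the base command (so `git status` yields `git`) and consulting it before falling back to content-only detection when `command_name` is `None` or unrecognized.

```rust
/// Hypothetical content categories; the real enum in omni may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentType {
    GitOutput,
    SystemOutput,
    BuildOutput,
    WebOutput,
    CloudOutput,
    Unknown,
}

/// Extract the base command: the first whitespace-separated token with any
/// leading path stripped, e.g. "/usr/bin/git status" -> "git".
fn base_command(command: &str) -> Option<&str> {
    let first = command.split_whitespace().next()?;
    first.rsplit('/').next()
}

/// Pre-stage heuristic: map a known base command to a content type.
/// The command lists are illustrative assumptions, not omni's actual tables.
fn classify_by_command(command: &str) -> Option<ContentType> {
    let content_type = match base_command(command)? {
        "git" => ContentType::GitOutput,
        "ls" | "ps" | "df" | "du" => ContentType::SystemOutput,
        "cargo" | "make" | "cmake" => ContentType::BuildOutput,
        "npm" | "npx" | "yarn" | "pnpm" | "node" => ContentType::WebOutput,
        "aws" | "gcloud" | "az" | "kubectl" | "terraform" => ContentType::CloudOutput,
        _ => return None,
    };
    Some(content_type)
}

/// Stand-in for the existing content-only detector (deliberately tiny).
fn classify_by_content(input: &str) -> ContentType {
    if input.starts_with("On branch ") || input.contains("diff --git ") {
        ContentType::GitOutput
    } else {
        ContentType::Unknown
    }
}

/// Two-stage classification: command heuristic first, content detection second.
pub fn classify(input: &str, command_name: Option<&str>) -> ContentType {
    command_name
        .and_then(classify_by_command)
        .unwrap_or_else(|| classify_by_content(input))
}
```

Running the command heuristic first means an unambiguous command such as `cargo build` settles the type without inspecting the output at all; unknown commands still get the old behavior.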
Rewrote `pipeline/classifier.rs` to add an optional `command_name` parameter to `classify()`, with pre-stage heuristics that extract the base command (e.g. `git` from `git status`) and route the output to the correct content type for git subcommands, system utilities, build tools, web/JS tools, and cloud tools. Updated the existing tests to the new API, added test cases for command-specific classification, and updated the benchmarks and tests to pass `None` wherever no command context is available.
Pipeline & Hook Logic
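One way to implement the output truncation safety check mentioned in this section is sketched below. The function name `truncate_output`, its byte-limit parameter, and the marker text are hypothetical; the PR does not say how the hooks truncate. What the sketch illustrates is that slicing a Rust `&str` at an arbitrary byte index panics if the index falls inside a multi-byte character, so the cut point is first moved back to the nearest `is_char_boundary`.

```rust
/// Keep at most `max_bytes` bytes of `output` without splitting a UTF-8
/// character, appending a marker line (hypothetical text) when bytes were dropped.
pub fn truncate_output(output: &str, max_bytes: usize) -> String {
    if output.len() <= max_bytes {
        return output.to_string();
    }
    // Walk back from the limit to a char boundary; index 0 is always one,
    // so this loop terminates.
    let mut cut = max_bytes;
    while !output.is_char_boundary(cut) {
        cut -= 1;
    }
    format!(
        "{}\n[output truncated: {} bytes omitted]",
        &output[..cut],
        output.len() - cut
    )
}
```

For example, `truncate_output("héllo", 2)` keeps only `"h"`, because byte 2 falls in the middle of the two-byte `é`.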
Refactored `hooks/pipe.rs` and `hooks/post_tool.rs` to pass command context to `classify()`, replaced legacy `compose` calls with direct distiller usage, added auto-learn evaluation via `evaluate_learning()`, and added output truncation safety checks. In `composer.rs`, moved the auto-learn trigger logic out of `compose` into a dedicated `evaluate_learning()` function, removed the unused `colored` import, and deleted the legacy `compose` function entirely. Both hook files now emit properly formatted rewind store notifications.
Data Store & Doctor Command
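The repair performed by `reclassify_historical_data()` and the check in `has_upgradable_history()` can be sketched over an in-memory slice of records. The real functions operate on the SQLite store; the `DistillationRecord` struct, its fields, and the `classify_command` stand-in below are assumptions made to keep the sketch dependency-free. A record counts as upgradable when its stored command maps to a different content type than the one it was saved with; records without a stored command are left alone.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentType {
    Git,
    Build,
    Unknown,
}

/// Hypothetical shape of a stored distillation row.
#[derive(Debug, Clone)]
pub struct DistillationRecord {
    pub command: Option<String>,
    pub content_type: ContentType,
}

/// Stand-in for the command-aware classifier; None means "no opinion".
fn classify_command(command: &str) -> Option<ContentType> {
    match command.split_whitespace().next()? {
        "git" => Some(ContentType::Git),
        "cargo" | "make" => Some(ContentType::Build),
        _ => None,
    }
}

/// The corrected type for a record, if its command implies a different one.
fn upgraded_type(record: &DistillationRecord) -> Option<ContentType> {
    let new_type = classify_command(record.command.as_deref()?)?;
    (new_type != record.content_type).then_some(new_type)
}

/// True when at least one record would be relabeled by a repair.
pub fn has_upgradable_history(records: &[DistillationRecord]) -> bool {
    records.iter().any(|r| upgraded_type(r).is_some())
}

/// Relabel every upgradable record in place; returns how many changed.
pub fn reclassify_historical_data(records: &mut [DistillationRecord]) -> usize {
    let mut changed = 0;
    for record in records.iter_mut() {
        if let Some(new_type) = upgraded_type(record) {
            record.content_type = new_type;
            changed += 1;
        }
    }
    changed
}
```

Sharing `upgraded_type` between the check and the repair keeps the two consistent: once a repair has run, `has_upgradable_history` returns `false`, so the `omni stats` prompt would stop appearing.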
Added `reclassify_historical_data()` and `has_upgradable_history()` to `sqlite.rs` to relabel old, misclassified distillation records using their stored command context. Added an Intelligence Consistency check section to `cli/doctor.rs`: without the `--fix` flag it prompts users to run `--fix` to upgrade historical stats, and with the flag it runs the repair automatically. Updated `stats.rs` to prompt users to run `omni doctor --fix` when upgradable history exists, added filter flags (`--today`, `--week`, `--month`, `--all-commands`) that default to detail mode, improved the messaging for hidden zero-savings commands, and cleaned up the default stats labels.
CLI Learn & MCP Server
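The effect of passing command context into filter generation can be illustrated as below. Apart from `match_command`, which the PR names, the TOML layout (`[[filters]]`, `name`, `description`, `pattern`) and the `generate_filter_toml` function are hypothetical stand-ins for the real `generate_toml()`. When a command is supplied, the sketch emits a `match_command` rule and a more specific description; with `None` it falls back to a generic entry.

```rust
/// Quote a simple single-line value as a TOML basic string,
/// escaping backslashes and double quotes.
fn toml_str(value: &str) -> String {
    format!("\"{}\"", value.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Hypothetical filter generator with an optional triggering command.
pub fn generate_filter_toml(name: &str, pattern: &str, command: Option<&str>) -> String {
    let mut out = String::from("[[filters]]\n");
    out.push_str(&format!("name = {}\n", toml_str(name)));
    match command {
        Some(cmd) => {
            // With context: scope the filter to the command and say where it came from.
            let description = format!("Auto-learned from `{cmd}` output");
            out.push_str(&format!("description = {}\n", toml_str(&description)));
            out.push_str(&format!("match_command = {}\n", toml_str(cmd)));
        }
        None => {
            // Without context: generic description, no command scoping.
            out.push_str(&format!("description = {}\n", toml_str("Auto-learned filter")));
        }
    }
    out.push_str(&format!("pattern = {}\n", toml_str(pattern)));
    out
}
```

Scoping a learned filter with `match_command` keeps a pattern learned from one tool's output from firing on unrelated commands.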
Updated `cli/learn.rs` to pass command context to `generate_toml()` and `apply_to_config()`, so auto-generated filters include the triggering command as a `match_command` rule and carry better descriptions. Updated `mcp/server.rs` to use the new `classify()` API with `None` as context, and updated its filter generation calls to match the new signatures.
Minor Updates
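The 4-hour TTL check for the update cache might look like the sketch below. `UPDATE_CHECK_TTL` and `is_cache_fresh` are hypothetical names, not the contents of `guard/update.rs`. Treating a cached timestamp that lies in the future (for example after a clock change) as stale is a design choice of this sketch: it forces a fresh check rather than trusting the cache indefinitely.

```rust
use std::time::{Duration, SystemTime};

/// Cache lifetime after this PR: 4 hours (previously 24).
pub const UPDATE_CHECK_TTL: Duration = Duration::from_secs(4 * 60 * 60);

/// True when an update check cached at `checked_at` may still be reused at `now`.
/// A `checked_at` later than `now` makes `duration_since` fail, which is treated as stale.
pub fn is_cache_fresh(checked_at: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(checked_at) {
        Ok(age) => age < UPDATE_CHECK_TTL,
        Err(_) => false,
    }
}
```

Taking `now` as a parameter instead of calling `SystemTime::now()` inside the function keeps the check deterministic in tests.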
Reduced the update check cache TTL in `guard/update.rs` from 24 hours to 4 hours. Updated the test files (`savings_assertions.rs`, `security_tests.rs`) to use the new `classify()` API and to replace legacy `compose` calls with direct distiller usage. Updated all benchmark calls in `benches/pipeline.rs` to pass the new command context parameter.
Notes
Breaking Changes
- `classifier::classify` API break: the function now takes an optional `command_name` parameter, so every existing call must be updated to pass `None` when no command context is available (e.g. `classifier::classify(input, None)`).
- Removal of `composer::compose`: the legacy function has been deleted, so direct calls no longer compile; use the distiller directly alongside `evaluate_learning()` for auto-learn logic.
- `generate_toml` and `apply_to_config` signatures: both functions now accept an optional `command` parameter; callers must add this argument (pass `None` if unused).
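Migrating a call site to the new `classify` signature is mostly a matter of threading an `Option<&str>` through. In the sketch below, `classify` is a stub with the two-argument shape shown in `classifier::classify(input, None)` (its body and `&'static str` return type are placeholders), and `HookContext` is a hypothetical caller that stores the command as `Option<String>`; `Option::as_deref()` converts that into the `Option<&str>` the function expects.

```rust
/// Stub with the new two-argument shape; the body is a placeholder, not omni's logic.
pub fn classify(input: &str, command_name: Option<&str>) -> &'static str {
    if input.is_empty() {
        return "empty";
    }
    match command_name.and_then(|c| c.split_whitespace().next()) {
        Some("git") => "git",
        _ => "text",
    }
}

/// Hypothetical caller that may or may not know the triggering command.
pub struct HookContext {
    pub command: Option<String>,
}

/// Migrated call site: `as_deref()` turns `Option<String>` into `Option<&str>`.
pub fn classify_hook_output(ctx: &HookContext, output: &str) -> &'static str {
    classify(output, ctx.command.as_deref())
}
```

Callers with no context at all, such as the MCP server path described above, simply pass `None` directly.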