Build a “NotInject-style” test suite
Add a built-in test dataset (or command idpishield test-overdefense) that runs your patterns against benign sentences full of trigger words. This would instantly surface false-positive regressions.
Add a lightweight “trigger-word debias” layer
Before running full patterns, apply a simple MOF-inspired filter:
- Count trigger words.
- If they appear in an otherwise normal context (no other suspicious patterns, high entropy, etc.), lower the risk score automatically. (This can be pure Go regex + heuristics — zero ML needed.)
Hybrid scoring with over-defense penalty
Extend RiskResult to include an OverDefenseRisk float.
When combining pattern score + optional ONNX PromptGuard 2 (as we discussed earlier), apply a small bonus/penalty based on PIGuard-style logic.
Expose a “benign-trigger” mode
New config flag: DebiasTriggers: true that users can enable in production to drastically cut false positives.
Build a “NotInject-style” test suite
Add a built-in test dataset (or command idpishield test-overdefense) that runs your patterns against benign sentences full of trigger words. This would instantly surface false-positive regressions.
Add a lightweight “trigger-word debias” layer
Before running full patterns, apply a simple MOF-inspired filter:
Hybrid scoring with over-defense penalty
Extend RiskResult to include an OverDefenseRisk float.
When combining pattern score + optional ONNX PromptGuard 2 (as we discussed earlier), apply a small bonus/penalty based on PIGuard-style logic.
Expose a “benign-trigger” mode
New config flag: DebiasTriggers: true that users can enable in production to drastically cut false positives.