Skip to content

Reduce noisy false positives #31

@luigi-agosti

Description

@luigi-agosti

Build a “NotInject-style” test suite
Add a built-in test dataset (or command idpishield test-overdefense) that runs your patterns against benign sentences full of trigger words. This would instantly surface false-positive regressions.

Add a lightweight “trigger-word debias” layer
Before running full patterns, apply a simple MOF-inspired filter:

  • Count trigger words.
  • If they appear in an otherwise normal context (no other suspicious patterns, high entropy, etc.), lower the risk score automatically. (This can be pure Go regex + heuristics — zero ML needed.)

Hybrid scoring with over-defense penalty
Extend RiskResult to include an OverDefenseRisk float.
When combining pattern score + optional ONNX PromptGuard 2 (as we discussed earlier), apply a small bonus/penalty based on PIGuard-style logic.

Expose a “benign-trigger” mode
New config flag: DebiasTriggers: true that users can enable in production to drastically cut false positives.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions