feat: add negative_context support to reduce false positives in context-aware PII detection #1969
TheSabari07 wants to merge 17 commits into microsoft:main
Conversation
Hi @omri374, I worked on this as part of the discussion in #1686. This PR focuses specifically on adding negative context support to reduce false positives in rule-based detection. Would appreciate your thoughts on the approach. Also, I had a couple of quick questions:
| f"Got: {context_matching_mode}" | ||
| ) | ||
| self.context_matching_mode = context_matching_mode | ||
| self.negative_context_penalty = negative_context_penalty |
Can we add this to the parent ContextAwareEnhancer? It could serve other context enhancers too.
Okay @omri374, that makes sense. I'll move negative_context_penalty to the base ContextAwareEnhancer so it can be reused across other context enhancers as well.
omri374 left a comment:
This is a great start. Please add tests + update a recognizer to include negative context.
Pull request overview
Adds negative_context support to Presidio Analyzer’s context-aware scoring to reduce false positives by penalizing matches when “negative” keywords appear near an entity.
Changes:
- Extend recognizer configuration loading to read and pass negative_context (including per-language config handling).
- Add negative_context plumbing to EntityRecognizer/PatternRecognizer (init + serialization).
- Update LemmaContextAwareEnhancer to apply a configurable score penalty when negative context words appear near detected entities.
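As an illustration of how the feature might surface to users, here is a hypothetical recognizer configuration entry. The layout follows Presidio's existing YAML recognizer format, but the negative_context field and all values below are assumptions based on this PR's description, not a confirmed schema:

```yaml
recognizers:
  - name: "PhoneRecognizer"
    supported_entity: "PHONE_NUMBER"
    supported_language: "en"
    patterns:
      - name: "phone (medium)"
        regex: "\\b\\d{3}-\\d{4}\\b"
        score: 0.4
    context:
      - phone
      - call
    # Hypothetical: words suggesting the match is NOT a phone number
    negative_context:
      - invoice
      - order
```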
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| presidio-analyzer/presidio_analyzer/recognizer_registry/recognizers_loader_utils.py | Loads negative_context from config and filters it when recognizers don’t accept the argument. |
| presidio-analyzer/presidio_analyzer/pattern_recognizer.py | Adds negative_context to PatternRecognizer init and (de)serialization. |
| presidio-analyzer/presidio_analyzer/entity_recognizer.py | Adds negative_context to the base recognizer API and stores it on the instance. |
| presidio-analyzer/presidio_analyzer/context_aware_enhancers/lemma_context_aware_enhancer.py | Implements negative-context score penalty logic in the default context enhancer. |
```python
if negative_context_word != "":
    result.score -= self.negative_context_penalty
    result.score = max(result.score, ContextAwareEnhancer.MIN_SCORE)
    logger.debug("Applied negative context penalty for word '%s'", negative_context_word)
```
After applying the negative_context penalty, the RecognizerResult.analysis_explanation isn't updated (set_improved_score is only called after the positive boost). This makes the explainability fields (score/score_context_improvement) inconsistent with the final result.score; update the AnalysisExplanation to reflect the post-penalty score as well.
Suggested change:

```python
result.analysis_explanation.set_improved_score(result.score)
logger.debug(
    "Applied negative context penalty for word '%s'",
    negative_context_word,
)
```
@TheSabari07 please make sure you update the analysis explanation fields as well.
```python
),
"negative_context": RecognizerListLoader._get_recognizer_negative_context(
    recognizer=recognizer_conf
),
```
The added negative_context extraction call is on lines that exceed the configured Ruff line-length (88) (e.g., the 'negative_context' assignment in the returned dict). Please wrap these function calls to avoid E501 lint failures.
```python
# Apply negative context penalty if recognizer has negative_context defined
if recognizer.negative_context:
    negative_context_word = self._find_supportive_word_in_context(
        surrounding_words, recognizer.negative_context, self.context_matching_mode
    )
    if negative_context_word != "":
        result.score -= self.negative_context_penalty
        result.score = max(result.score, ContextAwareEnhancer.MIN_SCORE)
        logger.debug("Applied negative context penalty for word '%s'", negative_context_word)
```
negative_context introduces new scoring behavior (penalty application and clamping) but there are currently no unit tests covering negative_context in the test suite (no references found under presidio-analyzer/tests). Please add tests to validate: (1) penalty is applied when negative context appears in the window, (2) score is clamped at 0, (3) interaction with positive context (boost then penalty), and (4) backward compatibility when negative_context is unset.
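A self-contained sketch of the four requested test cases, using a stand-in helper for the enhancer's boost/penalty arithmetic (apply_context_adjustments is hypothetical, not Presidio's API; it only mirrors the clamping behavior described in this PR):

```python
MIN_SCORE = 0.0
MAX_SCORE = 1.0

def apply_context_adjustments(score, boost=0.0, penalty=0.0):
    """Mirror the enhancer logic: boost for positive context first,
    then subtract the negative-context penalty and clamp at MIN_SCORE."""
    if boost:
        score = min(score + boost, MAX_SCORE)
    if penalty:
        score = max(score - penalty, MIN_SCORE)
    return score

# (1) penalty is applied when negative context appears in the window
assert apply_context_adjustments(0.5, penalty=0.25) == 0.25
# (2) score is clamped at MIN_SCORE, never negative
assert apply_context_adjustments(0.1, penalty=0.25) == MIN_SCORE
# (3) interaction with positive context: boost applied, then penalty
assert apply_context_adjustments(0.5, boost=0.25, penalty=0.25) == 0.5
# (4) backward compatibility: no negative_context means no penalty
assert apply_context_adjustments(0.5) == 0.5
```

The real tests would exercise LemmaContextAwareEnhancer end-to-end, but the expected score arithmetic is the part sketched here.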
Thank you @omri374. I'll add unit tests to cover negative_context (penalty, edge cases, and backward compatibility) and also update an existing recognizer to explicitly include negative_context for validation.
Hi @omri374, I've made the changes suggested in the review. All tests are passing locally, and I've verified no regressions in existing analyzer tests. Please verify and let me know if any changes are needed.
Hi @omri374,
Apologies, will review shortly.
No issues @omri374, thanks for the update |
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.
Comments suppressed due to low confidence (2)
presidio-analyzer/presidio_analyzer/analyzer_engine.py:165
negative_context is added to AnalyzerEngine.analyze, but the analyzer service (/analyze) builds its arguments from AnalyzerRequest (app.py), which currently doesn't parse/forward a negative_context field. As-is, REST clients can't use this feature; either wire it through the request object/endpoint or clarify that it's Python-only.
```python
def analyze(
    self,
    text: str,
    language: str,
    entities: Optional[List[str]] = None,
    correlation_id: Optional[str] = None,
    score_threshold: Optional[float] = None,
    return_decision_process: Optional[bool] = False,
    ad_hoc_recognizers: Optional[List[EntityRecognizer]] = None,
    context: Optional[List[str]] = None,
    negative_context: Optional[List[str]] = None,
    allow_list: Optional[List[str]] = None,
    allow_list_match: Optional[str] = "exact",
    regex_flags: Optional[int] = re.DOTALL | re.MULTILINE | re.IGNORECASE,
    nlp_artifacts: Optional[NlpArtifacts] = None,
) -> List[RecognizerResult]:
```
presidio-analyzer/presidio_analyzer/analyzer_engine.py:165
- There's no unit test exercising the new AnalyzerEngine.analyze(..., negative_context=...) parameter end-to-end (engine -> context enhancer). Adding a focused test would guard the public API behavior and ensure request-level negative context is applied correctly.
```python
nlp_artifacts = spacy_nlp_engine.process_text(text, "en")

results = recognizer.analyze(text, nlp_artifacts)
```

PatternRecognizer.analyze expects entities as the second argument; here nlp_artifacts is being passed positionally instead. Pass entities explicitly (and nlp_artifacts as a keyword) so the test exercises the real public signature and types.
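The failure mode flagged here can be reproduced with a minimal stand-in class (this is not Presidio's PatternRecognizer, just a stub with the same second-positional-parameter shape):

```python
class StubRecognizer:
    """Stub mimicking a signature like analyze(text, entities, nlp_artifacts=None)."""

    def analyze(self, text, entities, nlp_artifacts=None):
        # Return what each parameter actually received, for inspection.
        return {"entities": entities, "nlp_artifacts": nlp_artifacts}

recognizer = StubRecognizer()
nlp_artifacts = {"tokens": ["my", "phone"]}

# Buggy call: nlp_artifacts silently lands in the `entities` slot.
buggy = recognizer.analyze("my phone is 555-1234", nlp_artifacts)
assert buggy["entities"] is nlp_artifacts and buggy["nlp_artifacts"] is None

# Correct call: entities explicit, nlp_artifacts passed by keyword.
ok = recognizer.analyze(
    "my phone is 555-1234", ["PHONE_NUMBER"], nlp_artifacts=nlp_artifacts
)
assert ok["entities"] == ["PHONE_NUMBER"] and ok["nlp_artifacts"] is nlp_artifacts
```

Because Python accepts the buggy call without error, only a test asserting on the received values (or a type check) catches the mix-up.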
```python
    self,
    context_similarity_factor: float,
    min_score_with_context_similarity: float,
    context_prefix_count: int,
    context_suffix_count: int,
    negative_context_penalty: float = 0.3,
):
    self.context_similarity_factor = context_similarity_factor
    self.min_score_with_context_similarity = min_score_with_context_similarity
    self.context_prefix_count = context_prefix_count
    self.context_suffix_count = context_suffix_count
    self.negative_context_penalty = negative_context_penalty

@abstractmethod
def enhance_using_context(
    self,
    text: str,
    raw_results: List[RecognizerResult],
    nlp_artifacts: NlpArtifacts,
    recognizers: List[EntityRecognizer],
    context: Optional[List[str]] = None,
    negative_context: Optional[List[str]] = None,
) -> List[RecognizerResult]:
```
This change adds a new negative_context keyword argument to ContextAwareEnhancer.enhance_using_context, and AnalyzerEngine now always passes it. Any user-provided custom enhancer implementing the old signature will raise TypeError; consider a backward-compatible call pattern (e.g., only pass negative_context if the enhancer accepts it) or clearly document this as a breaking change.
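One backward-compatible pattern would be to inspect the enhancer's signature and only forward negative_context when it is accepted. This is a sketch, not Presidio code; the two enhancer classes are hypothetical stand-ins for an old-signature and a new-signature implementation:

```python
import inspect

class OldEnhancer:
    # Pre-PR signature: no negative_context parameter.
    def enhance_using_context(self, text, raw_results, context=None):
        return "old"

class NewEnhancer:
    # Post-PR signature: accepts negative_context.
    def enhance_using_context(self, text, raw_results, context=None,
                              negative_context=None):
        return ("new", negative_context)

def call_enhancer(enhancer, text, raw_results, context, negative_context):
    """Forward negative_context only if the enhancer's signature accepts it."""
    params = inspect.signature(enhancer.enhance_using_context).parameters
    kwargs = {"context": context}
    if "negative_context" in params:
        kwargs["negative_context"] = negative_context
    return enhancer.enhance_using_context(text, raw_results, **kwargs)

# Old-style enhancers keep working; new ones receive the extra argument.
assert call_enhancer(OldEnhancer(), "t", [], None, ["no"]) == "old"
assert call_enhancer(NewEnhancer(), "t", [], None, ["no"]) == ("new", ["no"])
```

The inspect check runs once per call here; caching the result per enhancer class would avoid repeated signature parsing in a hot path.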
omri374 left a comment:
Thanks! Looks great, there are a few minor things here and there but it's mostly done.
Please consider adding this to the documentation, for example here: https://microsoft.github.io/presidio/tutorial/06_context/ or here
```python
patterns = patterns if patterns else self.PATTERNS
context = context if context else self.CONTEXT
negative_context = (
    negative_context if negative_context else self.NEGATIVE_CONTEXT
)
```
@TheSabari07 please change this to make sure a user can disable negative context by passing an empty list.
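The issue is that an empty list is falsy, so `negative_context if negative_context else self.NEGATIVE_CONTEXT` silently restores the default. A minimal sketch of the fix, using an explicit `is not None` check (NEGATIVE_CONTEXT here is a hypothetical class-level default):

```python
NEGATIVE_CONTEXT = ["fax", "extension"]  # hypothetical class-level default

def resolve_truthy(negative_context):
    # Current behavior: [] is falsy, so the default silently comes back.
    return negative_context if negative_context else NEGATIVE_CONTEXT

def resolve_none_check(negative_context):
    # Fixed behavior: only None falls back; [] explicitly disables the feature.
    return negative_context if negative_context is not None else NEGATIVE_CONTEXT

assert resolve_truthy([]) == NEGATIVE_CONTEXT        # user cannot disable
assert resolve_none_check([]) == []                  # explicit disable works
assert resolve_none_check(None) == NEGATIVE_CONTEXT  # default still applies
```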
```python
def analyze(
    self,
    text: str,
    language: str,
    entities: Optional[List[str]] = None,
    correlation_id: Optional[str] = None,
    score_threshold: Optional[float] = None,
    return_decision_process: Optional[bool] = False,
    ad_hoc_recognizers: Optional[List[EntityRecognizer]] = None,
    context: Optional[List[str]] = None,
    negative_context: Optional[List[str]] = None,
    allow_list: Optional[List[str]] = None,
    allow_list_match: Optional[str] = "exact",
    regex_flags: Optional[int] = re.DOTALL | re.MULTILINE | re.IGNORECASE,
```
@TheSabari07 please move it to the bottom of the parameter list
Hi @omri374, thanks for the detailed feedback.
Hi @omri374, all the suggested changes are done.
Thanks @TheSabari07, looks good, please fix the remaining Copilot issues. You are calling analyze(text, nlp_artifacts) but that's not working.
Okay @omri374, I will work on it and update you on the progress.
Change Description
Added support for negative_context in context-aware PII detection to reduce false positives.
Issue reference
Fixes #1686
Checklist