Masking Function Causes Major Latency #10550
Replies: 2 comments 2 replies
Good news: the masking function already executes in a background thread and should not add latency to your application's execution(1). According to the documentation, the mask is applied in the background thread and does not add latency to the execution of the application(2). This means your Presidio-based masking function should run asynchronously without blocking your main application flow. However, if you're still experiencing performance issues, here are some things to verify:

1. Ensure you're using the correct masking setup. When initializing the Langfuse client with your masking function, use this pattern(3):

   ```python
   from typing import Any

   from langfuse import Langfuse

   def pii_masker(data: Any, **kwargs) -> Any:
       # Your Presidio-based masking logic here
       if isinstance(data, str):
           # Apply masking and return the redacted string
           # (a Presidio sketch follows after this reply)
           return masked_data
       elif isinstance(data, dict):
           return {k: pii_masker(data=v) for k, v in data.items()}
       elif isinstance(data, list):
           return [pii_masker(data=item) for item in data]
       return data

   langfuse = Langfuse(mask=pii_masker)
   ```

2. Pass the masking function to the LangChain CallbackHandler:

   ```python
   from langfuse.langchain import CallbackHandler

   handler = CallbackHandler(mask=pii_masker)
   ```

3. Optimize your masking function. Since the masking function is called for each piece of data (inputs, outputs, metadata), consider optimizing your Presidio calls; the function should be as efficient as possible even though it runs in the background(3).

If you're still experiencing doubled response times despite the background execution, the issue is likely elsewhere in your setup. Double-check that you're not accidentally blocking on the masking operation and that your Presidio configuration isn't causing unexpected synchronous behavior.

📚 Sources:
Have another question? Just tag @inkeep.
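For the Presidio step referenced in point 1 above, here is a minimal sketch of what the string-masking helper could look like. This is only an illustration under a few assumptions (it is not code from this thread): it assumes the `presidio-analyzer` and `presidio-anonymizer` packages are installed, and it creates the engines once at module level because engine initialization, rather than the per-call analysis, is usually the expensive part.

```python
# Illustrative sketch, not from the thread: assumes presidio-analyzer and
# presidio-anonymizer are installed (pip install presidio-analyzer presidio-anonymizer).
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Create the engines once at import time; re-creating them inside the masking
# function on every call is far more expensive than the analysis itself.
_analyzer = AnalyzerEngine()
_anonymizer = AnonymizerEngine()

def mask_string(text: str) -> str:
    """Detect PII in `text` with Presidio and replace it with placeholders."""
    results = _analyzer.analyze(text=text, language="en")
    return _anonymizer.anonymize(text=text, analyzer_results=results).text
```

`mask_string` would then be called from the `isinstance(data, str)` branch of `pii_masker` shown above.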
Hi @hananmannhelloheart, inkeep is correct: the masking function should not add latency. Could you share a minimal viable code snippet to reproduce this on our end?
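For anyone assembling such a snippet, a bare-bones reproduction could look roughly like the following, timing one invocation with an artificially slow mask. Everything here is illustrative rather than taken from the original report; it assumes Langfuse credentials are set via environment variables and uses a `RunnableLambda` chain so no model API key is needed.

```python
# Hypothetical repro sketch: assumes LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY
# (and LANGFUSE_HOST, if self-hosting) are set in the environment.
import time
from typing import Any

from langchain_core.runnables import RunnableLambda
from langfuse import Langfuse
from langfuse.langchain import CallbackHandler

def slow_masker(data: Any, **kwargs) -> Any:
    # Stand-in for an expensive Presidio call.
    if isinstance(data, str):
        time.sleep(0.5)
    return data

langfuse = Langfuse(mask=slow_masker)  # masking configured on the client
handler = CallbackHandler()

chain = RunnableLambda(lambda text: text.upper())

start = time.perf_counter()
chain.invoke("hello world", config={"callbacks": [handler]})
print(f"invoke latency with mask: {time.perf_counter() - start:.2f}s")
```

If masking truly runs in the background, the printed latency should stay close to an unmasked baseline run despite the artificial `time.sleep`.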
We're tracing our LangChain/LangGraph application using the Langfuse callback handler. We've implemented a Presidio-based masking function and are passing it to the Langfuse client to redact sensitive data before logging.
The Issue: The synchronous execution of the data masking function is causing significant performance degradation.
The Result: Our overall response time is doubling, severely impacting user experience.
We need to execute this data processing off the main thread to maintain high performance. Could you please advise on the official or recommended pattern for integrating a custom, computationally intensive masking function with the Langfuse client so that it executes without blocking?