2 changes: 1 addition & 1 deletion .github/copilot-instructions.md
Original file line number Diff line number Diff line change
@@ -236,7 +236,7 @@ Follow best practices in docs/analyzer/developing_recognizers.md:

2. **Make regex patterns specific** to minimize false positives
3. **Document pattern sources** with comments linking to standards/references
4. **Add to configuration** in `conf/default_recognizers.yaml` (set `enabled: false` for country-specific)
4. **Add to configuration** in `conf/analyzer.yaml` (set `enabled: false` for country-specific)
5. **Update imports** in `predefined_recognizers/__init__.py`
6. **Add comprehensive tests** including edge cases
7. **Update supported entities documentation** if adding new entity types
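
As a rough illustration of item 2, the sketch below contrasts a loose pattern with a more specific one using Python's standard `re` module. The five-digit identifier format and both pattern names are hypothetical and are not taken from any Presidio recognizer.

```python
import re

# Hypothetical example: a made-up identifier of exactly five digits.
LOOSE_PATTERN = re.compile(r"\d{5}")
# Word boundaries stop the pattern from matching inside longer digit runs.
SPECIFIC_PATTERN = re.compile(r"\b\d{5}\b")

text = "Order 1234567890 shipped to ZIP 90210"

loose_matches = LOOSE_PATTERN.findall(text)
specific_matches = SPECIFIC_PATTERN.findall(text)

print(loose_matches)     # ['12345', '67890', '90210']
print(specific_matches)  # ['90210']
```

The loose pattern reports two false positives from the ten-digit order number, while the bounded pattern reports only the standalone five-digit value.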
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -64,7 +64,7 @@ To contribute a new predefined recognizer to Presidio Analyzer:
- Document the source or reference for any new regex logic (e.g., link to a standard, documentation, or example dataset) in the code as a comment.

3. **Add your recognizer to the configuration:**
- Add your recognizer to `presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml`.
- Add your recognizer to `presidio-analyzer/presidio_analyzer/conf/analyzer.yaml`.
- For country-specific recognizers, set `enabled: false` by default in the YAML configuration.

3. **Update imports:** Add your recognizer to `presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py` so it is available for import and backward compatibility.
4 changes: 2 additions & 2 deletions docs/analyzer/adding_recognizers.md
@@ -150,7 +150,7 @@ To add a recognizer to the list of pre-defined recognizers:

1. Clone the repo.
2. Create a file containing the new recognizer Python class.
3. Add the recognizer to the `recognizers` in the [`default_recognizers`](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml) config. Details of recognizer parameters are given [Here](./recognizer_registry_provider.md#the-recognizer-parameters).
3. Add the recognizer to the `recognizers` section in the [`analyzer.yaml`](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml) configuration file. Details of recognizer parameters are given [Here](./recognizer_registry_provider.md#the-recognizer-parameters).
4. Optional: Update documentation (e.g., the [supported entities list](../supported_entities.md)).

### Azure AI Language recognizer
@@ -231,7 +231,7 @@ Additional examples can be found in the [OpenAPI spec](../api-docs/api-docs.html
### Reading pattern recognizers from YAML

Recognizers can be loaded from a YAML file, which allows users to add recognition logic without writing code.
An example YAML file can be found [here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml).
An example YAML file can be found [here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml).

Once the YAML file is created, it can be loaded into the `RecognizerRegistry` instance.

76 changes: 25 additions & 51 deletions docs/analyzer/analyzer_engine_provider.md
@@ -1,20 +1,17 @@
# Configuring the Analyzer Engine from file

Presidio uses `AnalyzerEngineProvider` to load `AnalyzerEngine` configuration from file.
Configuration can be loaded in three different ways:
Presidio uses `AnalyzerEngineProvider` to load `AnalyzerEngine` configuration from file.

## Using a single file
## Using the unified configuration file (recommended)

Create an `AnalyzerEngineProvider` using a single configuration file and set its path to `analyzer_engine_conf_file`, then create `AnalyzerEngine` based on it:
The recommended approach is to use a **single configuration file** (`analyzer.yaml`) that contains all settings: supported languages, NLP engine configuration, and recognizer registry.

```python
from presidio_analyzer import AnalyzerEngine, AnalyzerEngineProvider

analyzer_conf_file = "./analyzer/analyzer-config-all.yml"

provider = AnalyzerEngineProvider(
analyzer_engine_conf_file=analyzer_conf_file
)
analyzer_engine_conf_file="./analyzer-config.yaml"
)
analyzer = provider.create_engine()

results = analyzer.analyze(text="My name is Morris", language="en")
@@ -61,7 +58,6 @@ recognizer_registry:
- name: CreditCardRecognizer
supported_languages:
- en
supported_entity: IT_FISCAL_CODE
type: predefined

- name: ItFiscalCodeRecognizer
@@ -79,70 +75,48 @@ The configuration file contains the following parameters:

`supported_languages` must be identical to the same field in recognizer_registry
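
The constraint above can be checked before a configuration is handed to `AnalyzerEngineProvider`. The following is a minimal sketch, assuming the YAML has already been parsed into a plain dictionary; `check_supported_languages` is a hypothetical helper and not part of Presidio.

```python
def check_supported_languages(config: dict) -> None:
    """Raise ValueError if the two supported_languages lists differ."""
    top_level = config.get("supported_languages", [])
    registry = config.get("recognizer_registry", {}).get("supported_languages", [])
    # Compare as sets so that ordering differences are not reported as errors.
    if set(top_level) != set(registry):
        raise ValueError(
            f"supported_languages mismatch: analyzer={sorted(top_level)}, "
            f"recognizer_registry={sorted(registry)}"
        )


# Hypothetical parsed configuration, for illustration only.
config = {
    "supported_languages": ["en", "es"],
    "recognizer_registry": {"supported_languages": ["en"]},
}

try:
    check_supported_languages(config)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Comparing as sets is a design choice of this sketch; a stricter check could compare the lists directly if ordering turns out to matter.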

## Using multiple files

Create an `AnalyzerEngineProvider` using three different configuration files for each of the following components:
You can also load and validate the configuration using the Pydantic-based `AnalyzerConfiguration` model:

- Analyzer
- NLP Engine
- Recognizer Registry
```python
from presidio_analyzer.configuration import AnalyzerConfiguration

!!! note "Note"
config = AnalyzerConfiguration.from_yaml("./analyzer-config.yaml")
```

Each of these parameters is optional and in case it's not set, the default configuration will be used.
## Using the default configuration

Create an `AnalyzerEngineProvider` without any parameters. This will load the default configuration from the built-in [analyzer.yaml](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml):

```python
from presidio_analyzer import AnalyzerEngine, AnalyzerEngineProvider

analyzer_conf_file = "./analyzer/analyzer-config.yml"
nlp_engine_conf_file = "./analyzer/nlp-config.yml"
recognizer_registry_conf_file = "./analyzer/recognizers-config.yml"

provider = AnalyzerEngineProvider(
analyzer_engine_conf_file=analyzer_conf_file,
nlp_engine_conf_file=nlp_engine_conf_file,
recognizer_registry_conf_file=recognizer_registry_conf_file,
)
analyzer = provider.create_engine()
analyzer = AnalyzerEngineProvider().create_engine()

results = analyzer.analyze(text="My name is Morris", language="en")
print(results)
```

The structure of the configuration files is as follows:

- Analyzer engine configuration file:
## Using multiple files (deprecated)

```yaml
supported_languages:
- en
default_score_threshold: 0
```

- NLP engine configuration file structure is examined thoroughly in the [Customizing the NLP model](customizing_nlp_models.md) section.

- Recognizer registry configuration file structure is examined thoroughly in the [Customizing recognizer registry from file](recognizer_registry_provider.md) section.
!!! warning "Deprecated"

## Using the default configuration
Using separate configuration files for the NLP engine and recognizer registry is deprecated.
Use the unified configuration file instead – place the `nlp_configuration` and
`recognizer_registry` sections inside the analyzer configuration file.

Create an `AnalyzerEngineProvider` without any parameters. This will load the default configuration:
The `nlp_engine_conf_file` and `recognizer_registry_conf_file` parameters are still supported for backward compatibility, but will be removed in a future version.

```python
from presidio_analyzer import AnalyzerEngine, AnalyzerEngineProvider

provider = AnalyzerEngineProvider().create_engine()

results = provider.analyze(text="My name is Morris", language="en")
print(results)
provider = AnalyzerEngineProvider(
analyzer_engine_conf_file="./analyzer-config.yml",
nlp_engine_conf_file="./nlp-config.yml", # deprecated
recognizer_registry_conf_file="./recognizers.yml", # deprecated
)
analyzer = provider.create_engine()
```
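
For anyone migrating away from the separate files, the merge could look roughly like the sketch below. It assumes each file has already been parsed into a dictionary (for example with a YAML loader) and that the contents of the separate NLP and recognizer files can be nested unchanged under `nlp_configuration` and `recognizer_registry`. `merge_into_unified_config` is a hypothetical helper, not part of Presidio, and the sample values are illustrative.

```python
from copy import deepcopy


def merge_into_unified_config(analyzer: dict, nlp: dict, registry: dict) -> dict:
    """Nest separate NLP and registry settings inside the analyzer config."""
    unified = deepcopy(analyzer)
    # Section names follow the deprecation note above; the nesting is an assumption.
    unified["nlp_configuration"] = deepcopy(nlp)
    unified["recognizer_registry"] = deepcopy(registry)
    return unified


# Illustrative stand-ins for the three parsed files.
analyzer_conf = {"supported_languages": ["en"], "default_score_threshold": 0}
nlp_conf = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "en", "model_name": "en_core_web_lg"}],
}
registry_conf = {
    "supported_languages": ["en"],
    "recognizers": [{"name": "CreditCardRecognizer", "type": "predefined"}],
}

unified_conf = merge_into_unified_config(analyzer_conf, nlp_conf, registry_conf)
print(sorted(unified_conf))
```

The helper copies its inputs, so the original dictionaries stay untouched and can still be used with the deprecated parameters during a transition period.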

The default configuration of `AnalyzerEngine` is defined in the following files:

- [Analyzer Engine](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_analyzer.yaml)
- [NLP Engine](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml)
- [Recognizer Registry](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml)

## Enabling and disabling recognizers
In general, recognizers that are not added to the configuration would not be created, with one exception.

4 changes: 2 additions & 2 deletions docs/analyzer/customizing_nlp_models.md
@@ -50,7 +50,7 @@ Configuration can be done in two ways:
print(results_english)
```

- **Via configuration**: Set up the models which should be used in the [default `conf` file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml).
- **Via configuration**: Set up the models which should be used in the `nlp_configuration` section of the [unified analyzer configuration file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml).

An example Conf file:

@@ -86,7 +86,7 @@ Configuration can be done in two ways:
- `low_confidence_score_multiplier`: A multiplier to apply to the score of entities with low confidence.
- `low_score_entity_names`: A list of entity types to apply the low confidence score multiplier to.

The [default conf file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml) is read during the default initialization of the `AnalyzerEngine`. Alternatively, the path to a custom configuration file can be passed to the `NlpEngineProvider`:
To load NLP settings from a configuration file, use `NlpEngineProvider` with a config file such as the [analyzer configuration file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml). Alternatively, NLP configuration can be passed directly:

```python
from presidio_analyzer import AnalyzerEngine, RecognizerRegistry
2 changes: 1 addition & 1 deletion docs/analyzer/languages.md
@@ -70,7 +70,7 @@ Link to LANGUAGES_CONFIG_FILE=[languages-config.yml](https://github.com/microsof

When packaging the code into a Docker container, NLP models are automatically installed.
To define which models should be installed,
update the [conf/default.yaml](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml) file. This file is read during
update the `nlp_configuration` section in the [analyzer configuration file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml). This file is read during
the `docker build` phase and the models defined in it are installed automatically.

For `transformers` based models, the configuration [can be found here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/transformers.yaml).
18 changes: 7 additions & 11 deletions docs/samples/python/langextract/index.md
@@ -113,7 +113,7 @@ You have two options to set up Ollama:

**Option 1: Enable in configuration file**

Enable the recognizer in [`default_recognizers.yaml`](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml):
Enable the recognizer in [`analyzer.yaml`](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml):
```yaml
- name: BasicLangExtractRecognizer
enabled: true # Change from false to true
@@ -122,17 +122,13 @@ Enable the recognizer in [`default_recognizers.yaml`](https://github.com/microso
Then load the analyzer using this modified configuration file:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.recognizer_registry import RecognizerRegistryProvider
from presidio_analyzer import AnalyzerEngineProvider

# Point to your modified default_recognizers.yaml with Ollama enabled
provider = RecognizerRegistryProvider(
conf_file="/path/to/your/modified/default_recognizers.yaml"
# Point to your modified analyzer.yaml with the recognizer enabled
provider = AnalyzerEngineProvider(
analyzer_engine_conf_file="/path/to/your/modified/analyzer.yaml"
)
registry = provider.create_recognizer_registry()

# Create analyzer with the registry that includes Ollama recognizer
analyzer = AnalyzerEngine(registry=registry, supported_languages=["en"])
analyzer = provider.create_engine()

# Analyze text - Ollama recognizer will participate in detection
results = analyzer.analyze(text="My email is john.doe@example.com", language="en")
@@ -151,7 +147,7 @@ results = analyzer.analyze(text="My email is john.doe@example.com", language="en
```

!!! note "Note"
The recognizer is disabled by default in `default_recognizers.yaml` to avoid requiring Ollama for basic Presidio usage. Enable it when you have Ollama set up and running.
The recognizer is disabled by default in the configuration to avoid requiring Ollama for basic Presidio usage. Enable it when you have Ollama set up and running.

### Custom Configuration

19 changes: 9 additions & 10 deletions docs/samples/python/no_code_config.ipynb
@@ -20,14 +20,13 @@
"3. For team members interested in changing the configuration without writing code.\n",
"\n",
"In this example, we'll show how to create a no-code configuration in Presidio.\n",
"We start by creating YAML configuration files that are based on the default ones. \n",
"Te default configuration files for Presidio can be found here:\n",
"- [Analyzer configuration](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_analyzer.yaml)\n",
"- [Recognizer registry configuration](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml)\n",
"- [NLP engine configuration](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml)\n",
"We start by creating a YAML configuration file based on the default one.\n",
"The default unified configuration file for Presidio Analyzer can be found here:\n",
"- [Unified analyzer configuration](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml)\n",
"\n",
"Alternatively, one can create one configuration file for all three components.\n",
"In this example, we'll tweak the configuration to reduce the number of predefinedrecognizers to only a few, and add a new custom one. We'll also adjust the context words to support the detection of a different language (Spanish).\n"
"> **Note:** The previous approach of using three separate configuration files (`default_analyzer.yaml`, `default_recognizers.yaml`, `default.yaml`) is deprecated. Use the unified `analyzer.yaml` instead.\n",
"\n",
"In this example, we'll tweak the configuration to reduce the number of predefined recognizers to only a few, and add a new custom one. We'll also adjust the context words to support the detection of a different language (Spanish)."
]
},
{
@@ -61,7 +60,7 @@
"metadata": {},
"source": [
"### General Analyzer parameters\n",
"([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_analyzer.yaml))"
"([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml))"
]
},
{
@@ -85,7 +84,7 @@
"metadata": {},
"source": [
"### Recognizer Registry parameters\n",
"([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml))"
"([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml))"
]
},
{
@@ -173,7 +172,7 @@
"metadata": {},
"source": [
"### NLP Engine parameters\n",
"([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml))"
"([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml))"
]
},
{
20 changes: 11 additions & 9 deletions docs/tutorial/08_no_code.md
@@ -7,14 +7,16 @@ No-code configuration can be helpful in three scenarios:
3. For team members interested in changing the configuration without writing code.

In this example, we'll show how to create a no-code configuration in Presidio.
We start by creating YAML configuration files that are based on the default ones.
The default configuration files for Presidio can be found here:
We start by creating a YAML configuration file based on the default one.
The default unified configuration file for Presidio Analyzer can be found here:

- [Analyzer configuration](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_analyzer.yaml)
- [Recognizer registry configuration](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml)
- [NLP engine configuration](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml)
- [analyzer configuration](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml)

Alternatively, one can create one configuration file for all three components.
!!! warning "Deprecated separate files"

The previous approach of using three separate configuration files
(`default_analyzer.yaml`, `default_recognizers.yaml`, `default.yaml`)
is deprecated. Use the unified `analyzer.yaml` file instead.
In this example, we'll tweak the configuration to reduce the number of predefinedrecognizers to only a few, and add a new custom one. We'll also adjust the context words to support the detection of a different language (Spanish).

```python
@@ -31,7 +33,7 @@ In this example we're going to create the yaml as a string for illustration purp

### General Analyzer parameters

([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_analyzer.yaml))
([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml))

```python
analyzer_config_yaml = """
@@ -44,7 +46,7 @@ default_score_threshold: 0.4

### Recognizer Registry parameters

([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml))
([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml))

```python

@@ -121,7 +123,7 @@ recognizer_registry:

### NLP Engine parameters

([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml))
([default file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/analyzer.yaml))

```python
nlp_engine_yaml = """
15 changes: 8 additions & 7 deletions presidio-analyzer/Dockerfile
@@ -1,8 +1,12 @@
FROM python:3.12-slim@sha256:804ddf3251a60bbf9c92e73b7566c40428d54d0e79d3428194edf40da6521286

ARG NLP_CONF_FILE=presidio_analyzer/conf/default.yaml
ARG ANALYZER_CONF_FILE=presidio_analyzer/conf/default_analyzer.yaml
ARG RECOGNIZER_REGISTRY_CONF_FILE=presidio_analyzer/conf/default_recognizers.yaml
# Analyzer configuration file (recommended).
# The separate NLP_CONF_FILE and RECOGNIZER_REGISTRY_CONF_FILE are
# deprecated; use the nlp_configuration and recognizer_registry
# sections in ANALYZER_CONF_FILE instead.
ARG ANALYZER_CONF_FILE=presidio_analyzer/conf/analyzer.yaml
ARG NLP_CONF_FILE=
ARG RECOGNIZER_REGISTRY_CONF_FILE=
Comment on lines +7 to +9
Copilot AI Apr 13, 2026

NLP_CONF_FILE and RECOGNIZER_REGISTRY_CONF_FILE are now defaulted to an empty string, but the Dockerfile still unconditionally uses these build args in COPY instructions later. With empty defaults, the Docker build will fail when it reaches those COPY ${NLP_CONF_FILE} ... / COPY ${RECOGNIZER_REGISTRY_CONF_FILE} ... steps. Either remove those COPY steps entirely, or keep non-empty defaults (and rely on the unified analyzer config taking priority at runtime).

ENV PIP_NO_CACHE_DIR=1
ENV POETRY_VIRTUALENVS_CREATE=false

@@ -14,8 +18,6 @@ ENV PORT=3000
ENV WORKERS=1

COPY ${ANALYZER_CONF_FILE} /app/${ANALYZER_CONF_FILE}
COPY ${RECOGNIZER_REGISTRY_CONF_FILE} /app/${RECOGNIZER_REGISTRY_CONF_FILE}
COPY ${NLP_CONF_FILE} /app/${NLP_CONF_FILE}

WORKDIR /app

@@ -30,11 +32,10 @@ RUN pip install poetry==2.3.2 \
&& poetry install --no-root --only=main -E server \
&& rm -rf $(poetry config cache-dir)

# install nlp models specified in NLP_CONF_FILE or via nlp_configuration in ANALYZER_CONF_FILE
# install nlp models from the unified analyzer configuration
COPY ./install_nlp_models.py /app/

RUN poetry run python install_nlp_models.py \
--conf_file ${NLP_CONF_FILE} \
--analyzer_conf_file ${ANALYZER_CONF_FILE}

COPY . /app/