Releases: microsoft/presidio
Release 2.2.361
What's Changed
- Fix CI for PR flow by @tamirkamara in #1718
- Bump actions/setup-python from 5 to 6 by @dependabot[bot] in #1714
- Bump pypa/gh-action-pypi-publish from 1.12.4 to 1.13.0 by @dependabot[bot] in #1716
- Fix error in labeling workflow by @tamirkamara in #1719
- Bump actions/setup-dotnet from 4 to 5 by @dependabot[bot] in #1715
- Test for yaml config for custom recognizers by @omri374 in #1720
- Added Thai National ID Number (TNIN) recognizer by @pangchewe in #1713
- Bump actions/github-script from 6 to 8 by @dependabot[bot] in #1725
- Fix(DICOM): handle both names and patient by @atinm in #1723
- Fix minor typo by @andyjessen in #1727
- Optimize Docker Images by @tamirkamara in #1722
- Update 02_regex.md, fix warning of deprecated escape sequence by @jks-liu in #1730
- Update noqa syntax by @tamirkamara in #1734
- Standardize docker images for Windows by @tamirkamara in #1729
- Update noqa (part 2) by @tamirkamara in #1736
- fix(analyzer): update Korean language code from 'kr' to 'ko' by @heesng-jung in #1742
- Bump github/codeql-action from 3 to 4 by @dependabot[bot] in #1745
- feat: add GSTin Recognizer for India Specific Region by @AyushAggarwal1 in #1744
- Add simplified recipes gallery scaffolding for domain-specific Presidio customization examples by @Copilot in #1743
- Update Python version support: Remove 3.9 (EOL), Add 3.13 by @Copilot in #1741
- Update CI to temporarily drop Python 3.13 version by @SharonHart in #1752
- Dependency Review Action by @tamirkamara in #1748
- Publish docs via Github Actions by @tamirkamara in #1749
- Add Docker HEALTHCHECK to all Dockerfiles by @dorlugasigal in #1755
- Update build status badge in README.md by @SharonHart in #1757
- fix: Replace unsafe DefaultAzureCredential usage in AHDS components by @dorlugasigal in #1751
- fix release-docs workflow credentials by @tamirkamara in #1758
- fix: Add non-root user to all Dockerfiles for security by @dorlugasigal in #1759
- Docker Image Attestation by @tamirkamara in #1762
- Update checkov skip-path by @tamirkamara in #1764
- AHDS authentication security with ChainedTokenCredential and add CodeQL suppression by @dorlugasigal in #1763
- Fix Transformers Recognizer return value on REST API by @SharonHart in #1767
- Add support for py 3.13 by @omri374 in #1774
- Add redact_and_return_bbox method to ImageRedactorEngine by @siwoo-jung in #1777
- fix unit tests by @RonShakutai in #1778
- Move poetry cache directory by @tamirkamara in #1784
- Remove poetry cache from docker images by @tamirkamara in #1785
- Rename dockerignore files by @tamirkamara in #1787
- Remove build-essential from the Analyzer docker image by @tamirkamara in #1789
- Bump actions/checkout from 5 to 6 by @dependabot[bot] in #1793
- Fix dev container permission issues by @dorlugasigal in #1788
- CI coverage test by @RonShakutai in #1794
- Fix coverage job to use component path for SUBPROJECT_ID by @RonShakutai in #1798
- Language models integration (LangExtract) by @RonShakutai in #1775
- Coverage data has been included in the documentation. by @RonShakutai in #1799
- Fix Redoc API Docs script Inclusion by @SharonHart in #1796
- Bug fix: Remove **kwargs from recognizer init methods by @RonShakutai in #1800
- Add Azure OpenAI support for LangExtract recognizer by @dorlugasigal in #1801
- Add a validation layer for YAML based configuration by @omri374 in #1780
- fix: Improve Korean RRN regex pattern validation by @kyoungbinkim in #1807
- Change parameters in extraction in langextract by @RonShakutai in #1811
- fix(analyzer): Enable GPU inference for GLiNERRecognizer by @eveningcafe in #1813
- Bump actions/cache from 4 to 5 by @dependabot[bot] in #1817
- Simplify IBAN regex pattern and fix trailing character handling by @Copilot in #1818
- Add Japanese and Chinese mobile number test cases for PhoneRecognizer by @WenwenHLF in #1808
- GPU optimizations by @RonShakutai in #1812
- feat: Add Korean passport number recognizer (KR_PASSPORT) by @kyoungbinkim in #1814
- [Feature] add kr driver license number recognizer by @RektPunk in #1820
- Samples: add telemetry redaction sample by @Jakob-98 in #1824
- Feat: add class_name to allow multiple recognizers from same class by @RonShakutai in #1819
- Docs/gpu acceleration guide by @dilshad-aee in #1826
- [Feature] add korean business registration number recognizer by @RektPunk in #1822
- Refactor: lazy initialization for device_detector singleton by @RonShakutai in #1831
- [Feature] add Korean Foreigner Registration Number recognizer by @RektPunk in #1825
- feat: Add MacAddressRecognizer by @kyoungbinkim in #1829
- Fix gliner truncates text by @jedheaj314 in #1805
- Migrate short-running workflows to ubuntu-slim runners by @Copilot in #1840
- feat(recognizers): add UsMbiRecognizer for US Medicare Beneficiary ID by @chrisvoncsefalvay in #1821
- Fix language in pattern recognizer example by @andyjessen in #1835
- Update cryptography dependency to >=46.0.4 for CVE-2025-15467 by @Copilot in #1841
- Add a configurable LangExtract recognizer for use with any provider. by @telackey in #1815
- Support batch processing over the REST API. by @telackey in #1806
- Fix Analyzer build on 3.10 by @SharonHart in #1848
- Add salted hashing to hash operator to prevent brute-force attacks by @Copilot in #1846
- Prepare release 2.2.361: bump versions and finalize changelog by @Copilot in #1851
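The salted hashing entry above (#1846) can be pictured with a small standard-library sketch. This is not Presidio's implementation: the `salted_hash` helper, the 16-byte salt length, and the choice of SHA-256 are all assumptions made for illustration. It only shows why a salt defeats precomputed lookups of common PII values.

```python
import hashlib
import secrets


def salted_hash(text: str, salt: bytes) -> str:
    """Return the SHA-256 hex digest of salt + text (hypothetical helper)."""
    return hashlib.sha256(salt + text.encode("utf-8")).hexdigest()


# A random salt (assumed 16 bytes here) makes precomputed tables of
# hashed phone numbers, SSNs, etc. useless to an attacker.
salt = secrets.token_bytes(16)

unsalted = hashlib.sha256(b"555-1234").hexdigest()
salted = salted_hash("555-1234", salt)

# Same salt and input: the digest is stable, so equal values still match.
same_again = salted_hash("555-1234", salt)

# Different salt: the same input produces an unrelated digest.
other = salted_hash("555-1234", secrets.token_bytes(16))
```

Reusing one salt across a dataset keeps hashed values joinable, while a fresh salt per run prevents linking values across runs. Which behavior the hash operator exposes should be checked against the operator's documentation.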
New Contributors
- @pangchewe made their first contribution in #1713
- @atinm made their first contribution in #1723
- @andyjessen made their first contribution in #1727
- @jks-liu made their first contribution in #1730
- @heesng-jung made their first contribution in #1742
- @AyushAggarwal1 made their first contribution in #1744
- @dorlugasigal made their first contribution in #1755
- @kyoungbinkim made their first contribution in #1807
- @eveningcafe made their first contribution in #1813
- @WenwenHLF made their first contribution in #1808
- @RektPunk made their first contribution in #1820
- @dilshad-aee made their first contribution in #1826
- @jedheaj314 made their first contribution in #1805
- @chrisvoncsefalvay made their first contribution in #1821
- @telackey made ...
Release 2.2.360
Analyzer
Added
- Korean Resident Registration Number (RRN) recognizer with checksum validation for numbers issued prior to October 2020 (#1675) (Thanks @siwoo-jung)
- Azure Health Data Services (AHDS) de-identification service integration as a remote recognizer with Entra ID authentication (#1624) (Thanks @rishasurana)
- Comprehensive input validation methods for NlpEngineProvider to ensure valid arguments for engines, configuration, and file paths (#1653) (Thanks @siwoo-jung)
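As a rough illustration of the checksum validation mentioned for the Korean RRN recognizer, the sketch below applies the commonly documented weighted modulus-11 check. Whether the recognizer in #1675 implements it exactly this way is an assumption, `rrn_checksum_ok` is a hypothetical name, and the sample numbers are synthetic.

```python
import re

# Weights applied to the first 12 digits in the widely documented scheme.
WEIGHTS = (2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5)


def rrn_checksum_ok(rrn: str) -> bool:
    """Check the 13th digit of an RRN against the weighted mod-11 sum."""
    digits = re.sub(r"[^0-9]", "", rrn)  # drop the hyphen and other separators
    if len(digits) != 13:
        return False
    total = sum(int(d) * w for d, w in zip(digits[:12], WEIGHTS))
    expected = (11 - total % 11) % 10
    return expected == int(digits[12])


valid = rrn_checksum_ok("900101-1234568")      # synthetic number, check digit matches
invalid = rrn_checksum_ok("900101-1234567")    # last digit altered
too_short = rrn_checksum_ok("900101-123456")   # only 12 digits
```

As the entry notes, the check only applies to numbers issued before October 2020, so a failed checksum on a newer number should not be read as proof that the number is invalid.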
Changed
- Updated Indian Aadhaar recognizer to support contextual delimiters (-, :, space) for improved detection accuracy (#1677) (Thanks @K3y5tr0ke)
- Fixed Italian Driver License recognizer regex to include missing characters per government requirements, excluding only A, O, Q, I (#1651) (Thanks @K3y5tr0ke)
- Refactored recognizers folder structure for better organization and maintainability (#1670) (Thanks @omri374)
Anonymizer
Added
- Azure Health Data Services (AHDS) Surrogate anonymization operator with medical domain expertise for realistic PHI surrogate generation (#1672) (Thanks @rishasurana)
Changed
General
Added
- Comprehensive GitHub Copilot instructions with development guidelines, build processes, and e2e testing procedures (#1693) (Thanks @Copilot)
- New GitHub Actions CI & release workflows with multi-platform Docker image support for AMD64 and ARM64 architectures (#1697) (Thanks @tamirkamara)
- Dual-path CI workflow to fix GitHub Actions failures for external contributors by auto-detecting fork vs. main repository PRs (#1708) (Thanks @Copilot)
- OIDC trusted publishing for PyPI releases eliminating manual API token management and enhancing security (#1702) (Thanks @Copilot)
- Comprehensive YAML and Python examples for context-aware recognizers documentation (#1710) (Thanks @MRADULTRIPATHI)
Changed
- Updated actions/checkout from v4 to v5 to support Node.js 24 runtime (#1699) (Thanks @dependabot)
- Fixed PR template to use proper GitHub issue linking syntax for automatic issue association and closing (#1701) (Thanks @Copilot)
- Updated LiteLLM documentation with detailed guide links for better integration guidance (#1698) (Thanks @BhargavDT)
- Fixed broken links in CONTRIBUTING.md and developing recognizers documentation after recognizers refactoring (#1674) (Thanks @siwoo-jung)
- Fixed OpenSSF badge embedding in README.MD for proper display (#1673) (Thanks @SharonHart)
- Removed Terrascan from Microsoft Defender for DevOps workflow to eliminate false positives on non-IAC repository (#1691) (Thanks @Copilot)
Security
- Updated Streamlit and PyTorch dependency versions to fix CVE vulnerabilities (#1685) (Thanks @SharonHart)
- Updated requests library to mitigate security vulnerability GHSA-9hjg-9r4m-mvj7 (#1683) (Thanks @SharonHart)
- Locked pandas dependency in Streamlit to prevent version conflicts (#1689) (Thanks @SharonHart)
2.2.359
Release 2.2.359
This is a periodic release with feature enhancements, bug fixes, documentation updates and one configuration change.
Changes in Presidio's behavior
Country-specific recognizers are now disabled by default to avoid false positives when they are not needed.
Most country-specific recognizers that expect English text were made optional to avoid false positives, so they no longer work out of the box (#1586). Specifically:
- SgFinRecognizer
- AuAbnRecognizer
- AuAcnRecognizer
- AuTfnRecognizer
- AuMedicareRecognizer
- InPanRecognizer
- InAadhaarRecognizer
- InVehicleRegistrationRecognizer
- InPassportRecognizer
- EsNifRecognizer
- InVoterRecognizer
To re-enable them, either set enabled: true for them in the default YAML, or add them to the recognizer registry manually in code.
- YAML based: see YAML based configuration.
- Code based:
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.predefined_recognizers import AuAbnRecognizer
# Initialize an analyzer engine with the recognizer registry
analyzer = AnalyzerEngine()
# Create an instance of the AuAbnRecognizer
au_abn_recognizer = AuAbnRecognizer()
# Add the recognizer to the registry
analyzer.registry.add_recognizer(au_abn_recognizer)
Changes:
Analyzer
- Allow loading of StanzaRecognizer when StanzaNlpEngine is configured, improving NLP engine flexibility (#1643) (Thanks @omri374)
- Excluded recognition_metadata attribute from REST Analyze Response DTO to clean up API responses (#1627) (Thanks @SharonHart)
- Added ISO 8601 support to DateRecognizer for improved date parsing (#1621) (Thanks @StefH)
- Prevented misidentification of 13-digit timestamps as credit cards (#1609) (Thanks @eagle-p)
- Updated analyzer_engine_provider.md for clarity and completeness (#1590) (Thanks @AvinandanBandyopadhyay)
- Bumped python from 3.9 to 3.12 in presidio-analyzer Dockerfile (#1583) (Thanks @dependabot)
- Bumped phonenumbers version for improved validation and parsing (#1579) (Thanks @omri374)
- Refactored InstanceCounterAnonymizer to simplify index retrieval logic (#1577) (Thanks @ShakutaiGit)
- Fixed issue #1574 to support as_tuples in relevant functions (#1575) (Thanks @omri374)
- Updated initial scores in IN_PAN for better recognition performance (#1565) (Thanks @omri374)
- Added accelerate as a missing build dependency to fix build failures (#1564) (Thanks @SharonHart)
- Don't set a default for LABELS_TO_IGNORE if not specified, to avoid unintended behavior (#1563) (Thanks @SharonHart)
- Updated 08_no_code.md for documentation improvements (#1561) (Thanks @alan-insam)
- Added the ability to disable the NLP recognizer via configuration (#1558) (Thanks @omri374)
- Removed 'class' from API documentation for clarity (#1554) (Thanks @omri374)
- Set country-specific default recognizers to enabled=false for safer defaults (#1586) (Thanks @omri374)
Anonymizer
- Update python base image to 3.13 (#1612) (Thanks @dependabot[bot])
- Bumped python from 3.12-windowsservercore to 3.13-windowsservercore in presidio-anonymizer Dockerfile (#1612) (Thanks @dependabot)
- Ensured anonymizer sorts analyzer results input by start and end for correct whitespace merging (#1588) (Thanks @mkh1991)
- Bumped python from 3.9 to 3.12 in presidio-anonymizer Dockerfile (#1582) (Thanks @dependabot)
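The sorting fix listed above (#1588) is easier to see with a toy example. The sketch below is not the AnonymizerEngine code: `Span` and `merge_across_whitespace` are hypothetical names, and the merge rule (same entity type, only whitespace in between) is an assumption based on the entry's wording. It shows why results need to be ordered by start and end before adjacent entities are merged.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for an analyzer result; not a Presidio class."""
    entity_type: str
    start: int
    end: int


def merge_across_whitespace(text: str, spans: list[Span]) -> list[Span]:
    """Merge same-type spans that are separated only by whitespace."""
    merged: list[Span] = []
    # Sorting first guarantees that neighbours in the list are neighbours in the text.
    for span in sorted(spans, key=lambda s: (s.start, s.end)):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.entity_type == span.entity_type
            and prev.end <= span.start
            and text[prev.end:span.start].strip() == ""
        ):
            prev.end = span.end  # extend the previous span over the gap
        else:
            # Copy the span so the caller's objects are never mutated.
            merged.append(Span(span.entity_type, span.start, span.end))
    return merged


text = "Call John Smith today"
# Results arrive out of order: "Smith" (10-15) before "John" (5-9).
unsorted_spans = [Span("PERSON", 10, 15), Span("PERSON", 5, 9)]
result = merge_across_whitespace(text, unsorted_spans)
```

Without the `sorted` call, the loop would see "Smith" first and never treat "John" as its left neighbour, leaving two separate PERSON spans.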
Image Redactor
- Bumped python from 3.12-slim to 3.13-slim in presidio-image-redactor Dockerfile (#1611) (Thanks @dependabot)
- Bumped python from 3.10 to 3.12 in presidio-image-redactor Dockerfile (#1581) (Thanks @dependabot)
General
- Fixed typographical errors in documentation files for better clarity (#1637) (Thanks @kilavvy)
- Corrected spelling mistakes across code comments and documentation for improved readability (#1636) (Thanks @leopardracer)
- Fixed typos in documentation and test descriptions, enhancing clarity and consistency in the codebase (#1631) (Thanks @zeevick10)
- Corrected typos in docstrings and comments to maintain documentation quality (#1630) (Thanks @kilavvy)
- Fixed typos in documentation and test descriptions, ensuring accurate references and descriptions (#1628) (Thanks @leopardracer)
- Removed unnecessary run.bat script from the repository (#1626) (Thanks @SharonHart)
- Added "/TestResults" to .gitignore file to prevent test result artifacts from being committed (#1622) (Thanks @StefH)
- Added links to the discussion board about Docker prebuilt images to documentation (#1614) (Thanks @omri374)
- Fixed spelling, grammar, and style issues in Presidio V2 documentation (#1610) (Thanks @Vruddhi18)
- Updated .gitignore to include the .vs folder (#1608) (Thanks @StefH)
- Fixed typo in api-docs.yml to improve documentation accuracy (#1602) (Thanks @StefH)
- Reverted a previous update to codeql-analysis.yml to restore earlier configuration (#1595) (Thanks @SharonHart)
- Updated codeql-analysis.yml for improved code scanning configuration (#1594) (Thanks @SharonHart)
- Fixed paths-ignore in codeql-analysis.yml to refine scanning scope (#1593) (Thanks @SharonHart)
- Ignored docs/ directory in CodeQL analysis to prevent unnecessary scanning (#1592) (Thanks @SharonHart)
- Fixed minor typos in code and documentation (#1585) (Thanks @omahs)
- Restored dependabot scanning for security and dependency updates (#1580) (Thanks @SharonHart)
- Added SUPPORT.md file to provide support information to users (#1568) (Thanks @omri374)
Version 2.2.358 (#1553)
Changes:
- 04920aa Version 2.2.358 (#1553)
- dcf1ae0 drop blake2b (#1552)
- a0484dd Replace MD5 with Blake2 (#1550)
- e9df3be Exclude closing single or double quote from URL in URL recogniser (#1532)
- cf75c5a Updated the Evaluating DICOM Redaction documentation to reflect changes in verify_dicom_instance() within the DicomImagePiiVerifyEngine class. (#1549)
- 6b10fd5 Updated the return type annotation of a function (#1547)
- bacf23f Add multiprocessing parameters (#1521)
- 0856479 Migrate to poetry 2.0 (and PEP 621) (#1539)
- 8b288fa docs: Add environment setup and notebook execution guides for Presidio + Spark in Fabric (#1529)
- 5881d75 Update defender-for-devops.yml (#1544)
- eb72ae0 Update CodeQL and defender-for-devops workflows (#1540)
- 5152805 Replace pycryptodome with cryptography (#1537)
- 5fea5af Upgrade codeql (#1536)
- 26f2e92 Fix python 3.9 Builds (#1534)
- 65eabd4 add spacy_stanza into stanza_nlp_engine as it is no longer maintained (#1522)
- 35ab8ae Move sanitize_value to be common, Fix InPassportRecognizer (#1519)
- 6f840ea remove azure-core (#1517)
This list of changes was auto generated.
2.2.357
Version 2.2.356 (#1477)
Changes:
- 9fee330 Version 2.2.356 (#1477)
- a081c22 Update presidio containers to use gunicorn (#1497)
- ebf0ca5 Restricting spacy.cli for version 3.7.0 (#1495)
- 3d9cee9 Fix regex match_time output (#1488)
- a21a17c Add a link to model classes to simplify configuration (#1472)
- d238da9 Update community.md (#1469)
- a0a5f89 Use ACR service connection when pushing containers (#1484)
- fde30dd Add support for allow_list, allow_list_match, regex_flags in REST API (#1478)
- ce63783 Unlock numpy after dropping 3.8 (#1480)
- 33808c2 Removed python 3.8 support (EOL) and added 3.12 (#1479)
- cc31bb6 Add a link to HashiCorp vault operator resource (#1468)
- 71fa64d docs: clarify the docs on deploying presidio to k8s (#1453)
- 21361f9 Updates to the transformers conf docs and yaml file (#1467)
- 13ae328 Fix presidio-structured build - lock numpy version (#1465)
- 49f2b6a Fix space (#1459)
- b9f6cba Bug/azure ai language context (#1458)
- 89ccadb Update US_SSN CONTEXT and unit test (#1455)
- c54ce2b Add UK National Insurance Number Recognizer (#1446)
- 9321e14 Remove ignored labels from supported entities (#1454)
- 0721e36 Dev containers for: analyzer, analyzer+transformers, anonymizer and image redaction (#1450)
- 4aeb56b added batching support (#1449)
- 1bf22ed Update installation.md (#1439)
- e55300a Update defender-for-devops.yml (#1437)
- e08f44b Fix #1442 (#1445)
- 9696b9e Reduce memory usage of Analyzer test suite (#1429)
- 6c51464 added logic to handle phonenumbers with country code (#1426)
- 3e4a806 Update CI due to DockerCompose project name issue (#1428)
- 2fe6ad7 closing handles (#1424)
- cd7e547 (docs) Use Presidio across Anthropic, Bedrock, VertexAI, Azure OpenAI, etc. w/ LiteLLM Proxy (#1421)
- 8dc46e2 Make sure that configuration files are closed when loading them (#1423)
- ada5fce Do not release presidio-cli as part of the release pipeline (#1422)
- d85ba6e Typo fix added missing ":" after if condition (#1419)
- d46bacb minor notebook changes (#1420)
This list of changes was auto generated.
version 2.2.355 (#1410)
Changes:
- edd722d version 2.2.355 (#1410)
- c059131 changing predefined recognizers to use the config file (#1393)
- 56f0df2 Update Dockerfile.windows (#1414)
- ad77f2f Update Dockerfile.windows (#1413)
- ac38cca NLP engine sample + refresh on samples (#1388)
- 97a7e42 Fix the entity filtering of the transformer_recognizer.py analyze function (#1403)
- 2be6de1 Fix ports in docs (#1408)
- a3a609b Improve url detector (#1398)
- 4752166 From Pipenv to Poetry (#1391)
- ebbfd30 Added presidio structured downloads to readme (#1392)
- 67d5837 Feature/analyzer documentation (#1384)
- e65c89c Migrate Python Packaging to pyproject.toml (#1383)
- 2d92539 Fix N818, E721 (#1382)
- 51dc5c6 Auto-formatting, fix D rules (#1381)
- cb0184a Add Ruff linter + Apply Ruff fix (#1379)
- 2348fff Fix OverflowError in crypto_recognizer (#1377)
- ff31243 Align ports with documentation and postman collection. (#1375)
- 2805c86 Loading analyzer engine & recognizer registry from configuration file (#1367)
- 55bfb8f added regex functionality for allow lists in the analyzer (#1357)
- e64d8ec Spanish NIE (Foreigners ID card) recognizer (#1359)
- f29e112 Update conf files location (#1358)
- 41e0202 feat: Add new recognizer for IN_VOTER #1344 (#1345)
- 5ea004d New Predefined Recognizer for Indian Passport #1350 (#1351)
- c7fa825 Added Finnish Personal Identity Code Recognizer. (#1349)
This list of changes was auto generated.
2.2.354
Changes:
- ffa29f8 Fixed wrong condition for dicom metadata (#1347)
- 49a996d Changed default aggregation_strategy to max (#1342)
- db8ff82 feat: Implement user-defined entity selection strategies in Presidio Structured (#1319)
- 4db5278 Cache compiled regexes in analyzer (#1335)
- 9c3369d Bugfix - Fix for incorrectly referenced recognizer in analysis_explanation using PhoneRecognizer (#1332)
- 6a4135e Fix bug where "bank" and "check" wouldn't work (#1333)
- ea8d830 Added contributions to readme (#1331)
- 733cca2 Adding Span Marker Recognizer Sample (#1321)
- 1911a3d Update spacy_stanza.md (#1325)
- d71c5fb feat: Add Singapore UEN Recognizer (#1315)
- 59af84d Added tesseract to installation (#1312)
- 4c48b92 Addition of leniency parameter in predefined PhoneRecognizer (#1311)
- 173b527 Bugfix in tutorial (#1310)
- dee6562 predefined pattern recognizer : IN_VEHICLE_REGISTRATION (#1288)
- a8d2c90 feat: Add Bech32 and Bech32m Bitcoin Address Validation in Crypto Recognizer and expand tests (#1307)
- 45c418d feat: Support 'M' prefix in SG_NRIC_FIN Recognizer and expand tests (#1304)
- 5dfbf27 Analysis builder improvements (#1295)
- 7f09c95 added pseudonymization sample (#1296)
This list of changes was auto generated.
2.2.353
- Add predefined_recognizer: IN_AADHAAR (#1256)
- Added the option to add custom operators + pseudonymization sample (#1284)
- Fix failing test due to optional package (#1258)
- Allow local Spacy Models to be loaded in NLP Engine (#1269)
- Upgrade pip in windows containers (#1272)
- Bugfix in ImageAnalyzerEngine #1274
2.2.352
Changes:
Added
Structured
- Added an alpha version of presidio-structured, a library that reuses logic from existing Presidio components to anonymize (semi-)structured data. (#1192)
Analyzer
- Add PL PESEL recognizer (#1209)
- Azure AI language recognizer (#1228)
- Add_conf_to_package_data (#1243)
Anonymizer
- Add keep operator as deanonymizer (#1255)
- Update anonymize_list type hints and document that sometimes items will be ignored. (#1252)
General
- Add Dockerfile for Windows containers (#1194)
Changed
Analyzer
- Drop WA driver license number (#1214)
- Change ner_model_configuration from list to map (#1222)
- Bugfix in SpacyRecognizer (#1221)
- Bugfix in NerModelConfiguration (#1230)
- Add_conf_to_package_data (#1243)
Anonymizer
- Improved the logic of conflict handling in AnonymizerEngine (#1196)
Image Redactor
- Change default score threshold in image redactor (#1210)
- fixes bug #1227 (#1231)
- Added missing dependencies for opencv-python and azure forms recognizer (#1257)