Releases: eisenzopf/redactomatic
Release v1.25
- Added --encoding option to allow users to specify the input encoding.
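
As a rough illustration of how an option like this is commonly wired up (this is not the project's actual code; the `read_text` helper and the `utf-8` default are assumptions made for the sketch), the value can be handed straight to Python's built-in `open()`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of an --encoding option")
    # Hypothetical default; the real tool's default may differ.
    parser.add_argument("--encoding", default="utf-8",
                        help="Character encoding of the input file")
    return parser


def read_text(path: str, encoding: str) -> str:
    # Hypothetical helper: decode the whole file with the requested encoding.
    with open(path, encoding=encoding) as handle:
        return handle.read()
```

A caller would parse the arguments once and pass `args.encoding` to every place that opens an input file, so that files which are not UTF-8 (for example `latin-1` exports) decode correctly.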
Release v1.24
- Added Windows setup and test
Release v1.23
- Add --startdate, --enddate, -chunkoutstem, --instem, and --outstem options.
Redactomatic v1.22
- Remove FAC, GPE, LANGUAGE, NORP, PRODUCT, EVENT, LAUGHTER, LAW, ORG, QUANTITY, WORK_OF_ART, ORDINAL from Level 3.
- Create level 4 which does what level 3 used to do.
- Create level 0 which simply redacts the Token Map and nothing else.
- Log Token Mappings in the output log
- Defend existing labels correctly in TokenMap
- Log changes made by the TokenMap
- Make a modest attempt to defend dates and sums of money in the cardinal rule
- TokenMaps are now case sensitive. Fix test case.
- Bring test-expected in line with changes
- Defend ordinals in the cardinal text rule
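
To make the case-sensitivity and change-logging points concrete, here is a minimal sketch of a token map applied with exact-case matching. It is an illustration only: the `apply_token_map` function, the dictionary shape, and the `[ORG]` label are assumptions, not the project's RedactorTokenMap implementation.

```python
import re


def apply_token_map(text: str, token_map: dict[str, str]) -> tuple[str, list[tuple[str, str]]]:
    """Replace each mapped token with its label, matching case exactly."""
    changes: list[tuple[str, str]] = []
    if not token_map:
        return text, changes
    # Longest tokens first so a longer token wins over its own prefix.
    tokens = sorted(token_map, key=len, reverse=True)
    # No re.IGNORECASE flag, so "Acme" and "acme" are different tokens.
    pattern = re.compile("|".join(re.escape(t) for t in tokens))

    def substitute(match: re.Match) -> str:
        token = match.group(0)
        # Record each change so it can be written to a log afterwards.
        changes.append((token, token_map[token]))
        return token_map[token]

    return pattern.sub(substitute, text), changes
```

With a map of `{"Acme": "[ORG]"}`, the text `Acme called acme` would have only its first word replaced, and the returned change list would contain one entry.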
Redactomatic v1.21
- Update requirements.txt to the latest versions that support both Ubuntu 20.04 and Windows 11, including spacy 3.7
- Bring test script up to date
- Add RedactorTokenMap to refactor token-map processing
- Added anonymization and redaction order debug to redactomatic
- Correct redactomatic bug that did not correctly track split names for anonymization
- Add protect_zones to Spacy redactor
- Refactor regex_utils and add search()
- Added indexed redaction labels to split spacy names.
- Refactor insertion of indexed labels to share common code
- Add verbosity flag to test-redactomatic.sh
Redactomatic v1.20
- Added --verbose and --no-verbose command line options
- Changed entity restoration error from an exception that stops execution to a warning that restoration failed.
Release 1.19
In this release:
- Added a default option to RedactorPhraseDict and RedactorPhraseList that compiles a single regex for a whole phrase list, which makes matching more efficient
- Added combine-sets parameter to support turning this off if required
- Added complete prematch and postmatch support for RedactorPhraseDict and RedactorPhraseList
- Added add-wordbreak parameter to RedactorPhraseDict and RedactorPhraseList
- Documented all of the above changes in README
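
The idea behind compiling one regex per phrase list can be sketched as follows. This is a hedged illustration rather than the library's code: `compile_phrases` is a hypothetical function, its `combine_sets` and `add_wordbreak` arguments only mirror the parameter names above, and the case-insensitive matching is an assumption.

```python
import re


def compile_phrases(phrases, combine_sets=True, add_wordbreak=True):
    """Return a list of compiled patterns covering the given phrases."""
    # Longest first so "new york city" is tried before "new york".
    ordered = sorted(set(phrases), key=lambda p: (-len(p), p))
    if not ordered:
        return []
    escaped = [re.escape(p) for p in ordered]
    if combine_sets:
        # One alternation instead of one pattern per phrase.
        escaped = ["(?:" + "|".join(escaped) + ")"]
    if add_wordbreak:
        # \b on both sides stops matches inside longer words.
        escaped = [r"\b" + e + r"\b" for e in escaped]
    return [re.compile(e, re.IGNORECASE) for e in escaped]
```

With `combine_sets=True` the text is scanned once per phrase list instead of once per phrase, which is where the efficiency gain would come from. Turning it off yields one pattern per phrase, which can be easier to debug.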
Redactomatic v1.18
This release:
- Add an abort message when trying to restore ignored text with --no-redact set.
- Bugfix for wrong left/right ordering in config file overloading
- Add some helper functions for entity_values
- Add RedactorPhraseDict class to support JSON and YML phrase lists.
- Add RedactorPhraseDict documentation
- Upgrade the protection for stopping regular expressions overwriting other redaction labels.
- Fixed a bug where multi-line regex definition could result in corrupted text.
- Add --traceback option for debugging
- Add warning for missing entity definitions
- Clean up the default config.yml
- Separate cardinal text and voice rules
- Remove 'oh' from cardinal rules
- Add sample custom redactanon YML file
- Move absolute_path to processor base.
- Add explicit support for $REDACT_HOME and local paths in the current working directory
- Add --version option.
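
One common way to stop a regular expression from overwriting labels that an earlier rule already inserted is to apply it only to the text between labels. The sketch below shows that general technique; it is not taken from the project, and both the `sub_outside_labels` helper and the bracketed label shape (such as `[PHONE]` or `[PERSON-2]`) are assumptions.

```python
import re

# Assumed label shape for this sketch: an upper-case name in square
# brackets, optionally followed by an index, e.g. [PHONE] or [PERSON-2].
LABEL = re.compile(r"(\[[A-Z_]+(?:-\d+)?\])")


def sub_outside_labels(pattern: re.Pattern, replacement: str, text: str) -> str:
    # Splitting on a capturing group keeps the labels in the result,
    # at the odd indexes; even indexes hold the unprotected text.
    parts = LABEL.split(text)
    for i in range(0, len(parts), 2):
        parts[i] = pattern.sub(replacement, parts[i])
    return "".join(parts)
```

For example, a digit rule applied this way to `Call [PERSON-2] at 555` leaves the index inside `[PERSON-2]` alone and only replaces `555`. The trade-off is that a pattern can no longer match across a label boundary.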
Redactomatic v1.17
Brought in line with the Talkmap internal version of corpustools as of 20/10/2022.
Primary changes are:
- Added --default rules to allow separation of custom rules and the default rule set
- Made redactomatic a processor like any other.
- Moved the clean() routines from redactomatic to processorbase so they can be shared.
- Moved reading of config files from redactomatic to entity_rules so they can be used by other programs.
- Tidied up the imports in redactomatic to stop it importing things it did not need.
- Added substitution and recursive substitution rules to regex_utils
- Added fixes to cardinal digit anonymization to stop digits being concatenated without spaces
- Updated ignore.yml to use regular expressions rather than phrase lists and added protection for common cardinal phrases and contexts.
- Created a test-script area and moved redactomatic tests into there.
- Moved documentation for redactomatic into docs and put in a more general top level README.
- Added a more comprehensive fix for the bug where cardinal rules redacted other redaction labels.
Redactomatic v1.16
Updated the Spacy models to 3.3.0