Releases: eisenzopf/redactomatic

Release v1.25

06 Jan 15:25

Added --encoding option to allow users to specify the input encoding.

Release v1.24

30 Sep 15:11

  • Added Windows setup and test

Release v1.23

30 Sep 13:18

  • Add --startdate, --enddate, --chunkoutstem, --instem, and --outstem options.

Redactomatic v1.22

21 Mar 16:09

  • Remove FAC, GPE, LANGUAGE, NORP, PRODUCT, EVENT, LAUGHTER, LAW, ORG, QUANTITY, WORK_OF_ART, ORDINAL from Level 3.
  • Create level 4 which does what level 3 used to do.
  • Create level 0 which simply redacts the Token Map and nothing else.
  • Log Token Mappings in the output log
  • Defend existing labels correctly in TokenMap
  • Log changes made by the TokenMap
  • Make a modest attempt to defend dates and sums of money in the cardinal rule
  • TokenMaps are now case sensitive. Fix test case.
  • Bring test-expected in line with changes
  • Defend ordinals in the cardinal text rule

Redactomatic v1.21

15 Feb 15:37

  • Update requirements.txt to the latest versions that support both Ubuntu 20.04 and Windows 11, including spaCy 3.7
  • Bring test script up to date
  • Add RedactorTokenMap to refactor token-map processing
  • Added anonymization and redaction order debug to redactomatic
  • Correct redactomatic bug that did not correctly track split names for anonymization
  • Add protect_zones to Spacy redactor
  • Refactor regex_utils and add search()
  • Added indexed redaction labels to split spacy names.
  • Refactor insertion of indexed labels to share common code
  • Add verbosity flag to test-redactomatic.sh

Redactomatic v1.20

20 Dec 11:46

  • Added --verbose and --no-verbose command line options
  • Changed entity restoration error from an exception that stops execution to a warning that restoration failed.
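The two changes above can be illustrated with a minimal sketch. It is not Redactomatic's code: `build_parser`, `restore_entities`, and the label format are hypothetical names chosen for illustration, and it assumes Python 3.9+ for `argparse.BooleanOptionalAction`.

```python
import argparse
import warnings


def build_parser():
    # Hypothetical parser; BooleanOptionalAction creates both
    # --verbose and --no-verbose from a single definition.
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--verbose",
        action=argparse.BooleanOptionalAction,
        default=False,
        help="print progress messages",
    )
    return parser


def restore_entities(text, saved):
    # Hypothetical restoration step: put saved originals back in place
    # of their labels. A missing label produces a warning and is
    # skipped, so processing continues instead of stopping.
    for label, original in saved.items():
        if label not in text:
            warnings.warn(f"restoration failed for {label}")
            continue
        text = text.replace(label, original)
    return text


args_on = build_parser().parse_args(["--verbose"])
args_off = build_parser().parse_args(["--no-verbose"])
```

With this shape, one unrestorable entity does not abort a long batch run; the caller sees a warning and the remaining entities are still restored.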

Release 1.19

24 Nov 14:12

In this release:

  • Added a default option to RedactorPhraseDict and RedactorPhraseList that compiles a single regex for a whole phrase list, making matching more efficient
  • Added combine-sets parameter to support turning this off if required
  • Added complete prematch and postmatch support for RedactorPhraseDict and RedactorPhraseList
  • Added add-wordbreak parameter to RedactorPhraseDict and RedactorPhraseList
  • Documented all of the above changes in README
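The idea of compiling one regex for a whole phrase list can be sketched in a few lines. This is only an illustration of the general technique, not the RedactorPhraseList implementation: the function `compile_phrase_list` and its `add_wordbreak` argument are hypothetical and only loosely mirror the parameter names mentioned above.

```python
import re


def compile_phrase_list(phrases, add_wordbreak=True, flags=re.IGNORECASE):
    # Sort longest first so a longer phrase wins over a shorter
    # phrase that is its prefix (alternation tries left to right).
    ordered = sorted(set(phrases), key=len, reverse=True)
    # Escape each phrase so it is matched literally.
    body = "|".join(re.escape(p) for p in ordered)
    if add_wordbreak:
        # Only match whole words, not fragments inside longer words.
        pattern = r"\b(?:" + body + r")\b"
    else:
        pattern = "(?:" + body + ")"
    return re.compile(pattern, flags)


phrases = ["New York", "New York City", "Boston"]
combined = compile_phrase_list(phrases)
redacted = combined.sub("[CITY]", "I flew from boston to New York City.")
# redacted == "I flew from [CITY] to [CITY]."
```

A single combined pattern scans the text once, instead of once per phrase. Compiling each phrase separately remains useful when every phrase needs its own surrounding context.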

Redactomatic v1.18

17 Nov 22:13

This release:

  • Add an abort message when trying to restore ignored text with --no-redact set.
  • Bugfix for wrong left/right ordering in config file overloading
  • Add some helper functions for entity_values
  • Add RedactorPhraseDict class to support JSON and YML phrase lists.
  • Add RedactorPhraseDict documentation
  • Upgrade the protection that stops regular expressions from overwriting other redaction labels.
  • Fixed a bug where a multi-line regex definition could result in corrupted text.
  • Add --traceback option for debugging
  • Add warning for missing entity definitions
  • Clean up the default config.yml
  • Separate cardinal text and voice rules
  • Remove 'oh' from cardinal rules
  • Add sample custom redactanon YML file
  • Move absolute_path to processor base.
  • Add explicit support for $REDACT_HOME and local paths in the current working directory
  • Add --version option.

Redactomatic v1.17

20 Oct 11:10

Brought in line with the Talkmap internal version of corpustools as of 20/10/2022.
Primary changes are:

  • Added --default rules to allow separation of custom rules and default rule set
  • Made redactomatic a processor like any other.
  • Moved the clean() routines from redactomatic to processorbase so they can be shared.
  • Moved reading of config files from redactomatic to entity_rules so they can be used by other programs.
  • Tidied up the imports in redactomatic to stop it importing things it did not need.
  • Added substitution and recursive substitution rules to regex_utils
  • Added fixes to cardinal digit anonymization to stop digits being concatenated without spaces
  • Updated ignore.yml to use regular expressions rather than phrase lists and added protection for common cardinal phrases and contexts.
  • Created a test-script area and moved redactomatic tests into there.
  • Moved documentation for redactomatic into docs and put in a more general top level README.
  • Added a more comprehensive fix for the bug where cardinal rules redacted other redaction labels.

Redactomatic v1.16

01 Jun 14:04

Updated the Spacy models to 3.3.0