Releases: uktrade/idscrub

v2.0.15

30 Mar 15:25

  • Adds NHS recogniser if the entity is specified
  • Adds NINO recogniser if the entity is specified
  • Corrects type hints (None is not a valid default value for a parameter annotated as a list type)
  • Adds extra boilerplate because of the above correction
  • Adds tests for NHS and NINO functionality
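
The type-hint correction and the extra boilerplate it requires follow a common Python pattern, sketched below. This is only an illustration: the function name `select_entities` and the default entity names are hypothetical and not taken from the package.

```python
from typing import Optional

# Hypothetical defaults, used only for this illustration.
DEFAULT_ENTITIES = ["PERSON", "EMAIL"]


def select_entities(entities: Optional[list[str]] = None) -> list[str]:
    """Return the requested entity names, falling back to the defaults."""
    # Writing `entities: list[str] = None` is rejected by type checkers,
    # so the annotation admits None and the body handles it explicitly.
    if entities is None:
        return list(DEFAULT_ENTITIES)
    return list(entities)
```

The explicit `None` check is the "extra boilerplate": it also avoids sharing one mutable default list between calls.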

v2.0.14

19 Mar 08:11
b0ba895

  • Small type, test and README fixes

v2.0.13

17 Mar 14:18
3f378a1

  • Use native Pandas type "string" instead of Python base "str" for DataFrame type conversion

v2.0.12

19 Feb 10:27
c10d304

  • Adds **kwargs argument to IDScrub.dataframe(), which allows other keyword arguments in IDScrub() to be modified when using IDScrub.dataframe()
  • Fixes ID column bug in IDScrub.dataframe() when no id_col argument is passed

v2.0.11

13 Feb 10:16
245de64

  • Add custom_methods.ipynb to give further examples on customisation and contribution
  • Add extra information and advice to README

v2.0.1

03 Feb 11:51
abfb579

  • Handle whitespace-only strings if passed to IDScrub.presidio_entities by stripping whitespace

v2.0.0

03 Feb 10:26
94cb1c1

  • This release contains non-backward compatible (breaking) changes. This is required for improving the package.
  • scrub.scrub() now takes a pipeline dictionary that defines the scrub methods.
  • Each method now outputs a list of IDEnt (identified entity) objects instead of the scrubbed text. These objects contain all of the information required to find and replace the entity identified.
  • The list is then passed to resolve_overlaps, which selects the entity with the higher priority score when multiple entities are identified in the same text. For example, john@madeupemail.com is both an email and a handle (@madeupemail), but the email can be scored higher so that the text is redacted as an email.
  • The final de-duplicated list is then passed to scrub_texts, which removes the identified text and replaces it.
  • Add an exclude argument to exclude certain strings from being scrubbed.
  • Improve error handling.
  • This is a significant improvement: every method sees the same original text, and the text is scrubbed in a single step once all conflicts have been resolved.
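
The detect, resolve, then scrub flow described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the package's implementation: the `Entity` class, the `pick_by_priority` and `replace_spans` helpers, the priority values, and the `[LABEL]` placeholder format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for an identified entity (not the package's IDEnt)."""
    start: int       # index of the first character
    end: int         # index one past the last character
    label: str
    priority: float


def pick_by_priority(entities: list[Entity]) -> list[Entity]:
    """Keep the highest-priority entity wherever spans overlap."""
    kept: list[Entity] = []
    # Visit high-priority entities first so they claim their span.
    for ent in sorted(entities, key=lambda e: (-e.priority, e.start)):
        if all(ent.end <= k.start or ent.start >= k.end for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e.start)


def replace_spans(text: str, entities: list[Entity]) -> str:
    """Replace each span with a placeholder, right to left so indices stay valid."""
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label}]" + text[ent.end:]
    return text


text = "Contact john@madeupemail.com today"
email, handle = "john@madeupemail.com", "@madeupemail"
e_start, h_start = text.index(email), text.index(handle)

# Both "methods" see the same original text and report overlapping spans.
found = [
    Entity(h_start, h_start + len(handle), "HANDLE", priority=0.5),
    Entity(e_start, e_start + len(email), "EMAIL", priority=0.9),
]
result = replace_spans(text, pick_by_priority(found))
print(result)
```

Because replacement happens only after the overlaps are resolved, no method ever runs on text that another method has already altered.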

v1.1.1

27 Jan 12:38
3931afd

  • Update README to emphasise development
  • Pin Pandas <3.0 until changes resolved

v.1.1.0

20 Jan 13:06
6a919e0

  • Allows users to specify which spaCy entities to scrub
  • Changes all NER scrubbing methods to <method>_entities e.g. IDScrub.spacy_entities()
  • Adds a UK address scrubbing method