This release contains backward-incompatible (breaking) changes, which were needed to improve the package.
scrub.scrub() now takes a pipeline dictionary that defines the scrub methods.
Each method now outputs a list of IDEnt (identified entity) objects instead of the scrubbed text. These objects contain all of the information required to find and replace the entity identified.
The list is then passed to resolve_overlaps, which keeps the entity with the higher priority score when multiple entities are identified in the same text. For example, john@madeupemail.com is both an email and a handle (@madeupemail), but the email can be scored higher so that it is redacted as an email.
The final de-duplicated list is then passed to scrub_texts, which removes each identified entity from the text and replaces it.
Add an exclude argument to exclude certain strings from being scrubbed.
Improve error handling.
This is a big improvement because every method sees the same original text, and the text is scrubbed in a single step only after all conflicts have been reconciled.
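The detect, resolve, replace flow described in these notes can be illustrated with a minimal, self-contained sketch. It is not the package's implementation: the `Entity` class, the `PATTERNS` table, the priority values, the `[LABEL]` replacement format, and the functions `find_entities`, `resolve`, and `replace_all` are all hypothetical stand-ins for IDEnt, the pipeline dictionary, resolve_overlaps, and scrub_texts, whose real signatures are not shown here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for an identified entity (not the real IDEnt class)."""
    start: int
    end: int
    label: str
    priority: float


# Hypothetical detectors and priority scores; every detector sees the same text.
PATTERNS = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), 1.0),
    "handle": (re.compile(r"@\w+"), 0.5),
}


def find_entities(text, exclude=()):
    """Run every detector on the original text and collect all matches."""
    entities = []
    for label, (pattern, priority) in PATTERNS.items():
        for match in pattern.finditer(text):
            if match.group() in exclude:
                continue  # strings listed in `exclude` are never scrubbed
            entities.append(Entity(match.start(), match.end(), label, priority))
    return entities


def resolve(entities):
    """Keep higher-priority entities first and drop any that overlap a kept one."""
    kept = []
    for ent in sorted(entities, key=lambda e: (-e.priority, e.start)):
        if all(ent.end <= k.start or ent.start >= k.end for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e.start)


def replace_all(text, entities):
    """Replace from right to left so earlier character offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label.upper()}]" + text[ent.end:]
    return text


text = "Contact john@madeupemail.com or @jane_doe."
kept = resolve(find_entities(text))
scrubbed = replace_all(text, kept)
print(scrubbed)  # Contact [EMAIL] or [HANDLE].
```

In this sketch the handle match `@madeupemail` is discarded because it overlaps the higher-scoring email match, while `@jane_doe` survives because it overlaps nothing. Passing `exclude={"@jane_doe"}` to `find_entities` would leave that handle untouched in the output.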