This release includes the following updates:
- Maintenance:
  - The classification models were retrained to be compatible with Python 3.7 and scikit-learn 1.0.2.
  - The libraries in the installation script were updated to newer versions wherever possible.
  - The settings of OpenWPM in the link detection module were changed to prevent the script from hanging during large-scale web scraping.
- New Features:
  - The link detection module was improved to score metadata in a cleaner way.
  - Additional phrasings were added to the link detection module so that it finds more potential privacy statements (based on this work
  - A separate link detection function was added to find CCPA/CPRA-related privacy statements.
  - Markdown text extraction was added to the toolchain for cases where the structure of the privacy policy is required.
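
To illustrate the idea behind a separate CCPA/CPRA link detection function, here is a minimal sketch in Python using only the standard library. It is not the toolchain's actual implementation: the phrase list in `CCPA_PATTERN`, the `LinkCollector` class, and the `find_ccpa_links` function are all hypothetical names and values chosen for this example, and the real module may score links quite differently.

```python
import re
from html.parser import HTMLParser

# Hypothetical phrase list for illustration only; the wording list
# actually used by the link detection module is not reproduced here.
CCPA_PATTERN = re.compile(
    r"do not sell|ccpa|cpra|california privacy|your privacy choices",
    re.IGNORECASE,
)


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_ccpa_links(html):
    """Return hrefs whose anchor text or URL matches a CCPA/CPRA phrase."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [
        href
        for href, text in parser.links
        if CCPA_PATTERN.search(text) or CCPA_PATTERN.search(href)
    ]
```

For example, calling `find_ccpa_links` on a page footer returns only those links whose text or URL contains one of the assumed phrases, which keeps CCPA/CPRA notices separate from general privacy policy links.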