Releases: internetarchive/brozzler
v1.9.3
v1.9.2
v1.9.1
v1.9.0
This release contains several new features along with enhancements for performance and reliability. It also contains several dependency updates, including yt-dlp 2025.12.08.
New behavior controls
Several new features make behaviors more flexible, allowing brozzler to navigate and gather outlinks from more complex sites than in previous versions.
- A new `repeatUntilSelector` control makes it possible to instruct behaviors when to stop looping, providing a more robust means to control when looping should stop early. (#431 - @mistydemeo)
- It's now possible for custom behavior JavaScript to call the pre-defined outlink gathering function. (#429 - @mistydemeo)
- It's now possible to gather outlinks while behaviors are running, which ensures that it's possible to gather outlinks from sites whose content changes at runtime like sites using JavaScript pagination. (#433/#434 - @mistydemeo)
Performance enhancements
Two minor performance enhancements limit the default size of page screenshots (#420 - @vbanos) and reduce crawl startup time by caching the Chrome version (#424 - @vbanos).
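As a rough illustration of the version caching idea, the sketch below memoizes the output of a browser's `--version` call so the subprocess runs only once per executable path. The function name `chrome_version` and the use of `functools.lru_cache` are assumptions made for illustration; these notes do not describe how brozzler implements its cache.

```python
import functools
import subprocess


@functools.lru_cache(maxsize=None)
def chrome_version(chrome_exe: str) -> str:
    """Return the output of `<chrome_exe> --version`.

    Hypothetical helper: the result is cached per executable path, so
    repeated calls during crawl startup do not spawn a new process.
    """
    result = subprocess.run(
        [chrome_exe, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The first call pays the cost of launching the process; later calls with the same path return the stored string. `chrome_version.cache_clear()` would force a fresh lookup, for example after a browser upgrade.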
Reliability enhancements
- Brozzler now detects Chrome error pages and will retry the affected page. (#438 - @adam-miller)
- `try-login` will now perform form validation before submitting and trigger input events. (#432/#440 - @netoarmando)
- Improved error handling in retry loops. (#441 - @adam-miller)
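The detect-and-retry idea can be sketched as follows. This is a minimal illustration, not brozzler's implementation: it assumes an error page can be recognized by a `chrome-error://` document URL, and `load_page`, `is_chrome_error_page`, and `visit_with_retry` are hypothetical names.

```python
from typing import Callable

# Assumption: Chrome's built-in error pages report a chrome-error:// URL.
CHROME_ERROR_PREFIX = "chrome-error://"


def is_chrome_error_page(document_url: str) -> bool:
    """Return True if the loaded document looks like a Chrome error page."""
    return document_url.startswith(CHROME_ERROR_PREFIX)


def visit_with_retry(
    load_page: Callable[[str], str], url: str, max_attempts: int = 3
) -> str:
    """Load `url`, retrying while the browser lands on an error page.

    `load_page` is a hypothetical callable that navigates to `url` and
    returns the URL of the document that was actually loaded.
    """
    for _ in range(max_attempts):
        document_url = load_page(url)
        if not is_chrome_error_page(document_url):
            return document_url
    raise RuntimeError(f"{url} still shows an error page after {max_attempts} attempts")
```

Bounding the loop with `max_attempts` keeps a permanently failing page from blocking the crawl; a real crawler would likely also wait between attempts.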
v1.8.1
v1.8.0
This release contains a number of new capture features. It also removes support for Python 3.8.
- The new `--disable-video-capture` commandline flag allows excluding video captures in the `brozzler-new-site` commandline tool. (#379)
- When video capture has been disabled, brozzler will no longer browse YouTube UMP packets. (#378)
- Audio content types are now also skipped when video capture is disabled. (#380)
- It's now possible to control whether Chrome is launched in headless mode using the `--no-headless` commandline flag and the `headless` keyword parameter in the `Chrome.start` method. The default hasn't changed. (#373)
- Improved performance when visiting page anchors. (#394)
- Improved compatibility when fetching `robots.txt` and page headers by enabling legacy renegotiation and disabling errors from unexpected SSL EOFs. (#411)
- Fixed a bug which could break captures under certain specific circumstances when a page interstitial is encountered. (#375)
- The header request timeout has been increased from 30 to 60 seconds. (#367)
- Updated the versions of several dependencies.
Thanks to @TheTechRobo for the features in pull requests #373 and #394!
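For readers curious what the TLS compatibility change in #411 can look like in practice, here is one way to build a similarly lenient context with Python's standard `ssl` module. It is a sketch under assumptions: that "legacy renegotiation" corresponds to OpenSSL's `SSL_OP_LEGACY_SERVER_CONNECT` option, and that the helper name `make_lenient_context` is hypothetical. These notes do not say how brozzler configures its connections.

```python
import ssl


def make_lenient_context() -> ssl.SSLContext:
    """Build a client TLS context that tolerates some legacy server behavior.

    Hypothetical helper, not brozzler's code.
    """
    ctx = ssl.create_default_context()
    # Allow connecting to servers without secure renegotiation support.
    # The named constant exists from Python 3.12; 0x4 is OpenSSL's value.
    ctx.options |= getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)
    # Treat an abrupt EOF from the peer as a normal end of stream.
    # The constant is only present on builds that support it.
    ctx.options |= getattr(ssl, "OP_IGNORE_UNEXPECTED_EOF", 0)
    return ctx
```

Certificate verification stays enabled in this sketch; only the two compatibility options are relaxed.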