Releases: bellingcat/auto-archiver
v1.0.0
🎉 Auto Archiver v1.0.0
We're excited to announce a major stable release of the Auto Archiver tool!
This release comes with some major improvements in stability and flexibility, with a modular and extendable architecture supporting a wide range of archiving extractors, enrichers, feeders, databases and storage methods.
✨ New Features
- We now have full documentation:
  📖 https://auto-archiver.readthedocs.io/en/latest/
  - (Psst: If you were set up for v0.12 or below, check out our upgrade guide to adapt to the new config format.)
- Getting set up for new users should be much easier using the new config editor:
  🛠 https://auto-archiver.readthedocs.io/en/latest/installation/config_editor.html
- New print_pdf option in the screenshot enricher #159
- Added an unauthenticated Bluesky archiver #160
- Major enhancements to the Generic Extractor (previously youtubedl_archiver.py):
  - Extending it to extract targeted content from additional sites using a Dropin structure
  - Auto-updating the yt-dlp module to ensure latest fixes and compatibility
  - Universal support for valid youtube-dl URLs
  - New TruthSocial extractor
- New settings page UI #217
- Added support for yt-dlp PO Token clients #222
- New unofficial API-based TikTok extractor #237
- Added InstagrAPI server script authenticated access for the instagram_api_extractor.py #281
🧹 Stability and Modularity
- The new modular structure means only modules selected in your config are loaded, keeping the system lightweight and more resilient: if a module you don't need breaks, your pipeline continues to work.
- Multiple authentication strategies have been added to reduce the likelihood of platform blocking.
- The setup process now validates your config and gives detailed error messages in the output log.
- We've increased test coverage and integrated it with GitHub Actions for CI.
- Dependency and packaging management has been migrated to Poetry.
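To illustrate the selective loading described above, here is a minimal sketch. It is not the Auto Archiver's actual loader. It assumes a hypothetical registry that maps module names to factory callables and a config that lists the selected module names per step. `REGISTRY`, `load_selected`, `example_feeder` and `broken_enricher` are all made up for illustration.

```python
from typing import Callable, Dict, List


def _broken_factory() -> object:
    # Stands in for a module whose import fails (e.g. a missing dependency).
    raise ImportError("optional dependency missing")


# Hypothetical registry: module name -> factory that builds the module.
REGISTRY: Dict[str, Callable[[], object]] = {
    "example_feeder": lambda: "example_feeder instance",
    "generic_extractor": lambda: "generic_extractor instance",
    "broken_enricher": _broken_factory,  # only fails if it is actually selected
}


def load_selected(config: Dict[str, List[str]]) -> Dict[str, object]:
    """Instantiate only the modules named in the config, step by step."""
    loaded: Dict[str, object] = {}
    for step, names in config.items():
        for name in names:
            if name not in REGISTRY:
                raise ValueError(f"Unknown module '{name}' in step '{step}'")
            loaded[name] = REGISTRY[name]()
    return loaded


# The broken module is registered but not selected, so it is never touched.
config = {"feeders": ["example_feeder"], "extractors": ["generic_extractor"]}
modules = load_selected(config)
print(sorted(modules))
```

Because factories are only called for selected names, a failing module that is not in the config cannot interrupt the run.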
Coming soon!
- Bellingcat will release an article and video guide on the Auto Archiver in the coming weeks.
💬 Keep in Touch!
Whether it's a question, a bug report, or a feature request, please get in touch!
- Join our Discord thread for tool support and community discussion:
  🔗 https://discord.com/channels/709752884257882135/1346596825611632770
- Found a bug or want to contribute? Open a GitHub issue or PR using our contribution guide:
  🤝 https://auto-archiver.readthedocs.io/en/latest/contributing.html
🙌 Thanks
A huge thank you to everyone who has contributed, as well as those who provided feedback and ideas throughout development.
@msramalho @pjrobertson
v0.13.9
What's Changed
- Timestamping enricher rewrite - now works with latest ubuntu + fixes various other issues by @pjrobertson in #224
- Add explicit dependabots for pip/poetry, GH actions and npm by @pjrobertson in #269
- Minor improvements by @pjrobertson in #268
- Force-pins cryptography to >44.0.1 to fix dependabot warning by @pjrobertson in #278
Full Changelog: v0.13.8...v0.13.9
v0.13.8
What's Changed
- Small fix for generic_extractor.py for general/ youtube extraction. by @erinhmclark in #259
- When loading modules, check they have been added to the right 'step' in the config by @pjrobertson in #263
- Add flexible extractor_args to generic_extractor.py by @erinhmclark in #262
- Wacz minor adjustments by @pjrobertson in #261
- Script to auto-generate a service account by @pjrobertson in #255
- Minor fixes by @pjrobertson in #264
Full Changelog: v0.13.7...v0.13.8
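The step check introduced in #263 can be pictured roughly as follows. This is a hedged sketch, not the project's code: `MODULE_TYPES`, `validate_steps`, and the module-to-step mapping are hypothetical, chosen only to show a module being rejected when it is listed under the wrong step.

```python
from typing import Dict, List

# Hypothetical manifest data: which step(s) each module may be used in.
MODULE_TYPES: Dict[str, List[str]] = {
    "example_feeder": ["feeders"],
    "generic_extractor": ["extractors"],
    "screenshot_enricher": ["enrichers"],
}


def validate_steps(steps: Dict[str, List[str]]) -> List[str]:
    """Return human-readable errors; an empty list means the config is valid."""
    errors: List[str] = []
    for step, names in steps.items():
        for name in names:
            allowed = MODULE_TYPES.get(name)
            if allowed is None:
                errors.append(f"'{name}' is not a known module")
            elif step not in allowed:
                errors.append(
                    f"'{name}' is listed under '{step}' "
                    f"but belongs in: {', '.join(allowed)}"
                )
    return errors


# An extractor placed under 'feeders' by mistake is reported, not silently loaded.
bad_config = {"feeders": ["generic_extractor"], "enrichers": ["screenshot_enricher"]}
errors = validate_steps(bad_config)
for error in errors:
    print(error)
```

Collecting all errors before reporting, rather than stopping at the first one, lets a user fix a misconfigured file in a single pass.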
v0.13.7
What's Changed
This release brings better TikTok support, plus Facebook support for posts and photos 🎉
Details:
- Update material version, minify code by @pjrobertson in #253
- Create facebook dropin - working for images + text. by @pjrobertson in #223
- Add info on building RTD versions + automated building of tagged versions by @pjrobertson in #254
- Refactor the dropin 'is_suitable' method + fix for tikwm by @pjrobertson in #256
- Standardise parse dates to get_datetime_from_str by @pjrobertson in #257
- Version bump by @pjrobertson in #258
Full Changelog: v0.13.6...v0.13.7
v0.13.6
What's Changed
- Adds new extractor for tiktok via unofficial API by @msramalho in #237
- Unit tests for storage types + fix storage too long issues for local storage by @pjrobertson in #243
- Move tikwm extractor into a dropin for the generic extractor by @pjrobertson in #249
- Linting and Formatting with Ruff by @erinhmclark in #244
- Better checking of cookies to add to webdriver + generic extractor tweaks by @pjrobertson in #246
- Opentimestamps Module by @pjrobertson in #247
- Fix pre-commit for ruff check by @pjrobertson in #251
Full Changelog: v0.13.5...v0.13.6
v0.13.5
What's Changed
- Docs tidyups, howto on logging and authentication, remove exit(), small fixes by @pjrobertson in #211
- Update GSheets columns, test file cleanup, other small fixes. by @erinhmclark in #225
- Instagram extractor bugfix by @erinhmclark in #235
- Docker webdriver aarch64 by @pjrobertson in #233
- Auto Updates by @pjrobertson in #234
- Add cache-from and cache-to to docker-publish.yaml. by @erinhmclark in #238
- Cleanup fixes by @erinhmclark in #236
- Fix docker registry reference by @erinhmclark in #239
- Settings page user interface by @pjrobertson in #217
- Merge modules - GSheet, Atlos by @erinhmclark in #226
- Update the release process docs and the latest version in pyproject.toml by @erinhmclark in #240
Full Changelog: v0.13.4...v0.13.5
v0.13.4
What's Changed
- Tests/add module tests by @erinhmclark in #194
- Fix issue #200 + Refactor _LAZY_LOADED_MODULES by @pjrobertson in #210
Full Changelog: v0.13.3...v0.13.4
v0.13.3
What's Changed
- Fix generator by @pjrobertson in #197
- Various fixes for issues with new architecture by @pjrobertson in #208
Full Changelog: v0.13.2...v0.13.3
v0.13.2
What's Changed
- makes orchestrator.run return the results to allow for code integration by @msramalho in #196
Full Changelog: v0.13.1...v0.13.2
v0.13.1
What's Changed
This version has many breaking changes, including the organisation of the orchestration file. It should be easy to map old versions to the new ones: when you run the auto-archiver without an existing orchestration.yaml, an example one will be generated and you can port your old settings to it. Check the documentation for more information: https://auto-archiver.readthedocs.io/en/latest/
- Allow setting cookies for yt-dl by @pjrobertson in #158
- adds better debug for wayback failures by @msramalho in #161
- Add 'print_pdf' option to the screenshot enricher. Fixes #132 by @pjrobertson in #159
- adds an unauthenticated Bluesky archiver by @msramalho in #160
- Remove snscrape from the twitter_archiver by @pjrobertson in #165
- Fix two small issues by @pjrobertson in #162
- Migrate to Poetry by @erinhmclark in #164
- CI Unit tests by @pjrobertson in #163
- Add docker-compose for easy building and running of docker image in dev by @pjrobertson in #172
- Tidy up and remove dependencies by @pjrobertson in #169
- Update versions for GH Actions and Geckodriver. by @erinhmclark in #174
- Add Sphinx documentation and publish to RTD. by @erinhmclark in #177
- Create generic archiver for all valid youtube-dl URLs, add truthsocial extractor, unit tests for twitter_api extractor, utility methods for cleaning HTML and traversing objects by @pjrobertson in #175
- modifies base docker image to use browsertrix 1.4.2 by @msramalho in #182
- Add ubuntu-latest to the matrix of test runners by @pjrobertson in #181
- More manifests by @pjrobertson in #183
- [WIP] Add module tests by @erinhmclark in #189
- Refactor auto-archiver to use a modular structure for feeders/extractors/enrichers etc. by @pjrobertson in #185
- Docs improvement by @pjrobertson in #190
- Fix links to docs by @pjrobertson in #193
- removes fixed oscrypto dependency, it blocked pypi publishing by @msramalho in #195
New Contributors
- @pjrobertson made their first contribution in #158
- @erinhmclark made their first contribution in #164
Full Changelog: v0.12.0...v0.13.1