    Repositories list

    • pywb

      Public archive
      Core Python Web Archiving Toolkit for replay and recording of web archives
      JavaScript
      239000 Updated Jan 5, 2026
    • Please note that the warc-indexer tool and code are now supported by NetArchiveSuite. The 'warc-indexer' directory and code in this repo are now for reference only. For support and issues with 'warc-indexer', please contact NetArchiveSuite.
      Java
      26132903 Updated Nov 21, 2025
    • ukwa-pywb

      Public
      JavaScript
      411240 Updated Nov 21, 2025
    • ukwa-heritrix

      Public archive
      The UKWA Heritrix3 custom modules and Docker builder.
      Java
      711451 Updated Dec 2, 2024
    • w3act

      Public
      w3act is an annotation and curation tool for building web archive collections
      Java
      621650 Updated Jan 30, 2024
    • py-wacz

      Public archive
      Python
      13000 Updated Sep 14, 2023
    • hapy

      Public archive
      A Python wrapper around the Heritrix API.
      Python
      4400 Updated Aug 29, 2023
    • heritrix3

      Public archive
      Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
      Java
      780000 Updated Jul 28, 2022
    • siegfried

      Public archive
      signature-based file format identification
      Go
      31000 Updated May 14, 2022
    • browsertrix-cloud

      Public archive
      TypeScript
      62000 Updated May 13, 2022
    • solrwayback

      Public archive
      A search interface and wayback machine for the UKWA Solr based warc-indexer framework.
      Java
      27000 Updated May 4, 2022
    • mrjob

      Public archive
      Run MapReduce jobs on Hadoop or Amazon Web Services
      Python
      586000 Updated Mar 23, 2022
    • Core Java libraries for Memento clients.
      Java
      2120 Updated Jan 5, 2022
    • rclone

      Public archive
      "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files
      Go
      4.8k000 Updated Nov 12, 2021
    • puppeteer

      Public archive
      Headless Chrome Node.js API
      TypeScript
      9.4k000 Updated Oct 13, 2021
    • browsertrix-crawler

      Public archive
      Run a high-fidelity browser-based crawler in a single Docker container
      JavaScript
      125100 Updated Sep 24, 2021
    • banana

      Public archive
      Banana for Solr - A Port of Kibana
      JavaScript
      234000 Updated Dec 17, 2020
    • outbackcdx

      Public archive
      A Wayback RemoteResourceIndex server using RocksDB
      Java
      21000 Updated Oct 9, 2020
    • shine

      Public archive
      Prototype SOLR-powered web archive exploration UI.
      JavaScript
      743500 Updated Jun 3, 2020
    • web-archives

      Public archive
      Jupyter Notebook
      8000 Updated May 14, 2020
    • webarchive-commons

      Public archive
      Java
      74201 Updated Feb 17, 2020
    • openwayback

      Public archive
      The OpenWayback Development Project
      Java
      302000 Updated Feb 17, 2020
    • A collection of Jupyter notebooks for working with web archive data, tools and APIs
      Jupyter Notebook
      2700 Updated Nov 18, 2019
    • opendata

      Public
      Repository of documentation about the open datasets published by the UK Web Archive.
      HTML
      61510 Updated Jun 21, 2019
    • A 'vanilla' Warclight instance for workshops and demonstration.
      Ruby
      1000 Updated Apr 17, 2019
    • python-heritrix

      Public archive
      Simple Python wrapper around the Heritrix v3.x API
      Python
      5401 Updated Apr 16, 2019
    • awesome-web-archiving

      Public archive
      An Awesome List for getting started with web archiving
      176900 Updated Mar 7, 2019
    • warcprox

      Public archive
      WARC writing MITM HTTP/S proxy
      Python
      65000 Updated Mar 4, 2019
    • omeka-s-docker

      Public archive
      Dockerfile
      31000 Updated Jan 22, 2019
    • Reference deployment of JupyterHub with docker
      Python
      388000 Updated Jan 18, 2019