    Repositories list

    • Arquivo.pt's branding customizations for our instance of pywb.
      CSS
      GNU General Public License v3.0
      3204 Updated Mar 25, 2026
    • Arquivo.pt home page web application
      CSS
      GNU General Public License v3.0
      0003 Updated Mar 24, 2026
    • Repository that collects all the scripts used to set up SolrCloud for images and text.
      Python
      GNU General Public License v3.0
      0000 Updated Mar 17, 2026
    • Image Search Indexing over web archived images using Apache Solr indexes.
      Java
      GNU General Public License v3.0
      3205 Updated Mar 17, 2026
    • memorial
      Redirects a couple of vhosts to arquivo.pt. We call this the memorial service.
      Python
      GNU General Public License v3.0
      1001 Updated Mar 9, 2026
    • A Slurm cluster using docker-compose to run datatrove pipelines
      Python
      MIT License
      256000 Updated Feb 20, 2026
    • The PWA9609 test collection was created to support research on web archive information retrieval (WAIR).
      Python
      GNU General Public License v3.0
      0000 Updated Feb 18, 2026
    • A sitemap utility script that exports a sitemap from a URL
      Python
      GNU General Public License v3.0
      1000 Updated Feb 16, 2026
    • Shell
      GNU General Public License v3.0
      1110 Updated Feb 4, 2026
    • Python tools for processing web archive CDXJ indexes at scale with parallel indexing, efficient filtering, and ZipNum conversion for pywb wayback.
      Python
      GNU General Public License v3.0
      0000 Updated Jan 15, 2026
    • A set of scripts used to extract data from APIs, for example the RCAAP API or the CienciaVitae API.
      Python
      Apache License 2.0
      0000 Updated Dec 16, 2025
    • The Viagens no tempo repository contains demos that use a timeline presentation about some institutions.
      HTML
      GNU General Public License v3.0
      1000 Updated Dec 11, 2025
    • Functional tests for Arquivo.pt, developed with the Selenium framework
      Java
      Apache License 2.0
      4003 Updated Dec 2, 2025
    • A wrapper around the pywb cdxj-indexer command line tool that offers incremental and parallel indexing of a collection.
      Python
      Apache License 2.0
      0001 Updated Nov 18, 2025
    • Web Service that generates Web Page Screenshots
      JavaScript
      GNU General Public License v3.0
      0007 Updated Nov 17, 2025
    • Arquivo.pt's main goal is to preserve and provide access to web content that is no longer available online. During the development of the PWA IR (information retri…
      GNU General Public License v3.0
      7521350 Updated Nov 6, 2025
    • Fast extraction of all external links from wikipedia
      Rust
      MIT License
      3000 Updated Oct 14, 2025
    • HTML
      GNU General Public License v3.0
      0000 Updated Oct 7, 2025
    • This repository will contain the code to provide real-time analytics of Arquivo.pt data
      Python
      Apache License 2.0
      0000 Updated Sep 26, 2025
    • This repository will contain tools, scripts, and datasets for analyzing the logs from Arquivo.pt
      Python
      Apache License 2.0
      0000 Updated Sep 25, 2025
    • An application to explore CDXJ files
      Python
      GNU General Public License v3.0
      0000 Updated Jul 15, 2025
    • Arquivo.pt Page Search System
      Java
      GNU General Public License v3.0
      1104 Updated Jun 18, 2025
    • Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
      TypeScript
      GNU Affero General Public License v3.0
      64000 Updated Mar 4, 2025
    • Serverless replay of web archives directly in the browser
      TypeScript
      GNU Affero General Public License v3.0
      89000 Updated Mar 4, 2025
    • Run a high-fidelity browser-based web archiving crawler in a single Docker container
      TypeScript
      GNU Affero General Public License v3.0
      134000 Updated Mar 4, 2025
    • wombat
      Wombat.js client-side rewriting library
      JavaScript
      GNU Affero General Public License v3.0
      38000 Updated Feb 11, 2025
    • Solr image search API repository
      Java
      GNU General Public License v3.0
      2232 Updated Jan 22, 2025
    • A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!
      TypeScript
      GNU Affero General Public License v3.0
      102000 Updated Jan 10, 2025
    • CDXJ Indexing of WARC/ARCs
      Python
      Apache License 2.0
      15000 Updated Dec 10, 2024
    • warcio
      Streaming WARC/ARC library for fast web archive IO
      Python
      Apache License 2.0
      68000 Updated Dec 10, 2024