maastrichtlawtech/rechtspraak-extractor

Rechtspraak extractor

This library provides three functions for retrieving Rechtspraak data and metadata, either live from the Rechtspraak API or from a local SQLite database.

Version

Python 3.9+

Contributors

pranavnbapat
Pranav Bapat
running-machin
running-machin
Cloud956
Piotr Lewandowski
shashankmc
shashankmc
gijsvd
gijsvd

How to install?

pip install rechtspraak_extractor

What are the functions?

  • Rechtspraak Extractor
    1. get_rechtspraak
      Gets all the ECLIs and either saves them in a CSV file or returns them in memory.
      It retrieves the ECLI, title, summary, updated date, and link.
    2. get_rechtspraak_metadata
      Gets the metadata of the ECLIs retrieved by get_rechtspraak and either saves it in a new CSV file or returns it in memory.
      The link attribute returned by get_rechtspraak points to the metadata of each ECLI.
      It retrieves instantie, datum uitspraak, datum publicatie, zaaknummer, rechtsgebieden, bijzondere kenmerken, inhoudsindicatie, and vindplaatsen.
      Supports two extraction methods: method='api' (default, fetches live from the Rechtspraak API) and method='sqlite' (fetches from a local pre-built SQLite database; see the SQLite section below).
    3. fetch_eclis_via_sqlite
      Low-level function that looks up a list of ECLIs directly in a local SQLite database and returns a DataFrame. Requires the rechtspraak-lido-sqlite package to be installed and its database populated first (see the SQLite section below).
What are the parameters?

    1. get_rechtspraak(max_ecli=100, sd='2022-05-01', ed='2022-10-01', save_file='y')
      Parameters:
      • max_ecli: int, optional, default 100
        Maximum number of ECLIs to retrieve
      • sd: date, optional, default '2022-08-01'
        The start publication date (yyyy-mm-dd)
      • ed: date, optional, default current date
        The end publication date (yyyy-mm-dd)
      • save_file: ['y', 'n'], default 'y'
        y - Save data as a CSV file in the data folder
        n - Return data as a DataFrame in memory
    2. get_rechtspraak_metadata(...)
      Parameters:
      • save_file: ['y', 'n'], default 'n'
        y - Save data as a CSV file in the data folder
        n - Return data as a DataFrame in memory
      • dataframe: DataFrame, optional
        DataFrame containing the ECLIs to retrieve metadata for. Cannot be combined with filename
      • filename: string, optional
        CSV file containing the ECLIs to retrieve metadata for. Cannot be combined with dataframe
      • method: ['api', 'sqlite'], default 'api'
        api - Fetch metadata live from the Rechtspraak API
        sqlite - Fetch metadata from a local SQLite database (requires rechtspraak-lido-sqlite)
      • sqlite_db_path: string, default 'data/lido_metadata.db'
        Path to the SQLite database file. Only used when method='sqlite'
      • fallback_to_api: bool, default True
        When using method='sqlite', fall back to the live API for any ECLIs not found in the database
      • multi_threading: bool, default True
        Use multi-threading for API-based metadata extraction. Set to False for single-threaded execution
    3. fetch_eclis_via_sqlite(ecli_list, sqlite_db_path, columns)
      Parameters:
      • ecli_list: list[str]
        List of ECLI identifiers to look up
      • sqlite_db_path: string
        Path to the SQLite database file produced by rechtspraak-lido-sqlite
      • columns: list[str]
        Column names to select from the database
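
Because sd and ed are passed as yyyy-mm-dd strings, it can help to validate them before starting a long download. The helper below is a minimal sketch and not part of rechtspraak_extractor: the name check_date_range is hypothetical, and treating a missing ed as today's date is an assumption based on the documented default.

```python
from datetime import date
from typing import Optional, Tuple


def check_date_range(sd: str, ed: Optional[str] = None) -> Tuple[str, str]:
    """Validate yyyy-mm-dd strings and confirm that sd is not later than ed.

    Hypothetical helper, not part of rechtspraak_extractor.
    """
    start = date.fromisoformat(sd)  # raises ValueError on a malformed date
    # Assumption: a missing end date means "today", mirroring the documented default.
    end = date.fromisoformat(ed) if ed is not None else date.today()
    if start > end:
        raise ValueError(f"sd ({start}) is later than ed ({end})")
    return start.isoformat(), end.isoformat()


# Possible usage (requires network access, so it is left as a comment):
#   sd, ed = check_date_range("2022-05-01", "2022-10-01")
#   df = rex.get_rechtspraak(max_ecli=100, sd=sd, ed=ed, save_file="n")
```
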

    Examples

    Downloading ECLIs

    import rechtspraak_extractor as rex
    
    # Get rechtspraak data as a DataFrame (100 ECLIs since 2022-08-01)
    df = rex.get_rechtspraak(max_ecli=100, sd="2022-08-01", save_file="n")
    
    # Save rechtspraak data directly to CSV in the data/ folder
    rex.get_rechtspraak(max_ecli=100, sd="2022-08-01", save_file="y")

    Extracting metadata via the live API (default)

    # Get metadata into a DataFrame from an existing DataFrame
    df_metadata = rex.get_rechtspraak_metadata(save_file="n", dataframe=df)
    
    # Get metadata into a DataFrame from a CSV produced by get_rechtspraak
    df_metadata = rex.get_rechtspraak_metadata(save_file="n", filename="rechtspraak.csv")
    
    # Produce metadata CSV from an in-memory DataFrame
    rex.get_rechtspraak_metadata(save_file="y", dataframe=df)
    
    # Produce metadata CSV from files already in data/ (processes all files)
    rex.get_rechtspraak_metadata(save_file="y")
    • filename refers to a file in the data/ folder created by get_rechtspraak.
    • df is the DataFrame returned by get_rechtspraak.
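
The parameter list states that dataframe and filename cannot be combined, but not what happens if both are passed. A small guard in calling code makes the intent explicit. This is a sketch only: metadata_source_kwargs is a hypothetical helper, not a library function.

```python
def metadata_source_kwargs(dataframe=None, filename=None):
    """Build the input-related keyword arguments for get_rechtspraak_metadata.

    Hypothetical helper: it rejects the combination the README says is not
    allowed, so the mistake surfaces before any request is made.
    """
    if dataframe is not None and filename is not None:
        raise ValueError("Pass either dataframe or filename, not both")
    kwargs = {}
    if dataframe is not None:
        kwargs["dataframe"] = dataframe
    if filename is not None:
        kwargs["filename"] = filename
    return kwargs


# Possible usage (requires network access, so it is left as a comment):
#   source = metadata_source_kwargs(filename="rechtspraak.csv")
#   df_metadata = rex.get_rechtspraak_metadata(save_file="n", **source)
```
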

    Extracting metadata via SQLite (offline, faster)

    The SQLite method fetches metadata from a local pre-built database instead of making live API calls. This is significantly faster for large batches and works offline.

    Prerequisite: The rechtspraak-lido-sqlite package must be installed and its database must be built locally before using this method.

    pip install rechtspraak-lido-sqlite

    After installing, follow the rechtspraak-lido-sqlite instructions to build the local database (typically produces a file at data/lido.db or a path you configure).
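
Before pointing the extractor at the database, it may be worth confirming that the file exists and actually contains tables. The sketch below uses only Python's standard sqlite3 module. The function name list_sqlite_tables is hypothetical, and no table names are assumed, since this README does not describe the schema produced by rechtspraak-lido-sqlite.

```python
import sqlite3
from pathlib import Path


def list_sqlite_tables(db_path):
    """Return the table names in the SQLite file at db_path, or [] if it is missing.

    Hypothetical pre-flight check, not part of rechtspraak_extractor.
    """
    path = Path(db_path)
    if not path.is_file():
        return []
    # Open read-only so this check can never create or modify the database.
    uri = path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Possible usage:
#   if not list_sqlite_tables("data/lido.db"):
#       print("Database missing or empty; build it with rechtspraak-lido-sqlite first.")
```
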

    Using get_rechtspraak_metadata with method='sqlite'

    import rechtspraak_extractor as rex
    
    df = rex.get_rechtspraak(max_ecli=500, sd="2025-01-01", save_file="n")
    
    # Fetch metadata from local SQLite database
    df_metadata = rex.get_rechtspraak_metadata(
        save_file="n",
        dataframe=df,
        method="sqlite",
        sqlite_db_path="data/lido.db",   # path to the database built by rechtspraak-lido-sqlite
        fallback_to_api=True,            # fall back to live API for ECLIs not found in the DB
    )

    Using fetch_eclis_via_sqlite directly

    from rechtspraak_extractor.rechtspraak_metadata import fetch_eclis_via_sqlite
    
    eclis = ["ECLI:NL:HR:2023:1", "ECLI:NL:HR:2023:2"]
    
    columns = ["ecli", "document_type", "date_decision", "instance", "full_text"]
    
    df = fetch_eclis_via_sqlite(
        ecli_list=eclis,
        sqlite_db_path="data/lido.db",
        columns=columns,
    )

    Note: If the database file does not exist or an ECLI is not found in it, fetch_eclis_via_sqlite returns an empty DataFrame rather than raising an error. Use fallback_to_api=True in get_rechtspraak_metadata to automatically cover missing ECLIs via the live API.
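
Since missing ECLIs are dropped silently, a caller of fetch_eclis_via_sqlite may want to work out which requested identifiers did not come back. The following is a minimal sketch: missing_eclis is a hypothetical helper, and the usage comment assumes that "ecli" was among the selected columns, as in the example above.

```python
def missing_eclis(requested, found):
    """Return the requested ECLIs that are absent from found, in request order.

    Hypothetical helper, not part of rechtspraak_extractor. Matching is exact,
    so both inputs should use the same spelling and case.
    """
    found_set = set(found)
    return [ecli for ecli in requested if ecli not in found_set]


# Possible usage with the DataFrame returned by fetch_eclis_via_sqlite:
#   found = df["ecli"].tolist() if "ecli" in df.columns else []
#   todo = missing_eclis(eclis, found)
#   # todo now lists the ECLIs that still need to be fetched another way.
```
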

    License

    License: Apache 2.0

    Previously under the MIT License, as of 28/10/2022 this work is licensed under the Apache License, Version 2.0.

    Apache License, Version 2.0
    
    Copyright (c) 2022 Maastricht Law & Tech Lab
    
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
        
        http://www.apache.org/licenses/LICENSE-2.0
    
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    
