A comprehensive glossary of legal, technical, and domain-specific terms used in the Case Law Explorer project.
A uniform identifier system for European case law. Format: ECLI:CountryCode:Court:Year:Ordinal
Example: ECLI:NL:HR:2023:1234
NL= NetherlandsHR= Hoge Raad (Supreme Court)2023= Year1234= Sequential number
Purpose: Enables unique identification and citation of court decisions across European jurisdictions.
Resources: Official ECLI website
A unique identifier for EU legal documents, including court cases. Format varies by document type.
Example for Court Case: 62019CJ0100
6= Case law2019= YearCJ= Court of Justice0100= Case number
Purpose: Identifies all EU legal documents in the EUR-Lex database.
Resources: EUR-Lex CELEX info
The official Dutch abbreviation for the "Basic Laws Database" - the official publication of Dutch laws and regulations.
Example: BWBR0001854 (Dutch Civil Code)
BWB= Basic Laws DatabaseR= Regulation0001854= Sequential number
Purpose: Unique identification of Dutch legislation.
Resources: wetten.overheid.nl
Dutch term meaning "case law" or "jurisdiction". Refers to:
- The Dutch Judiciary (organization)
- The website rechtspraak.nl hosting Dutch court decisions
- Dutch case law in general
Purpose: Primary source for Dutch court decisions in this project.
Resources: rechtspraak.nl
"Linked Data Government" - A Dutch government initiative providing linked open data about legislation and case law.
Services:
- Citation extraction between cases
- References to legal provisions
- BWB ID mapping
- Legal provision aliases
Data Format: RDF/Turtle files with linked data
Resources: linkeddata.overheid.nl
International court established by the European Convention on Human Rights.
Jurisdiction: Human rights violations in Council of Europe member states
Database: HUDOC (Human Rights Documentation)
Case Identifier: Uses both ECLI and ItemID systems
Resources: echr.coe.int
The EU's highest court, interpreting EU law and ensuring uniform application.
Components:
- Court of Justice (CJ)
- General Court (GC)
Database: CELLAR (Central Repository for Legal Documents)
Case Identifier: Uses both ECLI and CELEX
Resources: curia.europa.eu
The Central Repository for Legal Documents of the EU Publications Office.
Contains:
- EU legislation
- Court of Justice case law
- Parliamentary documents
- Official Journal
Access: SPARQL endpoint for programmatic access
Resources: publications.europa.eu
A graph representation of how court cases reference each other.
Nodes: Individual court cases (identified by ECLI)
Edges: Citation relationships between cases
- Outgoing: Cases cited by this case
- Incoming: Cases that cite this case
Purpose: Enables network analysis, identifying influential cases, citation patterns, and legal doctrine development.
A specific article, paragraph, or section of a law or regulation that is referenced in a court decision.
Example: "Article 1:1 BW" (Article 1:1 of the Dutch Civil Code)
Components:
- Law identifier (e.g., BW, Sr)
- Structural elements (Book, Title, Chapter)
- Article number
- Sometimes paragraph and sub-paragraph
A standard for unambiguous referencing of legal sources in the Netherlands.
Format: jci1.3:c:BWBR0001854&artikel=1
jci1.3= Versionc= Type (c=current version)BWBR0001854= BWB IDartikel=1= Article 1
Purpose: Precise linking to specific parts of legal texts.
A data pipeline pattern consisting of three stages:
- Extract: Retrieve data from source systems
- Transform: Clean, normalize, and structure data
- Load: Store data in target system (database, data warehouse)
In this project: Extract from rechtspraak.nl/HUDOC/CELLAR → Transform to unified format → Load to DynamoDB/S3
An open-source platform for orchestrating complex data workflows.
Key Concepts:
- DAG (Directed Acyclic Graph): Workflow definition
- Task: Individual unit of work
- Operator: Template for a task type
- Scheduler: Triggers DAGs based on schedule
- Worker: Executes tasks
UI: Web interface at http://localhost:8080
Resources: airflow.apache.org
In Airflow context: A collection of tasks with dependencies, representing a workflow.
Directed: Tasks have a defined order (A → B → C)
Acyclic: No circular dependencies (prevents infinite loops)
Example: Extract → Transform → Load
In Airflow: A way to organize related tasks together for better visualization and management.
In this project: Monthly task groups for parallel processing of different time periods.
AWS's fully managed NoSQL database service.
Data Model: Key-value and document database
Key Features:
- Automatic scaling
- High availability
- Flexible schema
- Fast performance
In this project: Stores case metadata with three tables (ECLI-based, CELEX-based, ItemID-based)
Resources: aws.amazon.com/dynamodb
AWS's object storage service.
Use Cases: Files, backups, data lakes, static websites
In this project: Stores full text of court decisions and graph data (nodes/edges)
Resources: aws.amazon.com/s3
A query language for APIs that allows clients to request exactly the data they need.
Advantages:
- Request only needed fields
- Single endpoint
- Strongly typed
- Self-documenting
In this project: Optional AppSync API for querying case law data
Resources: graphql.org
AWS's managed GraphQL service.
Features:
- Automatic schema generation
- Real-time subscriptions
- Built-in authentication
- DynamoDB integration
In this project: Provides GraphQL API for querying case data
Resources: aws.amazon.com/appsync
Platform for developing, shipping, and running applications in containers.
Container: Lightweight, standalone, executable package including code, runtime, libraries, and dependencies.
In this project: Airflow and all dependencies run in Docker containers
Resources: docker.com
Tool for defining and running multi-container Docker applications.
In this project: Orchestrates 6+ services (Airflow webserver, scheduler, worker, PostgreSQL, Redis, etc.)
Configuration: docker-compose.yaml file
Resources: docs.docker.com/compose
Query language for RDF (Resource Description Framework) data.
Purpose: Query linked data and semantic web resources
In this project: Used to query CELLAR database for CJEU cases
Example Query:
SELECT ?case ?date WHERE {
?case a cdm:judgment .
?case cdm:judgmentDate ?date .
}Resources: w3.org/TR/sparql11-query
Standard model for data interchange on the web.
Structure: Triples (Subject-Predicate-Object)
Example:
<case123> <hasDate> "2023-01-01"
Subject Predicate Object
In this project: LIDO data is in RDF format
A textual syntax for RDF that is more compact and readable than XML.
Example:
@prefix ecli: <http://data.europa.eu/eli/ontology#> .
<http://example.org/case/1>
ecli:date "2023-01-01" ;
ecli:court "Supreme Court" .In this project: LIDO exports are in Turtle format (.ttl.gz)
Distributed task queue for Python.
Components:
- Broker: Message queue (Redis in this project)
- Worker: Executes tasks
- Result Backend: Stores task results
In this project: Airflow uses Celery to distribute tasks across multiple workers
Resources: docs.celeryproject.org
Open-source relational database management system.
Features:
- ACID compliant
- Advanced SQL features
- Extensible
- Reliable
In this project:
- Airflow metadata database
- LIDO legal provisions storage
Resources: postgresql.org
In-memory data structure store used as database, cache, and message broker.
In this project: Celery message broker for Airflow task queue
Resources: redis.io
Python library for data analysis and manipulation.
Key Features:
- DataFrame data structure
- Reading/writing various formats (CSV, JSON, SQL)
- Data cleaning and transformation
- Statistical analysis
In this project: Core library for data transformation phase
Resources: pandas.pydata.org
AWS SDK for Python.
Purpose: Interact with AWS services programmatically
In this project: Used to:
- Upload data to S3
- Write to DynamoDB
- Query AWS resources
Resources: boto3.amazonaws.com
First phase of ETL: Retrieving data from source systems.
In this project:
- Rechtspraak: Download XML dumps, call metadata API
- ECHR: Query HUDOC API
- CJEU: Query CELLAR SPARQL endpoint
- Citations: Query LIDO API
Output: Raw CSV/JSON files in data/raw/{date}/
Second phase of ETL: Cleaning and normalizing data.
Operations in this project:
- Column name normalization
- Data type conversions
- XML/HTML parsing
- ECLI validation
- Citation structuring
- Duplicate removal
Output: Clean CSV files in data/processed/{date}/
Third phase of ETL: Storing data in target systems.
In this project:
- Metadata → DynamoDB tables
- Full text → S3 buckets
- Graph data → S3 buckets
- Legal provisions → PostgreSQL
Extracting only new or changed data since the last extraction.
Method: Using date ranges (start_date, end_date parameters)
Benefits:
- Faster processing
- Lower resource usage
- Reduced API calls
Executing multiple tasks simultaneously.
In this project: Monthly task groups process different months in parallel
Benefits:
- Faster overall processing
- Better resource utilization
- Independent month processing
Property where repeated operations produce the same result.
Example: Re-running extraction for a month doesn't create duplicates
Implementation: Check if data exists before extraction
Organization pattern where extracted data is stored in directories by month.
Format: data/raw/YYYY-MM-DD/
Example: data/raw/2023-01-01/
Purpose:
- Parallel processing
- Easy reprocessing of specific periods
- Clear data organization
Process of extracting structured information about a case (not the full text).
Includes:
- Case identifiers (ECLI, CELEX)
- Court name
- Decision date
- Procedure type
- Subject matter
- Citations
Process of extracting the complete text of court decisions.
Storage: JSON format in S3
Structure:
{
"ecli": "ECLI:NL:HR:2023:1234",
"full_text": "Court decision text...",
"sections": {...}
}Process of identifying and structuring references between cases and to legal provisions.
Sources:
- LIDO API (primary)
- Text parsing (secondary)
Output:
- Incoming citations (cases citing this case)
- Outgoing citations (cases cited by this case)
- Legal provisions cited
Representation of cases and citations as network nodes and edges.
Nodes File (cellar_nodes.txt):
ECLI:NL:HR:2023:1234|Supreme Court|2023-01-01
Edges File (cellar_edges.txt):
ECLI:NL:HR:2023:1234|ECLI:NL:HR:2022:5678
Purpose: Network analysis and visualization
| Acronym | Full Name | Category |
|---|---|---|
| AWS | Amazon Web Services | Technology |
| BW | Burgerlijk Wetboek (Civil Code) | Legal |
| BWB | Basis Wetten Bestand | Legal |
| CELEX | Common European Legal X-change | Legal |
| CJEU | Court of Justice of the European Union | Legal |
| DAG | Directed Acyclic Graph | Technology |
| DDB | DynamoDB | Technology |
| ECHR | European Court of Human Rights | Legal |
| ECLI | European Case Law Identifier | Legal |
| ETL | Extract, Transform, Load | Technology |
| EU | European Union | Legal |
| JCI | Juriconnect | Legal |
| LIDO | Linked Data Overheid | Legal |
| RDF | Resource Description Framework | Technology |
| S3 | Simple Storage Service | Technology |
| SPARQL | SPARQL Protocol and RDF Query Language | Technology |
| SQL | Structured Query Language | Technology |
| TTL | Turtle (file format) | Technology |
| UI | User Interface | Technology |
| URL | Uniform Resource Locator | Technology |
- General Documentation: See README.md
- Architecture: See ARCHITECTURE.md
- Troubleshooting: See TROUBLESHOOTING.md
- ETL Guide: See docs/etl/README.md
- Datasets: See docs/datasets/README.md
Last Updated: October 2025