MaRDI4NFDI/MathAlgoDB-Importer

MathAlgoDB Importer

Import data from MathAlgoDB into the MaRDI Portal.

The script parses an RDF/XML export of MathAlgoDB and creates/updates items (algorithms, problems, software, benchmarks) on the MaRDI portal, including their properties and inverse relations.

Prerequisites

  • Python >= 3.11
  • uv (handles dependencies automatically)

No manual installation of dependencies is needed. Running the script with uv run will automatically install rdflib and mardiclient.

Project structure

importer/
├── import_mathalgodb.py              # Main script
├── pyproject.toml
├── README.md
└── mappings/
    ├── staging/
    │   ├── config.json               # Staging environment configuration
    │   └── mardi.json                # MathAlgoDB ID -> QID mapping (staging)
    └── production/
        ├── config.json               # Production environment configuration
        └── mardi.json                # MathAlgoDB ID -> QID mapping (production)

Configuration

Each environment has a config.json in mappings/{environment}/ with:

| Key | Description |
| --- | --- |
| wikibase_host | Portal hostname (e.g. staging.mardi4nfdi.org). API endpoints are derived from this. |
| instance_mapping | Maps entity types (problem, software, algorithm, benchmark) to their "instance of" QIDs. |
| profile_mapping | Maps entity types to their MaRDI profile type QIDs. |
| property_mapping | Maps MathAlgoDB relation names (e.g. solvedBy, subclassOf) and identifier types (e.g. DOI, arxiv, swmath) to Wikibase property IDs. |
| qualifier_mapping | Maps relation names that are stored as qualifier-based claims (e.g. documentedIn, analyzedIn) to the QID used as the qualifier value. |
| community_item | QID of the MathAlgoDB community item. |
| instance_of_property | Property ID for "instance of" claims. |
| profile_type_property | Property ID for MaRDI profile type claims. |
| community_property | Property ID for community claims. |
| mathalgodb_identifier_property | Property ID for the MathAlgoDB identifier string claim. |
| object_has_role_property | Property ID used as the qualifier property for qualifier-based claims. |
| documented_in_property | Property ID under which qualifier-based claims are stored (the main property of the qualified statement). |
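
To illustrate how a file with these keys might be consumed, here is a minimal sketch that loads mappings/{environment}/config.json and fails early if a key from the table is missing. The names load_config and REQUIRED_KEYS are hypothetical and not taken from import_mathalgodb.py; the actual script may load and validate its configuration differently.

```python
import json
from pathlib import Path

# Keys listed in the configuration table (assumed here to all be required).
REQUIRED_KEYS = {
    "wikibase_host",
    "instance_mapping",
    "profile_mapping",
    "property_mapping",
    "qualifier_mapping",
    "community_item",
    "instance_of_property",
    "profile_type_property",
    "community_property",
    "mathalgodb_identifier_property",
    "object_has_role_property",
    "documented_in_property",
}


def load_config(environment: str, base_dir: Path = Path("mappings")) -> dict:
    """Load mappings/{environment}/config.json and check for missing keys."""
    path = base_dir / environment / "config.json"
    with path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"{path} is missing keys: {sorted(missing)}")
    return config
```

Calling load_config("staging") from the project directory would then return the staging configuration as a plain dict.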

Mapping file (mardi.json)

Each environment directory contains a mardi.json file that maps MathAlgoDB individual IDs to Wikibase QIDs:

{
  "al:BICGSTABl": "Q6825304",
  "al:ClassDirectDense": "Q6825303",
  "pr:TangentFromPointToCircle": "Q6825307",
  "sw:polymake": "Q6825333",
  "pb:FerN83": "Q4745688",
  ...
}

This file is read at startup to determine which items already exist. When new items are created, the file is updated automatically.
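
The read-at-startup, update-on-create behaviour could look roughly like the sketch below. The helper names load_mapping and record_new_item are hypothetical, and both returning an empty mapping for a missing file and rewriting the file after every new item are assumptions rather than documented behaviour of the script.

```python
import json
from pathlib import Path


def load_mapping(path: Path) -> dict[str, str]:
    """Return the MathAlgoDB ID -> QID mapping (empty if the file is absent)."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def record_new_item(path: Path, mapping: dict[str, str],
                    mathalgodb_id: str, qid: str) -> None:
    """Add a newly created item to the mapping and write the file back."""
    mapping[mathalgodb_id] = qid
    with path.open("w", encoding="utf-8") as fh:
        json.dump(mapping, fh, indent=2, sort_keys=True)
```

An ID already present in the returned dict (for example "sw:polymake") would be treated as existing; any other ID would be created and then passed to record_new_item.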

Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| WIKIBASE_USER | Yes (unless --dry-run) | Wikibase bot username |
| WIKIBASE_PASSWORD | Yes (unless --dry-run) | Wikibase bot password |

These can be provided via a .env file in the project directory. Copy the template and fill in your values:

cp .env.example .env

Then edit .env so that it contains your credentials, for example:

WIKIBASE_USER=MyBot
WIKIBASE_PASSWORD=secret
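
The "required unless --dry-run" rule can be sketched as below. The function get_credentials is hypothetical and only reads variables that are already in the environment; how the script loads the .env file itself is not shown here and is not assumed.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def get_credentials(dry_run: bool,
                    env: Mapping[str, str] = os.environ) -> tuple[str, str] | None:
    """Return (user, password), or None when running with --dry-run."""
    if dry_run:
        return None  # a dry run never writes, so no login is needed
    user = env.get("WIKIBASE_USER")
    password = env.get("WIKIBASE_PASSWORD")
    if not user or not password:
        raise SystemExit(
            "WIKIBASE_USER and WIKIBASE_PASSWORD must be set (or use --dry-run)"
        )
    return user, password
```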

Usage

Dry run (no credentials needed)

Downloads the latest XML and shows what would be created/updated without writing anything:

uv run import_mathalgodb.py --dry-run

Import to staging

uv run import_mathalgodb.py -e staging

Import to production

uv run import_mathalgodb.py -e production

Use a local XML file (skip download)

uv run import_mathalgodb.py --xml-file path/to/mathalgodb.xml

What the script does

The script runs three steps sequentially:

  1. Create items -- For each non-publication individual in the XML, checks mardi.json to see if it already exists. If not, creates a new Wikibase item with label, aliases, description, instance-of, profile type, community, and MathAlgoDB identifier claims. Updates mardi.json with the new QID.

  2. Add properties -- Resolves relations between individuals to QIDs using mardi.json and the environment's property_mapping, then writes them as claims. Relations listed in qualifier_mapping (e.g. documentedIn, analyzedIn) are stored as qualified statements: a claim under documented_in_property with the target item as value and the role QID as a qualifier on object_has_role_property. Identifier strings (DOI, arXiv, swMath, etc.) are written as literal claims using the property IDs also defined in property_mapping.

  3. Add inverse relations -- For relations that have an inverse (e.g. solves / solvedBy, documents / documentedIn), adds the corresponding claims to the target items. Inverse relations that map to qualifier_mapping entries are likewise stored as qualified statements on the target item.
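
Steps 2 and 3 can be pictured with the following sketch, which turns relation triples into plain-dict claim descriptions instead of calling Wikibase. Everything here is illustrative: plan_claim, plan_claims, and the INVERSES table are hypothetical names, the dict layout is not the mardiclient data model, and the script's real handling of unresolved IDs may differ.

```python
from __future__ import annotations

# Inverse pairs named in the text above; the real script may know more.
INVERSES = {"solves": "solvedBy", "documents": "documentedIn"}


def plan_claim(subject, relation, target, mapping, config):
    """Describe one claim as a dict, or return None if it cannot be resolved."""
    subject_qid, target_qid = mapping.get(subject), mapping.get(target)
    if subject_qid is None or target_qid is None:
        return None  # one side has no QID in mardi.json
    if relation in config["qualifier_mapping"]:
        # Qualified statement: the main property is documented_in_property and
        # the role QID is attached as a qualifier on object_has_role_property.
        role_qid = config["qualifier_mapping"][relation]
        return {
            "item": subject_qid,
            "property": config["documented_in_property"],
            "value": target_qid,
            "qualifiers": {config["object_has_role_property"]: role_qid},
        }
    prop = config["property_mapping"].get(relation)
    if prop is None:
        return None  # relation is not mapped to a Wikibase property
    return {"item": subject_qid, "property": prop,
            "value": target_qid, "qualifiers": {}}


def plan_claims(relations, mapping, config):
    """Plan direct claims (step 2) and their inverses on the target (step 3)."""
    planned = []
    for subject, relation, target in relations:
        candidates = [(subject, relation, target)]
        if relation in INVERSES:
            candidates.append((target, INVERSES[relation], subject))
        for s, r, t in candidates:
            claim = plan_claim(s, r, t, mapping, config)
            if claim is not None:
                planned.append(claim)
    return planned
```

With made-up IDs, a triple such as ("pb:X", "documents", "al:A") would yield one plain claim on the publication and one qualified documentedIn statement on the algorithm.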

CLI reference

usage: import_mathalgodb.py [-h] [-e {staging,production}] [--xml-file XML_FILE] [--dry-run]

options:
  -e, --environment     Target environment (default: 'staging')
  --xml-file            Path to a local RDF/XML file (skips download)
  --dry-run             Parse and prepare data without writing to Wikibase
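
For reference, an argparse parser that accepts this interface could be written as follows. This is a reconstruction from the usage text above, not a copy of the script's code; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented command-line options."""
    parser = argparse.ArgumentParser(prog="import_mathalgodb.py")
    parser.add_argument(
        "-e", "--environment",
        choices=["staging", "production"],
        default="staging",
        help="Target environment (default: 'staging')",
    )
    parser.add_argument(
        "--xml-file",
        help="Path to a local RDF/XML file (skips download)",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Parse and prepare data without writing to Wikibase",
    )
    return parser
```

Under this sketch, running with only --dry-run targets the staging environment, since that is the documented default.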

About

Script to sync MathAlgoDB entities (algorithms, problems, software, benchmarks) into a MaRDI Wikibase instance via the MaRDI client.
