RDFSolve

Extract RDF schemas from SPARQL endpoints and convert them to multiple formats (VoID, LinkML, JSON-LD, SHACL, RDF-config).

Installation

uv pip install rdfsolve

Quick Start

CLI

Extract schema and convert to multiple formats:

# Discover existing VoID metadata (fast)
rdfsolve discover --endpoint https://sparql.rhea-db.org/sparql

# Extract schema (uses discovered VoID if available)
rdfsolve extract --endpoint https://sparql.rhea-db.org/sparql \
  --output-dir ./output

# Export to different formats
rdfsolve export --void-file ./output/void_description.ttl \
  --format all --output-dir ./output

Extract Command Options:

# Force fresh generation (bypasses discovered VoID)
rdfsolve extract --endpoint URL --force-generate

# Custom naming and URIs
rdfsolve extract --endpoint URL \
  --dataset-name mydata \
  --void-base-uri "http://example.org/mydata/well-known/void"

# Filter specific graphs
rdfsolve extract --endpoint URL \
  --graph-uri http://example.org/graph1 \
  --graph-uri http://example.org/graph2

Export Formats:

  • csv - Schema patterns table
  • jsonld - JSON-LD representation
  • linkml - LinkML YAML schema
  • shacl - SHACL shapes for RDF validation
  • rdfconfig - RDF-config YAML files (model, prefix, endpoint)
  • coverage - Pattern frequency analysis
  • all - All formats (default)

Export with custom LinkML schema:

rdfsolve export --void-file void_description.ttl \
  --format linkml \
  --schema-name custom_schema \
  --schema-uri "http://example.org/schemas/custom" \
  --schema-description "Custom schema description"

Export SHACL shapes for RDF validation:

# Export closed SHACL shapes (strict validation)
rdfsolve export --void-file void_description.ttl \
  --format shacl \
  --shacl-closed \
  --shacl-suffix Shape

# Export open SHACL shapes (flexible validation)
rdfsolve export --void-file void_description.ttl \
  --format shacl \
  --shacl-open

SHACL (Shapes Constraint Language) shapes define constraints on RDF data and can be used to validate RDF instances against the extracted schema. Closed shapes only allow properties explicitly defined in the schema, while open shapes are more permissive.
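For example, a closed shape for a class with a single string-valued property might look like the following (illustrative Turtle only; the class, property, and prefix names are assumptions, not actual rdfsolve output):

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:closed true ;                       # reject undeclared properties
    sh:ignoredProperties ( rdf:type ) ;    # rdf:type remains allowed
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
    ] .
```

With sh:closed false (or the --shacl-open flag), the same shape still validates the declared properties but no longer rejects instances that carry additional, undeclared ones.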

Export RDF-config files:

rdfsolve export --void-file void_description.ttl \
  --format rdfconfig \
  --endpoint-url https://sparql.example.org/sparql \
  --graph-uri http://example.org/graph \
  --output-dir ./output

Creates a directory {dataset}_config/ containing:

  • model.yml - Class and property structure
  • prefix.yml - Namespace prefix definitions
  • endpoint.yml - SPARQL endpoint configuration

This structure is required by the rdf-config tool.
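With --dataset-name mydata, for example, the resulting layout would be:

```
mydata_config/
├── model.yml
├── prefix.yml
└── endpoint.yml
```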

Count instances per class:

rdfsolve count --endpoint URL --output counts.csv

Service graph filtering:

By default, extract and count exclude Virtuoso system graphs and well-known URIs. Use --include-service-graphs to include them.
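The effect of this filter can be sketched in plain Python (a stdlib-only illustration; the exclusion markers below are assumptions for the example, not rdfsolve's internal list):

```python
# Hypothetical markers standing in for rdfsolve's service-graph filter.
SERVICE_GRAPH_MARKERS = (
    "http://www.openlinksw.com/",  # Virtuoso system graphs
    "/.well-known/",               # well-known URIs
)

def filter_service_graphs(graph_uris, include_service_graphs=False):
    """Drop service/system graphs unless explicitly included."""
    if include_service_graphs:
        return list(graph_uris)
    return [
        uri for uri in graph_uris
        if not any(marker in uri for marker in SERVICE_GRAPH_MARKERS)
    ]

graphs = [
    "http://example.org/graph1",
    "http://www.openlinksw.com/schemas/virtrdf#",
    "http://example.org/.well-known/genid/abc",
]
print(filter_service_graphs(graphs))  # → ['http://example.org/graph1']
```

Passing include_service_graphs=True (the CLI's --include-service-graphs) returns the list unfiltered.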

Python API

from rdfsolve.api import (
    generate_void_from_endpoint,
    load_parser_from_graph,
    count_instances_per_class,
    to_shacl_from_file,
    to_rdfconfig_from_file,
)

# Generate VoID from endpoint
void_graph = generate_void_from_endpoint(
    endpoint_url="https://sparql.example.org/",
    graph_uris=["http://example.org/graph"],
    void_base_uri="http://example.org/void",  # Custom partition URIs
)

# Load parser and extract schema
parser = load_parser_from_graph(void_graph)

# Export to different formats
schema_df = parser.to_schema()  # Pandas DataFrame
schema_jsonld = parser.to_jsonld()  # JSON-LD
linkml_yaml = parser.to_linkml_yaml(
    schema_name="my_schema",
    schema_base_uri="http://example.org/schemas/my_schema"
)

# Export to SHACL shapes for validation
shacl_ttl = parser.to_shacl(
    schema_name="my_schema",
    schema_base_uri="http://example.org/schemas/my_schema",
    closed=True,  # Closed shapes for strict validation
    suffix="Shape",  # Append "Shape" to class names
)

# Or use the convenience function
shacl_ttl = to_shacl_from_file(
    "void_description.ttl",
    schema_name="my_schema",
    closed=True,
)

# Export to RDF-config format
rdfconfig = to_rdfconfig_from_file(
    "void_description.ttl",
    endpoint_url="https://sparql.example.org/",
    graph_uri="http://example.org/graph",
)
# Save to {dataset}_config/ directory structure
import os
os.makedirs("dataset_config", exist_ok=True)
with open("dataset_config/model.yml", "w") as f:
    f.write(rdfconfig["model"])
with open("dataset_config/prefix.yml", "w") as f:
    f.write(rdfconfig["prefix"])
with open("dataset_config/endpoint.yml", "w") as f:
    f.write(rdfconfig["endpoint"])
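The three writes above can also be expressed as a loop over the returned mapping (a stdlib-only sketch; it assumes rdfconfig is the dict returned by to_rdfconfig_from_file, with the "model", "prefix", and "endpoint" keys shown above):

```python
from pathlib import Path

# Stand-in for the dict returned by to_rdfconfig_from_file;
# the keys mirror the example above.
rdfconfig = {
    "model": "# model.yml contents\n",
    "prefix": "# prefix.yml contents\n",
    "endpoint": "# endpoint.yml contents\n",
}

config_dir = Path("dataset_config")
config_dir.mkdir(exist_ok=True)
for key, content in rdfconfig.items():
    # One file per key: model.yml, prefix.yml, endpoint.yml
    (config_dir / f"{key}.yml").write_text(content)
```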

# Count instances per class
class_counts = count_instances_per_class(
    "https://sparql.example.org/",
    graph_uris=["http://example.org/graph"],
)

Features

  • Extract RDF schemas from SPARQL endpoints using VoID partitions
  • Discover existing VoID metadata or generate fresh
  • Export to multiple formats: CSV, JSON-LD, LinkML, SHACL, RDF-config, coverage analysis
  • SHACL shapes generation for RDF data validation
  • RDF-config export for schema documentation (compatible with rdf-config tool)
  • Customizable dataset naming and VoID partition URIs
  • Service graph filtering (excludes Virtuoso system graphs by default)
  • Instance counting per class with optional sampling

Documentation

See the dashboard: https://jmillanacosta.github.io/rdfsolve

License

MIT License - see LICENSE for details.

Powered by the Bioregistry