blw-ofag-ufag/plant-protection

Warning

This project is a proof of concept (PoC) and should not be consumed by production applications. It is currently being refactored at https://github.com/BLV-OSAV-USAV/PSMV-RDF. Head over there for more information about the availability of plant protection product data on LINDAS.

Plant Protection Products as Linked Data

This project extracts the Swiss Plant Protection Product (PPP) registry, maps the data to RDF and publishes it on LINDAS. The ETL logic lives in automation/etl.R and uses a few CSV files in tables/mapping for manual mappings such as company identifiers or product categories.

A couple of small demonstration pages are available in the docs folder and are hosted via GitHub Pages. These pages illustrate how linked data from LINDAS can be embedded in a website; they are not meant as full-fledged applications.

ETL pipeline

The main R script, etl.R, starts the process by downloading the Swiss Plant Protection Product Registry data as an XML file. It then parses this XML and transforms the relevant elements (such as products, companies, and codes) into RDF triples using a custom ontology.
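The XML-to-triples step can be sketched as follows. This is only an illustration in Python, not the logic of etl.R (which is written in R): the element names, attribute names and IRI patterns (`product/…`, `company/…`) are assumptions made up for the example, and only the namespace and the `hasPermissionHolder` property are taken from the example queries in this README.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real registry XML uses its own element and attribute names.
SAMPLE_XML = """
<Registry>
  <Product id="1234" name="Example Product">
    <PermissionHolder id="42"/>
  </Product>
</Registry>
""".strip()

BASE = "https://agriculture.ld.admin.ch/plant-protection/"
SCHEMA = "http://schema.org/"


def escape_literal(text):
    """Escape a string so it can be placed inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def product_triples(xml_text):
    """Turn every <Product> element into N-Triples lines."""
    root = ET.fromstring(xml_text)
    triples = []
    for product in root.iter("Product"):
        # Assumed IRI pattern: <BASE>product/<id>
        subject = f"<{BASE}product/{product.get('id')}>"
        name = escape_literal(product.get("name", ""))
        triples.append(f'{subject} <{SCHEMA}name> "{name}" .')
        for holder in product.iter("PermissionHolder"):
            # Assumed IRI pattern: <BASE>company/<id>
            company = f"<{BASE}company/{holder.get('id')}>"
            triples.append(f"{subject} <{BASE}hasPermissionHolder> {company} .")
    return triples


for line in product_triples(SAMPLE_XML):
    print(line)
```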

The ETL pipeline incorporates manually curated mapping tables in CSV and JSON formats. These tables are used to align internal classes and identifiers with established external ones, such as mapping company IDs to their ZEFIX registry entries or mapping product categories to subclasses of product in the ontology.
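As a rough sketch of how such a mapping table might be applied, the snippet below loads a small CSV and aligns internal company IDs with external identifiers. The column names (`company_id`, `zefix_uid`) and the identifier values are placeholders invented for this example; the actual files in tables/mapping may be laid out differently.

```python
import csv
import io

# Hypothetical mapping table; column names and values are placeholders.
MAPPING_CSV = """company_id,zefix_uid
42,CHE-000.000.001
77,
"""


def load_company_mapping(csv_text):
    """Read the mapping table, skipping rows without an external identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["company_id"]: row["zefix_uid"] for row in reader if row["zefix_uid"]}


def align(company_ids, mapping):
    """Split company IDs into mapped ones and ones that still need manual curation."""
    matched = {cid: mapping[cid] for cid in company_ids if cid in mapping}
    unmatched = [cid for cid in company_ids if cid not in mapping]
    return matched, unmatched


mapping = load_company_mapping(MAPPING_CSV)
matched, unmatched = align(["42", "77", "99"], mapping)
print(matched)
print(unmatched)
```

Reporting the unmatched IDs separately makes it easier to see which rows of a manually curated table still need attention.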

The Python script reason.py prepares the data for publication. It performs the following actions:

  1. Merges all the generated and external RDF files into a single graph.
  2. Applies RDFS and OWL reasoning to infer new relationships. This includes expanding class hierarchies (rdfs:subClassOf), property hierarchies (rdfs:subPropertyOf), and reciprocal relationships (owl:inverseOf).
  3. Deterministically sorts the triples to ensure consistency between versions.
  4. Serializes the final, consolidated graph into a single Turtle file (graph.ttl).
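The four steps above can be illustrated with a minimal, dependency-free sketch. It is not the code of reason.py: triples are modelled as plain tuples of prefixed names instead of using an RDF library, only the three rule families named above are implemented, and the class `:Insecticide` and property `:holdsPermissionFor` are hypothetical names chosen for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
INVERSE = "owl:inverseOf"


def merge(*graphs):
    """Step 1: union several collections of (s, p, o) tuples into one set."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def infer(triples):
    """Step 2: apply subclass, subproperty and inverseOf rules until nothing new appears."""
    graph = set(triples)
    while True:
        sub_class = {(s, o) for s, p, o in graph if p == SUBCLASS}
        sub_prop = {(s, o) for s, p, o in graph if p == SUBPROP}
        inverse = {(s, o) for s, p, o in graph if p == INVERSE}
        inverse |= {(o, s) for s, o in inverse}  # inverseOf holds in both directions
        new = set()
        for s, p, o in graph:
            if p in (RDF_TYPE, SUBCLASS):
                # Instances and subclasses inherit every superclass of o.
                new |= {(s, p, sup) for sub, sup in sub_class if sub == o}
            if p == SUBPROP:
                new |= {(s, p, sup) for sub, sup in sub_prop if sub == o}
            # A statement with a subproperty also holds for the superproperty.
            new |= {(s, sup, o) for sub, sup in sub_prop if sub == p}
            # A statement implies the reversed statement with the inverse property.
            new |= {(o, q, s) for r, q in inverse if r == p}
        if new <= graph:
            return graph
        graph |= new


def serialize(graph):
    """Steps 3 and 4: sort deterministically and write one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(graph))


ontology = [
    (":Insecticide", SUBCLASS, ":Product"),                   # hypothetical class
    (":hasPermissionHolder", INVERSE, ":holdsPermissionFor"),  # hypothetical inverse
]
data = [
    (":p1", RDF_TYPE, ":Insecticide"),
    (":p1", ":hasPermissionHolder", ":c1"),
]

result = infer(merge(ontology, data))
print(serialize(result))
```

Because the output is sorted, two runs over the same input yield identical text, which keeps diffs between published versions small. A real pipeline would also need to prevent the inverse rule from placing literals in subject position; this sketch ignores that case.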

Here's a graphical overview of these steps:

sequenceDiagram

    autonumber

    participant FSVO as FSVO website
    participant UploadScript as Upload Script (upload.sh)
    participant ETL_Pipeline as ETL Pipeline (etl.R)
    participant Repo as Repository
    participant ReasoningScript as Reasoning Script (reason.py)
    participant LINDAS as LINDAS Platform

    UploadScript->>ETL_Pipeline: Trigger ETL pipeline

    activate ETL_Pipeline
        ETL_Pipeline->>FSVO: Loads FSVO XML
        ETL_Pipeline->>Repo: Reads mapping tables
        loop For each class individually
            ETL_Pipeline->>ETL_Pipeline: Parses XML object
            ETL_Pipeline->>ETL_Pipeline: Integrates mappings
            ETL_Pipeline->>Repo: Writes N-Triples<br>or Turtle RDF files
        end
    deactivate ETL_Pipeline

    UploadScript->>ReasoningScript: Trigger reasoning pipeline
    activate ReasoningScript
        ReasoningScript->>Repo: Loads `.ttl` files<br>(`ontology.ttl`, foreign triples<br>from `rdf/foreign/*.ttl`, and manual<br>mappings from `rdf/mapping/*.ttl`)
        ReasoningScript->>ReasoningScript: Merges all RDF data
        ReasoningScript->>ReasoningScript: Performs RDFS/OWL reasoning<br>(subclass, subproperty, inverseOf)
        ReasoningScript->>Repo: Reads, sorts and writes<br>all `.ttl` files
    deactivate ReasoningScript

    UploadScript->>LINDAS: Clears the existing graph
    UploadScript->>LINDAS: Uploads the new `graph.ttl`
All of the provided example/demonstration webpages query the data directly via the LINDAS SPARQL endpoint.

Querying the dataset

The resulting RDF is loaded into the graph <https://lindas.admin.ch/foag/plant-protection> on the public LINDAS SPARQL endpoint at https://lindas.admin.ch/query. SPARQL is the query language for RDF datasets.
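For programmatic access, a request to the endpoint might look like the sketch below. It assumes the endpoint accepts form-encoded POST requests as described in the SPARQL 1.1 Protocol and that the graph can be selected with a `FROM` clause; the sample query, the function names and the 30-second timeout are illustrative choices.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://lindas.admin.ch/query"
GRAPH = "https://lindas.admin.ch/foag/plant-protection"

# Illustrative query: a handful of named resources from the graph.
QUERY = f"""
PREFIX schema: <http://schema.org/>
SELECT ?product ?name
FROM <{GRAPH}>
WHERE {{ ?product schema:name ?name }}
LIMIT 5
"""


def build_request(query):
    """Build a SPARQL 1.1 Protocol POST request that asks for JSON results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(query):
    """Send the query and return the decoded JSON document (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return json.load(response)
```

Calling `run_query(QUERY)` should return a dictionary with `head` and `results` keys if the endpoint responds with the standard SPARQL JSON results format.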

Note

The following examples can be opened directly in your browser via the s.zazuko.com shortener. Each link lets you view the SPARQL query, edit it and run it against the LINDAS triple store.

Example queries

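# Companies holding permissions for products indicated against late blight
# ("Kraut- und Knollenfäule") on potatoes ("Kartoffeln"), with the matching
# products and their count.
PREFIX schema: <http://schema.org/>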
PREFIX : <https://agriculture.ld.admin.ch/plant-protection/>

SELECT
  ?company
  (GROUP_CONCAT(CONCAT(?name, " (", ?WNbr, ")"); separator=", ") AS ?Product)
  (COUNT(?product) AS ?Number)

WHERE
{
  ?product schema:name ?name ;
    :hasPermissionHolder/schema:legalName ?company ;
    :federalAdmissionNumber ?WNbr ;
    :indication [
      :cropGroup/schema:name "Kartoffeln"@de ;
      :cropStressor/schema:name "Kraut- und Knollenfäule"@de
        ] .
}

GROUP BY ?company
ORDER BY DESC(?Number)

# English names and descriptions of :Product and all of its subclasses.
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <https://agriculture.ld.admin.ch/plant-protection/>

SELECT ?label ?comment

WHERE
{
  ?class rdfs:subClassOf* :Product ;
    schema:name ?label ;
    schema:description ?comment .

  VALUES ?lang { "en" }
  FILTER (
    LANG(?label) = ?lang &&
    LANG(?comment) = ?lang
  )
}

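ORDER BY ?class

# Federated query: products indicated against insects (according to Wikidata)
# that contain a substance with the role "neurotoxin" (according to ChEBI).
PREFIX : <https://agriculture.ld.admin.ch/plant-protection/>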
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?product ?name ?company

WHERE
{
  # search SRPPP data for products and bind wikidataTaxon + chebi keys
  ?product schema:name ?name ;
    :hasPermissionHolder/schema:legalName ?company ;
    :indication/:cropStressor/:isDefinedByBiologicalTaxon ?wikidataTaxon ;
    :hasComponentPortion/:substance/(:hasChebiIdentity|:partialChebiIdentity) ?chebi .
  
  # query Wikidata to only select taxa that are a subgroup of "Insecta" --> i.e. that are insects
  SERVICE <https://qlever.cs.uni-freiburg.de/api/wikidata>
  {
    ?wikidataTaxon wdt:P171*/wdt:P225 "Insecta" .
  }
  
  # query RHEA/ChEBI for chemical entities that have the role "neurotoxin"
  SERVICE <https://sparql.rhea-db.org/sparql/> {
    ?chebi rdfs:subClassOf/owl:someValuesFrom/rdfs:label "neurotoxin" .
  }
}
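
Results of `SELECT` queries like the ones above are commonly returned in the SPARQL 1.1 JSON results format. The sketch below flattens such a document into plain dictionaries. The sample response is made up for illustration and does not contain real registry data.

```python
# Made-up response in the SPARQL 1.1 JSON results format; not real data.
SAMPLE_RESPONSE = {
    "head": {"vars": ["company", "Number"]},
    "results": {
        "bindings": [
            {
                "company": {"type": "literal", "value": "Example AG"},
                "Number": {
                    "type": "literal",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                    "value": "3",
                },
            },
            # A variable may be unbound in a row, in which case its key is missing.
            {"company": {"type": "literal", "value": "Sample GmbH"}},
        ]
    },
}


def rows(response):
    """Return one dict per result row, using None for unbound variables."""
    variables = response["head"]["vars"]
    return [
        {var: (binding[var]["value"] if var in binding else None) for var in variables}
        for binding in response["results"]["bindings"]
    ]


for row in rows(SAMPLE_RESPONSE):
    print(row)
```

Values stay strings here; converting them based on the `datatype` field is left out to keep the sketch short.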

Other queries
