davidlamprecht/AutoRDF2GML

🧩 AutoRDF2GML

AutoRDF2GML is a framework designed to convert RDF data into graph representations suitable for graph-based machine learning (GML) methods, such as Graph Neural Networks (GNNs). By generating both content-based features from RDF datatype properties and topology-based features from RDF object properties, AutoRDF2GML enables effective integration of Semantic Web technologies with Graph Machine Learning.


🌟 Key Features

  • Content-Based Node Features: Automatically extract node features from RDF datatype properties.
  • Topology-Based Edge Features: Derive edge features from RDF object properties.
  • User-Friendly Interface: Modular design with automatic feature selection for simplicity and ease of use.
  • Graph ML Integration: Seamlessly integrates with leading frameworks like PyTorch Geometric and DGL.
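The distinction between the two feature sources can be illustrated with a minimal sketch. It is not AutoRDF2GML's code: the triples, the `ex:` identifiers, and the helper `split_triples` are all made up for illustration. Triples whose object is a literal (datatype properties) become node attributes, while triples whose object is another resource (object properties) become edges.

```python
# Toy triples: (subject, predicate, object, object_is_literal).
# All identifiers and values are hypothetical and only serve the illustration.
TRIPLES = [
    ("ex:paper1", "ex:title", "Graph learning on RDF", True),
    ("ex:paper1", "ex:year", "2024", True),
    ("ex:paper1", "ex:hasAuthor", "ex:author1", False),
    ("ex:paper2", "ex:title", "Knowledge graph embeddings", True),
    ("ex:paper2", "ex:hasAuthor", "ex:author1", False),
    ("ex:paper2", "ex:cites", "ex:paper1", False),
]


def split_triples(triples):
    """Separate datatype-property triples (literal objects) from
    object-property triples (resource objects)."""
    node_attributes = {}  # node -> {property: literal value}
    edges = {}            # property -> list of (source, target) pairs
    for subj, pred, obj, is_literal in triples:
        if is_literal:
            node_attributes.setdefault(subj, {})[pred] = obj
        else:
            edges.setdefault(pred, []).append((subj, obj))
    return node_attributes, edges


node_attributes, edges = split_triples(TRIPLES)
print(node_attributes["ex:paper1"])  # raw material for content-based features
print(edges["ex:hasAuthor"])         # raw material for the graph topology
```

In a real pipeline the literal values would still have to be turned into numeric vectors (for example, text embeddings), and the edges would be grouped by node type.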

[Figure: Overview of AutoRDF2GML]


📥 Installation via pip

AutoRDF2GML is now available via pip! To install, simply run:

pip install autordf2gml

For detailed usage instructions, check https://pypi.org/project/autordf2gml/.


Quick User Guide

For a step-by-step guide on using the framework, see our example and example-topologyfeatures directories.

Usage

To start using AutoRDF2GML, you need (1) an RDF file and (2) a config file describing the transformation. In the config file, define the RDF classes and properties relevant to your project. Once configured, run the AutoRDF2GML script to generate a heterogeneous graph dataset suitable for your machine learning applications.

The output can then be used for various machine learning tasks, including node classification, link prediction, and graph classification, and it can be readily integrated into common graph machine learning frameworks. For example, see how the output from AutoRDF2GML can be loaded into a PyTorch Geometric HeteroData object in this script. The structure of the loaded PyG HeteroData object is available as a directed graph here and as an undirected graph here.
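To make the idea of a heterogeneous graph in directed and undirected form more concrete, here is a minimal, framework-free sketch. It assumes edges keyed by a (source type, relation, target type) triple and stored as a pair of index lists, which is the general layout heterogeneous graph libraries use. The helper names `build_hetero_graph` and `add_reverse_edges`, the `rev_` prefix, and the toy data are assumptions for illustration, not part of AutoRDF2GML's output format.

```python
def build_hetero_graph(typed_edges):
    """Map node IDs to per-type integer indices and build, for every edge
    type, an edge index of the form [source_indices, target_indices]."""
    node_index = {}  # node type -> {node ID: integer index}
    edge_index = {}  # (src_type, relation, dst_type) -> [rows, cols]
    for (src_type, rel, dst_type), pairs in typed_edges.items():
        src_map = node_index.setdefault(src_type, {})
        dst_map = node_index.setdefault(dst_type, {})
        rows, cols = [], []
        for src, dst in pairs:
            rows.append(src_map.setdefault(src, len(src_map)))
            cols.append(dst_map.setdefault(dst, len(dst_map)))
        edge_index[(src_type, rel, dst_type)] = [rows, cols]
    return node_index, edge_index


def add_reverse_edges(edge_index):
    """Return a copy with one reversed edge type per original edge type,
    which makes the graph effectively undirected."""
    undirected = dict(edge_index)
    for (src_type, rel, dst_type), (rows, cols) in edge_index.items():
        undirected[(dst_type, "rev_" + rel, src_type)] = [list(cols), list(rows)]
    return undirected


# Hypothetical toy data: two papers, two authors.
typed_edges = {
    ("paper", "hasAuthor", "author"): [("p1", "a1"), ("p2", "a1"), ("p2", "a2")],
    ("paper", "cites", "paper"): [("p2", "p1")],
}
node_index, directed = build_hetero_graph(typed_edges)
undirected = add_reverse_edges(directed)
print(directed[("paper", "hasAuthor", "author")])
print(sorted(undirected.keys()))
```

The directed variant keeps only the edge types found in the data; the undirected variant doubles them so that message passing can flow in both directions.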

Feature Configuration

Content-based Node Features

Quick example of the content-based node feature transformation: see the example directory.

AutoRDF2GML with content-based node features is implemented in the Python script autordf2gml-cb.py. The configuration file template, together with its documentation, is provided in config-template.ini. Embeddings of natural-language descriptions are computed with SciBERT by default, but other Hugging Face BERT variants (e.g., bert-base) can also be used.

Topology-based Node Features

Quick example of the topology-based node feature transformation: see the example-topologyfeatures directory.

AutoRDF2GML with topology-based node features is implemented in the Python script autordf2gml-tb.py. The configuration file template, together with its documentation, is provided in config-template.ini. The topology-based features can be computed with one of the following KG embedding models: TransE, DistMult, ComplEx, or RotatE. The default parameters (hidden channel size: 128) are defined and commented in the implementation.
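For readers unfamiliar with these four models, the sketch below shows their textbook scoring functions for a single (head, relation, tail) triple using plain Python lists. It is not taken from autordf2gml-tb.py: the function names are illustrative, the norms are common choices (L2 for TransE, sum of moduli for RotatE), and real implementations work on batched tensors with learned embeddings.

```python
import cmath
import math


def transe_score(h, r, t):
    """TransE: negative L2 distance between (h + r) and t (L1 is also common)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """DistMult: trilinear product sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: real part of the trilinear product with the conjugated tail.
    All three inputs are lists of complex numbers."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate_score(h, phases, t):
    """RotatE: each relation component rotates the head by a phase angle in
    the complex plane; the score is the negative sum of element-wise moduli
    of (h * e^{i*phase} - t)."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t))


# Toy vectors chosen so that the results are easy to check by hand.
print(transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))  # h + r equals t
print(distmult_score([1, 2], [3, 4], [5, 6]))
print(complex_score([1 + 1j], [1j], [1 - 1j]))
print(rotate_score([1 + 0j], [math.pi / 2], [0 + 1j]))    # 90 degree rotation maps h onto t
```

In all four cases a higher score means the triple is considered more plausible. The learned entity embeddings are what serve as topology-based node features.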

🤝 Contributing

We welcome contributions of any kind!

📄 License

AutoRDF2GML is available under the MIT License, making it open and accessible for both personal and commercial use.

GML Datasets

📞 Contact & Reference

Michael Färber, David Lamprecht, Yuni Susanti: "AutoRDF2GML: Facilitating RDF Integration in Graph Machine Learning", Proceedings of the 23rd International Semantic Web Conference (ISWC'24), Baltimore, USA.
