---
title: K-GAP Documentation
nav_order: 1
---
Knowledge Graph Analysis Platform
K-GAP is a microservices-based platform for building, managing, and analyzing knowledge graphs using SPARQL and linked data technologies.
- Workflow Guide - Step-by-step book-style workflow
- Configuration Guide - Complete environment and config reference
- Quick Reference - Common commands and patterns
- FAQ - Frequently asked questions
- Advanced Topics - Advanced usage patterns
- GitHub Pages Setup - Publishing this documentation
Follow this documentation in order as a user guide:
- Getting Started & Platform Overview - Understand architecture and deploy K-GAP
- Workflow Guide - Execute the end-to-end K-GAP workflow step by step
- Configuration Guide - Complete environment variables and config reference
- Component Guides - Deep-dive into each service
- Quick Reference - Copy/paste commands for daily use
- Advanced Topics - Optimization and advanced patterns
- FAQ - Troubleshooting and common questions
K-GAP (Knowledge Graph Analysis Platform) is designed to provide a comprehensive, containerized environment for working with knowledge graphs. It combines several specialized microservices that work together to:
- Store and query RDF data using GraphDB
- Harvest and ingest data from LDES (Linked Data Event Streams) feeds
- Analyze and process knowledge graphs using Python tools (Sembench)
- Explore data interactively through Jupyter notebooks
Key features:

- Microservices Architecture: Each component runs as an independent Docker container
- LDES Integration: Automated harvesting from multiple Linked Data Event Streams
- Interactive Analysis: Jupyter notebooks for data exploration and visualization
- Scalable Storage: GraphDB repository with configurable resources
- Automated Processing: Scheduled data processing pipelines via Sembench
K-GAP follows a microservices architecture pattern where each component is:
- Packaged as a Docker container
- Independently deployable
- Connected through a shared Docker network
- Configured via environment variables
```
┌─────────────────────────────────────────────────────────────┐
│                       K-GAP Platform                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐        ┌──────────────┐                   │
│  │   Jupyter    │───────▶│   GraphDB    │                   │
│  │  Notebooks   │        │  Repository  │                   │
│  └──────────────┘        └──────────────┘                   │
│         │                       ▲                           │
│         │                       │                           │
│         ▼                       │                           │
│  ┌──────────────┐        ┌──────────────┐                   │
│  │   Sembench   │───────▶│ LDES Consumer│                   │
│  │  Processing  │        │   (spawns)   │                   │
│  └──────────────┘        └──────┬───────┘                   │
│                                 │                           │
│                         ┌───────▼───────┐                   │
│                         │  ldes2sparql  │                   │
│                         │  containers   │                   │
│                         └───────────────┘                   │
│                                                             │
│  ┌──────────────┐                                           │
│  │    YASGUI    │──────▶ GraphDB SPARQL Endpoint            │
│  │    Web UI    │                                           │
│  └──────────────┘                                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
Data moves through the platform as follows:

- Ingestion: The LDES Consumer harvests data from external LDES feeds and ingests it into GraphDB
- Storage: GraphDB stores RDF triples in a SPARQL-queryable repository
- Processing: Sembench runs scheduled tasks to process and transform data
- Analysis: Jupyter notebooks query and analyze the knowledge graph
- Exploration: YASGUI provides a web interface for SPARQL queries
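The ingestion stage ends with writes to GraphDB. The LDES feed configuration points `sparql_endpoint` at the repository's `/statements` URL, and that endpoint accepts SPARQL 1.1 Update requests under the RDF4J protocol that GraphDB implements. The following rough sketch shows what such a write looks like. It assumes a repository named `kgap`, and `build_update_request` is a hypothetical helper. The code builds a form-encoded update request with the standard library but does not send it:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url: str, repo: str, update: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Update request for a GraphDB repository.

    Hypothetical helper: POSTs a form-encoded `update` parameter to
    {base_url}/repositories/{repo}/statements.
    """
    endpoint = f"{base_url.rstrip('/')}/repositories/{repo}/statements"
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder triple for illustration; in K-GAP, real ingestion is done by ldes2sparql.
INSERT_EXAMPLE = """
PREFIX ex: <http://example.org/>
INSERT DATA { ex:subject ex:predicate "object" . }
"""

request = build_update_request("http://localhost:7200", "kgap", INSERT_EXAMPLE)
# To actually send it against a running GraphDB: urllib.request.urlopen(request)
```

From inside the Docker network, the host would be `graphdb` rather than `localhost`, as in the `http://graphdb:7200/...` URL used in the feed configuration.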
K-GAP consists of four main Docker images and one optional web UI:
GraphDB is the core RDF triple store that provides:
- SPARQL 1.1 query endpoint
- Repository management
- Full-text search indexing
- REST API access
Base Image: ontotext/graphdb:10.4.4
Port: 7200 (HTTP)
Documentation: GraphDB Component
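Because the repository is created automatically on first startup, one quick check is to ask GraphDB's REST API for its repository list. The sketch below assumes that `GET /rest/repositories` returns a JSON array of objects with an `id` field, as in GraphDB 10. The helper names are hypothetical:

```python
import json
from typing import List
from urllib.request import Request, urlopen

GRAPHDB_URL = "http://localhost:7200"  # host port used throughout this guide


def repository_ids(payload: str) -> List[str]:
    """Extract repository ids from a /rest/repositories JSON response."""
    return [repo["id"] for repo in json.loads(payload)]


def list_repositories(base_url: str = GRAPHDB_URL, timeout: float = 10.0) -> List[str]:
    """Fetch repository ids from GraphDB's REST API (needs a running GraphDB)."""
    req = Request(f"{base_url}/rest/repositories", headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return repository_ids(resp.read().decode("utf-8"))
```

For example, `"kgap" in list_repositories()` should be true once the repository named by the default `GDB_REPO=kgap` has been created.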
Jupyter provides an interactive notebook environment for data analysis:
- Pre-installed Python packages for RDF/SPARQL
- Access to GraphDB endpoint
- Template notebooks for common tasks
- Shared volumes for data and notebooks
Base Image: jupyter/base-notebook
Port: 8889 on the host (mapped to container port 8888)
Documentation: Jupyter Component
Sembench is a Python-based semantic processing engine:
- Scheduled data processing tasks
- Integration with py-sema library
- Configurable processing pipelines
- Automated workflows
Base Image: python:3.10
Documentation: Sembench Component
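Sembench runs its pipelines on a fixed schedule controlled by `SCHEDULER_INTERVAL_SECONDS`, which is 86400 seconds (24 hours) in the example configuration. The real scheduler lives in `sembench/kgap/main.py` and is not reproduced here. The following is only a hypothetical sketch of fixed-interval scheduling with the same variable. `run_every` and `interval_from_env` are illustrative names, not part of K-GAP or py-sema:

```python
import logging
import os
import time
from typing import Callable, Mapping, Optional

DEFAULT_INTERVAL = 86400  # 24 hours, as in the example .env
log = logging.getLogger("scheduler-sketch")


def interval_from_env(env: Optional[Mapping[str, str]] = None) -> int:
    """Read SCHEDULER_INTERVAL_SECONDS, defaulting to 24 hours."""
    env = os.environ if env is None else env
    interval = int(env.get("SCHEDULER_INTERVAL_SECONDS", DEFAULT_INTERVAL))
    if interval <= 0:
        raise ValueError("SCHEDULER_INTERVAL_SECONDS must be positive")
    return interval


def run_every(
    task: Callable[[], None],
    interval: int,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `task` every `interval` seconds and return the number of runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            task()
        except Exception:
            # Log and keep going so one failed run does not stop the schedule.
            log.exception("scheduled run failed")
        runs += 1
        if max_runs is None or runs < max_runs:
            # Subtract the run's own duration so runs start roughly on schedule.
            sleep(max(0.0, interval - (time.monotonic() - started)))
    return runs
```

A long-running service would call something like `run_every(my_pipeline, interval_from_env())` without `max_runs`, where `my_pipeline` is a placeholder. The injectable `sleep` argument only makes the loop easy to exercise without waiting.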
The LDES Consumer is a multi-feed LDES harvesting service:
- Wraps ldes2sparql
- Spawns separate containers for each LDES feed
- Configurable polling intervals
- Automatic restart on failure
Base Image: python:3.10-slim
Documentation: LDES Consumer Component
YASGUI is a web-based SPARQL query interface:
- Visual query builder
- Results visualization
- Query history
- Not built from this repository (uses the `redpencil/yasgui:latest` image)
Port: 8080
Prerequisites:

- Docker (version 20.10 or higher)
- Docker Compose (version 2.0 or higher)
- At least 16GB RAM recommended
- 20GB free disk space
To install and run K-GAP:

1. Clone the repository:

   ```bash
   git clone https://github.com/vliz-be-opsci/k-gap.git
   cd k-gap
   ```

2. Configure the environment:

   ```bash
   cp dotenv-example .env
   # Edit .env to customize settings
   ```

3. Create the data directories:

   ```bash
   mkdir -p ./data
   mkdir -p ./notebooks
   ```

4. Start the platform:

   ```bash
   docker compose up -d
   ```

5. Access the services:
   - GraphDB Workbench: http://localhost:7200
   - Jupyter Notebooks: http://localhost:8889
   - YASGUI: http://localhost:8080
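To confirm that all three web services answer on their default host ports, a small standard-library probe is enough. This is a hypothetical convenience script, not something shipped with K-GAP. It treats any HTTP response, including error statuses such as 401 or 404, as "up":

```python
from typing import Callable, Dict
from urllib.error import HTTPError
from urllib.request import urlopen

# Default host URLs from the installation steps.
SERVICES: Dict[str, str] = {
    "GraphDB Workbench": "http://localhost:7200",
    "Jupyter Notebooks": "http://localhost:8889",
    "YASGUI": "http://localhost:8080",
}


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if anything answers HTTP at `url`."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # An HTTP error status still means a server is listening.
        return True
    except OSError:
        # Connection refused, DNS failure, timeout (URLError is an OSError).
        return False


def check_services(
    services: Dict[str, str] = SERVICES,
    probe: Callable[[str], bool] = is_up,
) -> Dict[str, bool]:
    """Probe every service and map its name to up/down."""
    return {name: probe(url) for name, url in services.items()}
```

Running `check_services()` immediately after `docker compose up -d` may report services as down while their containers are still starting.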
To build all Docker images locally:

```bash
make docker-build
```

This builds images with the default tag. To specify a custom tag:

```bash
make BUILD_TAG=0.2.0 docker-build
```

To build and push images to a container registry:

```bash
make REG_NS=ghcr.io/vliz-be-opsci/kgap docker-push
```

K-GAP is configured through environment variables defined in a `.env` file:

```bash
# Docker Compose
COMPOSE_PROJECT_NAME=kgap

# GraphDB Configuration
GDB_REPO=kgap # Repository name
REPOLABEL=label_repo_here # Repository label
GDB_HOME_FOLDER=/opt/graphdb/home
GDB_MAX_HEADER=65536
GDB_JAVA_OPTS="-Xms8g -Xmx16g -Dcom.ontotext.graphdb.monitoring.jmx=true"

# Jupyter Configuration
SRC_FOLDER=/kgap/notebooks

# Sembench Configuration
SEMBENCH_INPUT_PATH=/data
SEMBENCH_OUTPUT_PATH=/data
SEMBENCH_HOME_PATH=/data
SEMBENCH_CONFIG_PATH=/data/sembench.yaml
SCHEDULER_INTERVAL_SECONDS=86400 # 24 hours

# LDES Consumer Configuration
LDES_CONFIG_FILE=/data/ldes-feeds.yaml
LDES2SPARQL_IMAGE=ghcr.io/maregraph-eu/ldes2sparql:latest
LOG_LEVEL=INFO
```

The GraphDB repository is automatically configured on first startup using the template at `graphdb/kgap/template-repo-config.ttl`. Key settings:
- Base URL: `http://example.org/owlim#`
- Entity Index Size: 10,000,000
- Full-Text Search: Enabled
- Ruleset: Empty (no inference by default)
- Context Index: Disabled
- Predicate List: Enabled
To customize the repository configuration, modify the template before starting GraphDB.
Create a `data/ldes-feeds.yaml` file to configure LDES feed harvesting:

```yaml
feeds:
  - name: my-feed
    url: https://example.com/ldes-endpoint
    sparql_endpoint: http://graphdb:7200/repositories/kgap/statements
    polling_interval: 60  # seconds
    environment:
      # Optional additional variables
      CUSTOM_VAR: value
```

See the LDES Consumer Documentation for details.
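The LDES Consumer spawns one container per feed entry and names it after the feed, as the `ldes-consumer-{feed-name}` log command later in this guide shows. That makes a malformed entry or a duplicate name easy to miss. The sketch below is a hypothetical pre-flight check, not part of K-GAP, and rests on three assumptions:

- The file has already been parsed into Python objects, for example with PyYAML's `yaml.safe_load`.
- `name`, `url` and `sparql_endpoint` are required, based on the example above.
- Feed names must satisfy Docker's container-name character rules.

```python
import re
from typing import Any, Dict, List

REQUIRED_KEYS = ("name", "url", "sparql_endpoint")  # assumed from the example config
# Characters Docker accepts in container names.
CONTAINER_NAME = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]*$")


def container_name(feed: Dict[str, Any]) -> str:
    """Container name pattern used by the log command in this guide."""
    return f"ldes-consumer-{feed['name']}"


def validate_feeds(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed ldes-feeds.yaml."""
    feeds = config.get("feeds")
    if not isinstance(feeds, list) or not feeds:
        return ["'feeds' must be a non-empty list"]
    problems: List[str] = []
    seen = set()
    for i, feed in enumerate(feeds):
        if not isinstance(feed, dict):
            problems.append(f"feed #{i}: expected a mapping")
            continue
        for key in REQUIRED_KEYS:
            if not feed.get(key):
                problems.append(f"feed #{i}: missing '{key}'")
        name = feed.get("name")
        if name:
            if not CONTAINER_NAME.match(container_name(feed)):
                problems.append(f"feed #{i}: name {name!r} is not usable in a container name")
            if name in seen:
                problems.append(f"feed #{i}: duplicate name {name!r}")
            seen.add(name)
        interval = feed.get("polling_interval")
        # bool is a subclass of int, so reject it explicitly.
        if interval is not None and (
            isinstance(interval, bool) or not isinstance(interval, int) or interval <= 0
        ):
            problems.append(f"feed #{i}: polling_interval must be a positive integer (seconds)")
    return problems
```

An empty list means no problems were detected. It does not prove that the feed URLs or the SPARQL endpoint are reachable.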
Create a `data/sembench.yaml` file to configure processing pipelines. Refer to the py-sema documentation for the configuration schema.
Using YASGUI (Web Interface):
- Navigate to http://localhost:8080
- Enter your SPARQL query
- Execute and visualize results
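YASGUI, like any SPARQL client, sends queries to GraphDB using the SPARQL 1.1 Protocol. For the repository configured by `GDB_REPO`, the query endpoint is `/repositories/{repo}`, the same path prefix as the `/statements` URL in the feed configuration. The following minimal standard-library sketch of that exchange uses hypothetical helper names. It builds a GET request that asks for SPARQL JSON results and flattens the response into one dictionary per row:

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:7200/repositories/kgap"  # default GDB_REPO=kgap


def select_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(payload: str) -> List[Dict[str, Optional[str]]]:
    """Flatten SPARQL JSON results into dicts; unbound variables become None."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in data["results"]["bindings"]
    ]


def run_select(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0):
    """Send the query to a running GraphDB and return the flattened rows."""
    with urlopen(select_request(endpoint, query), timeout=timeout) as resp:
        return bindings_to_rows(resp.read().decode("utf-8"))
```

For example, `run_select("SELECT * WHERE { ?s ?p ?o } LIMIT 5")` returns up to five rows. Inside the Jupyter container, replace `localhost` with `graphdb`, because the services share a Docker network.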
Using Jupyter Notebooks:
```python
from kgap_tools import execute_to_df

# Execute a SPARQL query and get results as a DataFrame
df = execute_to_df('my_query', var1='value1', var2='value2')
```

To add or remove LDES feeds:

1. Edit `data/ldes-feeds.yaml` to add or remove feeds.
2. Restart the LDES consumer:

   ```bash
   docker compose restart ldes-consumer
   ```
View logs for a specific service:

```bash
docker compose logs -f graphdb
docker compose logs -f jupyter
docker compose logs -f sembench
docker compose logs -f ldes-consumer
```

View logs for an LDES feed container:

```bash
docker logs ldes-consumer-{feed-name}
```

Stop all services:

```bash
docker compose down
```

Remove all containers and clean up:

```bash
make docker-stop
make docker-clean
```

Project layout:

```
k-gap/
├── docker-compose.yml      # Service orchestration
├── Makefile                # Build and deployment tasks
├── .env                    # Environment configuration
├── data/                   # Shared data directory
├── notebooks/              # Jupyter notebooks
├── docs/                   # Documentation
│   ├── index.md            # This file
│   └── components/         # Component-specific docs
├── graphdb/                # GraphDB image
│   ├── Dockerfile
│   └── kgap/
│       ├── entrypoint-wrap.sh
│       ├── healthy.sh
│       └── template-repo-config.ttl
├── jupyter/                # Jupyter image
│   ├── Dockerfile
│   └── kgap/
│       ├── entrypoint-wrap.sh
│       ├── requirements.txt
│       └── notebooks/
│           ├── kgap_tools.py
│           └── kgap_template.ipynb
├── sembench/               # Sembench image
│   ├── Dockerfile
│   └── kgap/
│       ├── main.py
│       └── requirements.txt
└── ldes-consumer/          # LDES Consumer image
    ├── Dockerfile
    ├── README.md
    ├── ldes-feeds.yaml.example
    └── kgap/
        ├── entrypoint.sh
        ├── spawn_instances.py
        ├── logger.py
        └── requirements.txt
```
To add a new component:

1. Create a new directory: `{component}/`
2. Add a `Dockerfile`
3. Add component files under `{component}/kgap/`
4. Update `docker-compose.yml` to include the service
5. Update the `DIMGS` variable in the `Makefile`
6. Create documentation in `docs/components/{component}.md`
See the main repository for contribution guidelines.
For advanced usage patterns and concepts, see the Advanced Topics Guide, which covers:
- Assertion paths and dereferencing patterns
- Custom SPARQL query templates
- Data validation patterns
- Performance optimization
- Multi-repository setup
Related projects:

- py-sema: Python semantic processing library used by Sembench
- ldes2sparql: LDES harvesting tool
- GraphDB: RDF database
- Jupyter: Interactive computing environment
K-GAP is licensed under the MIT License. See LICENSE for details.
For issues and questions:
- GitHub Issues: https://github.com/vliz-be-opsci/k-gap/issues
- Organization: https://github.com/vliz-be-opsci
This documentation is designed to be published on GitHub Pages. See GitHub Pages Setup Guide for instructions on publishing this documentation as a website.