title	K-GAP Documentation
nav_order	1

K-GAP Documentation

Knowledge Graph Analysis Platform

K-GAP is a microservices-based platform for building, managing, and analyzing knowledge graphs using SPARQL and linked data technologies.

Quick Navigation

Workflow Guide - Step-by-step book-style workflow
Configuration Guide - Complete environment and config reference
Quick Reference - Common commands and patterns
FAQ - Frequently asked questions
Advanced Topics - Advanced usage patterns
GitHub Pages Setup - Publishing this documentation

Book Structure

Follow this documentation in order as a user guide:

Getting Started & Platform Overview - Understand architecture and deploy K-GAP
Workflow Guide - Execute the end-to-end K-GAP workflow step by step
Configuration Guide - Complete environment variables and config reference
Component Guides - Deep-dive into each service
Quick Reference - Copy/paste commands for daily use
Advanced Topics - Optimization and advanced patterns
FAQ - Troubleshooting and common questions

Overview
Architecture
Components
Getting Started
Configuration
Usage
Development

Overview

K-GAP (Knowledge Graph Analysis Platform) provides a containerized environment for working with knowledge graphs. It combines several specialized microservices that work together to:

Store and query RDF data using GraphDB
Harvest and ingest data from LDES (Linked Data Event Streams) feeds
Analyze and process knowledge graphs using Python tools (Sembench)
Explore data interactively through Jupyter notebooks

Key Features

Microservices Architecture: Each component runs as an independent Docker container
LDES Integration: Automated harvesting from multiple Linked Data Event Streams
Interactive Analysis: Jupyter notebooks for data exploration and visualization
Scalable Storage: GraphDB repository with configurable resources
Automated Processing: Scheduled data processing pipelines via Sembench

Architecture

K-GAP follows a microservices architecture pattern where each component is:

Packaged as a Docker container
Independently deployable
Connected through a shared Docker network
Configured via environment variables

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                        K-GAP Platform                        │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌──────────────┐        ┌──────────────┐                   │
│  │   Jupyter    │───────▶│   GraphDB    │                   │
│  │  Notebooks   │        │  Repository  │                   │
│  └──────────────┘        └──────────────┘                   │
│         │                        ▲                           │
│         │                        │                           │
│         ▼                        │                           │
│  ┌──────────────┐        ┌──────────────┐                   │
│  │   Sembench   │───────▶│ LDES Consumer│                   │
│  │  Processing  │        │   (spawns)   │                   │
│  └──────────────┘        └──────┬───────┘                   │
│                                  │                           │
│                          ┌───────▼───────┐                   │
│                          │ ldes2sparql   │                   │
│                          │  containers   │                   │
│                          └───────────────┘                   │
│                                                               │
│  ┌──────────────┐                                            │
│  │   YASGUI     │──────▶ GraphDB SPARQL Endpoint            │
│  │   Web UI     │                                            │
│  └──────────────┘                                            │
│                                                               │
└─────────────────────────────────────────────────────────────┘

Data Flow

Ingestion: LDES Consumer harvests data from external LDES feeds and ingests it into GraphDB
Storage: GraphDB stores RDF triples in a SPARQL-queryable repository
Processing: Sembench runs scheduled tasks to process and transform data
Analysis: Jupyter notebooks query and analyze the knowledge graph
Exploration: YASGUI provides a web interface for SPARQL queries

Components

K-GAP consists of four main Docker images and one optional web UI:

1. GraphDB (`kgap_graphdb`)

GraphDB is the core RDF triple store that provides:

SPARQL 1.1 query endpoint
Repository management
Full-text search indexing
REST API access

Base Image: ontotext/graphdb:10.4.4
Port: 7200 (HTTP)
Documentation: GraphDB Component
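
As a rough illustration of the SPARQL endpoint, the sketch below builds (without sending) an HTTP request for a SELECT query using only the Python standard library. The `build_select_request` helper is hypothetical. The query path `/repositories/<repo>` is an assumption inferred from the `/statements` update URL used in the LDES feeds configuration, and the defaults reuse the port and repository name documented here.

```python
import urllib.parse
import urllib.request


def build_select_request(query, base_url="http://localhost:7200", repo="kgap"):
    """Build (but do not send) a GET request for a SPARQL SELECT query."""
    # Assumption: queries go to /repositories/<repo>,
    # while updates go to /repositories/<repo>/statements.
    endpoint = f"{base_url}/repositories/{urllib.parse.quote(repo)}"
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = build_select_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
# Against a running stack: urllib.request.urlopen(request).read()
```

Inside the Docker network the host would be the service name (for example `http://graphdb:7200`) rather than `localhost`.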

2. Jupyter (`kgap_jupyter`)

Interactive notebook environment for data analysis:

Pre-installed Python packages for RDF/SPARQL
Access to GraphDB endpoint
Template notebooks for common tasks
Shared volumes for data and notebooks

Base Image: jupyter/base-notebook
Port: 8889 (mapped to container port 8888)
Documentation: Jupyter Component

3. Sembench (`kgap_sembench`)

Python-based semantic processing engine:

Scheduled data processing tasks
Integration with py-sema library
Configurable processing pipelines
Automated workflows

Base Image: python:3.10
Documentation: Sembench Component
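
To make "scheduled data processing tasks" concrete, here is a minimal sketch of fixed-interval scheduling driven by the `SCHEDULER_INTERVAL_SECONDS` variable. It is not Sembench's actual implementation: `run_on_interval` and its `max_runs` and `sleep` parameters are hypothetical names used only for illustration.

```python
import os
import time


def run_on_interval(task, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call task() repeatedly, pausing interval_seconds between runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        # Only pause if another run will follow.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Default of 86400 seconds (24 hours) mirrors the example .env value.
interval = int(os.environ.get("SCHEDULER_INTERVAL_SECONDS", "86400"))
run_on_interval(lambda: print("processing pass"), interval, max_runs=1)
```

Passing `max_runs=None` would keep the loop running indefinitely, which is the shape a long-lived container process would typically take.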

4. LDES Consumer (`kgap_ldes-consumer`)

Multi-feed LDES harvesting service:

Wraps ldes2sparql
Spawns separate containers for each LDES feed
Configurable polling intervals
Automatic restart on failure

Base Image: python:3.10-slim
Documentation: LDES Consumer Component

5. YASGUI (Optional)

Web-based SPARQL query interface:

Visual query builder
Results visualization
Query history
Not built from this repository; uses the redpencil/yasgui:latest image

Port: 8080

Getting Started

Prerequisites

Docker (version 20.10 or higher)
Docker Compose (version 2.0 or higher)
At least 16GB RAM recommended
20GB free disk space

Quick Start

Clone the repository:

git clone https://github.com/vliz-be-opsci/k-gap.git
cd k-gap

Configure environment:

cp dotenv-example .env
# Edit .env to customize settings

Create data directories:
```
mkdir -p ./data
mkdir -p ./notebooks
```
Start the platform:
```
docker compose up -d
```
Access services:
- GraphDB Workbench: http://localhost:7200
- Jupyter Notebooks: http://localhost:8889
- YASGUI: http://localhost:8080
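
After `docker compose up -d`, a quick way to confirm that the three web endpoints answer is a small reachability probe. The sketch below is standard-library Python. `is_reachable` is a hypothetical helper, and it only checks that something responds over HTTP at each URL, not that the service is fully initialized.

```python
import urllib.error
import urllib.request

SERVICES = {
    "GraphDB Workbench": "http://localhost:7200",
    "Jupyter Notebooks": "http://localhost:8889",
    "YASGUI": "http://localhost:8080",
}


def is_reachable(url, timeout=3.0):
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (e.g. 401 or 404), so the service is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


for name, url in SERVICES.items():
    print(f"{name}: {'up' if is_reachable(url) else 'not reachable'}")
```

GraphDB may take a while to start on first launch, so a "not reachable" result shortly after startup is not necessarily a failure.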

Building Images Locally

To build all Docker images locally:

make docker-build

This builds images with the default tag. To specify a custom tag:

make BUILD_TAG=0.2.0 docker-build

Pushing to Registry

To build and push images to a container registry:

make REG_NS=ghcr.io/vliz-be-opsci/kgap docker-push

Configuration

K-GAP is configured through environment variables defined in a .env file.

Core Configuration

# Docker Compose
COMPOSE_PROJECT_NAME=kgap

# GraphDB Configuration
GDB_REPO=kgap                    # Repository name
REPOLABEL=label_repo_here        # Repository label
GDB_HOME_FOLDER=/opt/graphdb/home
GDB_MAX_HEADER=65536
GDB_JAVA_OPTS="-Xms8g -Xmx16g -Dcom.ontotext.graphdb.monitoring.jmx=true"

# Jupyter Configuration
SRC_FOLDER=/kgap/notebooks

# Sembench Configuration
SEMBENCH_INPUT_PATH=/data
SEMBENCH_OUTPUT_PATH=/data
SEMBENCH_HOME_PATH=/data
SEMBENCH_CONFIG_PATH=/data/sembench.yaml
SCHEDULER_INTERVAL_SECONDS=86400  # 24 hours

# LDES Consumer Configuration
LDES_CONFIG_FILE=/data/ldes-feeds.yaml
LDES2SPARQL_IMAGE=ghcr.io/maregraph-eu/ldes2sparql:latest
LOG_LEVEL=INFO
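
To sanity-check a `.env` file before starting the stack, the values can be read with a few lines of Python. This is a rough approximation built on `shlex`, not Docker Compose's own parser: Compose has its own rules for quoting, `#` characters inside values, and variable interpolation. `parse_env_lines` is a hypothetical helper.

```python
import shlex


def parse_env_lines(text):
    """Parse KEY=VALUE lines, dropping '#' comments and surrounding quotes."""
    settings = {}
    for line in text.splitlines():
        # shlex removes quotes and discards everything after a '#'.
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep:
                settings[key] = value
    return settings


sample = '''
# GraphDB Configuration
GDB_REPO=kgap                    # Repository name
GDB_JAVA_OPTS="-Xms8g -Xmx16g -Dcom.ontotext.graphdb.monitoring.jmx=true"
SCHEDULER_INTERVAL_SECONDS=86400  # 24 hours
'''
settings = parse_env_lines(sample)
print(settings["GDB_REPO"], settings["SCHEDULER_INTERVAL_SECONDS"])
```

A check like this makes it easy to spot, for example, a Java heap setting in `GDB_JAVA_OPTS` that exceeds the memory available on the host.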

GraphDB Repository Configuration

The GraphDB repository is automatically configured on first startup using the template at graphdb/kgap/template-repo-config.ttl. Key settings:

Base URL: http://example.org/owlim#
Entity Index Size: 10,000,000
Full-Text Search: Enabled
Ruleset: Empty (no inference by default)
Context Index: Disabled
Predicate List: Enabled

To customize the repository configuration, modify the template before starting GraphDB.

LDES Feeds Configuration

Create a data/ldes-feeds.yaml file to configure LDES feed harvesting:

feeds:
  - name: my-feed
    url: https://example.com/ldes-endpoint
    sparql_endpoint: http://graphdb:7200/repositories/kgap/statements
    polling_interval: 60  # seconds
    environment:
      # Optional additional variables
      CUSTOM_VAR: value

See LDES Consumer Documentation for details.
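
Before restarting the consumer it can help to check the feeds file for obvious mistakes. The sketch below validates a config that has already been parsed into a dictionary (for example with `yaml.safe_load`). Which keys are mandatory is an assumption based on the example above, and `validate_feeds` is a hypothetical helper, not part of the LDES Consumer.

```python
# Assumed to be required, based on the example feed entry.
REQUIRED_KEYS = ("name", "url", "sparql_endpoint")


def validate_feeds(config):
    """Return a list of human-readable problems found in a parsed feeds config."""
    feeds = config.get("feeds")
    if not isinstance(feeds, list) or not feeds:
        return ["'feeds' must be a non-empty list"]
    problems = []
    seen = set()
    for index, feed in enumerate(feeds):
        name = feed.get("name")
        label = name or f"feeds[{index}]"
        for key in REQUIRED_KEYS:
            if not feed.get(key):
                problems.append(f"{label}: missing '{key}'")
        interval = feed.get("polling_interval")
        if interval is not None and (not isinstance(interval, int) or interval <= 0):
            problems.append(f"{label}: polling_interval must be a positive integer")
        if name and name in seen:
            problems.append(f"{label}: duplicate feed name")
        seen.add(name)
    return problems


config = {
    "feeds": [
        {
            "name": "my-feed",
            "url": "https://example.com/ldes-endpoint",
            "sparql_endpoint": "http://graphdb:7200/repositories/kgap/statements",
            "polling_interval": 60,
            "environment": {"CUSTOM_VAR": "value"},
        }
    ]
}
print(validate_feeds(config) or "feeds config looks OK")
```

Checking for duplicate names matters here because each feed gets its own container, named after the feed.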

Sembench Configuration

Create a data/sembench.yaml file to configure processing pipelines. Refer to the py-sema documentation for the configuration schema.

Usage

Querying Data with SPARQL

Using YASGUI (Web Interface):

Navigate to http://localhost:8080
Enter your SPARQL query
Execute and visualize results

Using Jupyter Notebooks:

from kgap_tools import execute_to_df

# Execute a SPARQL query and get results as a DataFrame
df = execute_to_df('my_query', var1='value1', var2='value2')
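
For readers curious what turning query results into a table involves, the sketch below flattens a response in the standard SPARQL 1.1 JSON results format into plain rows. It illustrates the general idea only and is not how `kgap_tools` is implemented. `bindings_to_rows` is a hypothetical helper and the sample data is made up.

```python
import json


def bindings_to_rows(results_json):
    """Flatten SPARQL 1.1 JSON results into a list of {variable: value} dicts."""
    payload = json.loads(results_json)
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # Unbound variables are absent from a binding; represent them as None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


sample = json.dumps({
    "head": {"vars": ["s", "label"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/a"},
         "label": {"type": "literal", "value": "A"}},
        {"s": {"type": "uri", "value": "http://example.org/b"}},
    ]},
})
rows = bindings_to_rows(sample)
print(rows)
# With pandas installed, pandas.DataFrame(rows) gives one column per variable.
```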

Managing LDES Feeds

Edit data/ldes-feeds.yaml to add/remove feeds
Restart the LDES consumer:
```
docker compose restart ldes-consumer
```

Viewing Logs

View logs for a specific service:

docker compose logs -f graphdb
docker compose logs -f jupyter
docker compose logs -f sembench
docker compose logs -f ldes-consumer

View logs for an LDES feed container:

docker logs ldes-consumer-{feed-name}

Stopping and Cleaning Up

Stop all services:

docker compose down

Remove all containers and clean up:

make docker-stop
make docker-clean

Development

Project Structure

k-gap/
├── docker-compose.yml          # Service orchestration
├── Makefile                    # Build and deployment tasks
├── .env                        # Environment configuration
├── data/                       # Shared data directory
├── notebooks/                  # Jupyter notebooks
├── docs/                       # Documentation
│   ├── index.md               # This file
│   └── components/            # Component-specific docs
├── graphdb/                   # GraphDB image
│   ├── Dockerfile
│   └── kgap/
│       ├── entrypoint-wrap.sh
│       ├── healthy.sh
│       └── template-repo-config.ttl
├── jupyter/                   # Jupyter image
│   ├── Dockerfile
│   └── kgap/
│       ├── entrypoint-wrap.sh
│       ├── requirements.txt
│       └── notebooks/
│           ├── kgap_tools.py
│           └── kgap_template.ipynb
├── sembench/                  # Sembench image
│   ├── Dockerfile
│   └── kgap/
│       ├── main.py
│       └── requirements.txt
└── ldes-consumer/            # LDES Consumer image
    ├── Dockerfile
    ├── README.md
    ├── ldes-feeds.yaml.example
    └── kgap/
        ├── entrypoint.sh
        ├── spawn_instances.py
        ├── logger.py
        └── requirements.txt

Adding a New Component

Create a new directory: {component}/
Add a Dockerfile
Add component files under {component}/kgap/
Update docker-compose.yml to include the service
Update Makefile DIMGS variable
Create documentation in docs/components/{component}.md

Contributing

See the main repository for contribution guidelines.

Advanced Topics

For advanced usage patterns and concepts, see:

Advanced Topics Guide
- Assertion paths and dereferencing patterns
- Custom SPARQL query templates
- Data validation patterns
- Performance optimization
- Multi-repository setup

Related Projects

py-sema: Python semantic processing library used by Sembench
ldes2sparql: LDES harvesting tool
GraphDB: RDF database
Jupyter: Interactive computing environment

License

K-GAP is licensed under the MIT License. See LICENSE for details.

Support

For issues and questions:

GitHub Issues: https://github.com/vliz-be-opsci/k-gap/issues
Organization: https://github.com/vliz-be-opsci

Publishing Documentation

This documentation is designed to be published on GitHub Pages. See the GitHub Pages Setup Guide for publishing instructions.
