---
title: Configuration Guide
nav_order: 3
---

# Configuration Guide
Complete reference for configuring all K-GAP components through environment variables and configuration files.
- Environment Variables Overview
- GraphDB Configuration
- Jupyter Configuration
- Sembench Configuration
- LDES Consumer Configuration
- Complete `.env` Example
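## Environment Variables Overview

K-GAP uses environment variables, stored in a `.env` file, to configure all components. Copy the template and customize it:

```bash
cp dotenv-example .env
```

Variables are grouped by prefix:

- `COMPOSE_`: Docker Compose settings
- `GDB_`: GraphDB settings
- `JUPYTER_`: Jupyter settings (optional)
- `SEMBENCH_`: Sembench settings
- `LDES_`: LDES Consumer settings
- `LOG_`: Logging settings

A few variables do not follow this convention, such as `REPOLABEL`, `NOTEBOOK_ARGS` and `SCHEDULER_INTERVAL_SECONDS`. The tables below list them under the component they configure.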
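You can apply the prefix convention mechanically to see which component each entry in your `.env` belongs to. The sketch below is a minimal, hypothetical helper: `parse_dotenv` and `group_by_prefix` are not part of K-GAP. It understands only a simplified subset of the `.env` syntax that Docker Compose accepts: `KEY=VALUE` lines, full-line comments, values wrapped in matching quotes, and ` #` inline comments on unquoted values.

```python
import re
from collections import defaultdict

# Prefixes described in this guide; keys matching none are grouped as "other".
PREFIXES = ("COMPOSE_", "GDB_", "JUPYTER_", "SEMBENCH_", "LDES_", "LOG_")

# KEY=VALUE, with optional whitespace around "=".
_ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def parse_dotenv(text: str) -> dict:
    """Parse a simplified subset of .env syntax into a dict.

    Handles KEY=VALUE lines, full-line comments, values wrapped in matching
    single or double quotes, and ' #' inline comments on unquoted values.
    A repeated key keeps its last value.
    """
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        match = _ASSIGNMENT.match(line)
        if match is None:
            continue  # not an assignment this sketch understands
        key, value = match.group(1), match.group(2).strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # strip the surrounding quotes
        else:
            value = value.split(" #", 1)[0].rstrip()  # drop an inline comment
        env[key] = value
    return env


def group_by_prefix(env: dict) -> dict:
    """Group variable names by the first matching prefix in PREFIXES."""
    groups = defaultdict(list)
    for key in sorted(env):
        prefix = next((p for p in PREFIXES if key.startswith(p)), "other")
        groups[prefix].append(key)
    return dict(groups)


if __name__ == "__main__":
    sample = '''
COMPOSE_PROJECT_NAME=kgap
GDB_REPO=kgap
GDB_JAVA_OPTS="-Xms8g -Xmx16g"
SCHEDULER_INTERVAL_SECONDS=1800 # check every 30 min
# REMOVE_ORPHANS=false
'''
    for prefix, keys in group_by_prefix(parse_dotenv(sample)).items():
        print(prefix, keys)
```

For the authoritative, fully interpolated view of what Compose actually reads, use `docker compose config`. This helper is only a quick local check.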
## GraphDB Configuration

| Variable | Default | Purpose | Example |
|---|---|---|---|
| `GDB_REPO` | `kgap` | Repository identifier | `kgap`, `my-repo` |
| `REPOLABEL` | `label_repo_here` | Human-readable repository name | `My Knowledge Graph` |
| `GDB_HOME_FOLDER` | `/opt/graphdb/home` | Data directory | `/data/graphdb` |
| `GDB_MAX_HEADER` | `65536` | Maximum HTTP header size (bytes) | `65536` (dev), `131072` (prod) |
| `GDB_JAVA_OPTS` | `-Xms8g -Xmx16g ...` | Java runtime options | See below |
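`GDB_JAVA_OPTS` is a free-form string, so a mistyped heap flag only shows up when GraphDB starts. The sketch below catches two common mistakes early. The helpers `heap_sizes`, `check_heap` and `host_memory_bytes` are hypothetical and not part of K-GAP. They extract `-Xms`/`-Xmx` from the value and flag an initial heap larger than the maximum, which the JVM refuses to start with. They also flag a maximum heap that claims all of the host's physical memory. Reading physical memory through `os.sysconf` assumes a Linux or other POSIX host.

```python
import os
import re
from typing import List, Optional

# Multipliers for the size suffixes accepted by -Xms/-Xmx (no suffix = bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
# Matches -Xms<size> and -Xmx<size>, but not -Xmn or -XX:... flags.
_HEAP_FLAG = re.compile(r"-Xm([sx])(\d+)([kKmMgG]?)(?=\s|$)")


def heap_sizes(java_opts: str) -> dict:
    """Return {'Xms': bytes, 'Xmx': bytes} for the heap flags present.

    If a flag is repeated, the last occurrence is kept.
    """
    sizes = {}
    for kind, number, unit in _HEAP_FLAG.findall(java_opts):
        sizes["Xm" + kind] = int(number) * _UNITS[unit.lower()]
    return sizes


def check_heap(java_opts: str, host_bytes: Optional[int] = None) -> List[str]:
    """Return human-readable problems found in a GDB_JAVA_OPTS value."""
    problems = []
    sizes = heap_sizes(java_opts)
    xms, xmx = sizes.get("Xms"), sizes.get("Xmx")
    if xmx is None:
        problems.append("no -Xmx set; the JVM chooses its own maximum heap")
    if xms is not None and xmx is not None and xms > xmx:
        # The JVM refuses to start when the initial heap exceeds the maximum.
        problems.append("-Xms is larger than -Xmx")
    if host_bytes is not None and xmx is not None and xmx >= host_bytes:
        problems.append("-Xmx leaves no memory for the OS and non-heap JVM memory")
    return problems


def host_memory_bytes() -> Optional[int]:
    """Physical memory via POSIX sysconf; None where unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    opts = "-Xms32g -Xmx64g -Dcom.ontotext.graphdb.monitoring.jmx=true -XX:+UseG1GC"
    for problem in check_heap(opts, host_memory_bytes()):
        print("warning:", problem)
```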
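### Memory Settings

GraphDB's Java options control its memory allocation. `-Xms` sets the initial heap size and `-Xmx` the maximum. Keep `-Xmx` well below the host's physical memory so the operating system and the JVM's non-heap memory still have room. Adjust `GDB_JAVA_OPTS` to your system:

```bash
# Development (4 GB heap)
GDB_JAVA_OPTS="-Xms2g -Xmx4g"
# Standard (16 GB heap, JMX monitoring)
GDB_JAVA_OPTS="-Xms8g -Xmx16g -Dcom.ontotext.graphdb.monitoring.jmx=true"
# Large-scale (64 GB heap, JMX monitoring, G1 garbage collector)
GDB_JAVA_OPTS="-Xms32g -Xmx64g -Dcom.ontotext.graphdb.monitoring.jmx=true -XX:+UseG1GC"
# Kubernetes/containers (8 GB heap; -XX:+PerfDisableSharedMem stops the JVM
# from writing its performance data to a shared memory-mapped file)
GDB_JAVA_OPTS="-Xms4g -Xmx8g -XX:+PerfDisableSharedMem"
```

### Persistent Storage

To keep GraphDB data when the container is removed and recreated: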
```bash
# 1. Create the host directory
mkdir -p ./data/graphdb
# 2. Add this line to .env
GDB_HOME_FOLDER=/data/graphdb
# 3. Verify that docker-compose.yml mounts the directory as a volume:
# volumes:
#   - ./data/graphdb:/opt/graphdb/home
```

## Jupyter Configuration

| Variable | Default | Purpose | Example |
|---|---|---|---|
| `GDB_BASE` | `http://graphdb:7200/` | GraphDB service URL | See below |
| `GDB_REPO` | `kgap` | GraphDB repository name | Same variable as in GraphDB Configuration |
| `NOTEBOOK_ARGS` | `--NotebookApp.token=''` | Jupyter startup arguments (the empty token disables authentication) | Typically unchanged |
| `SRC_FOLDER` | `/kgap/notebooks` | Notebook location | Typically unchanged |
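Inside a notebook, `GDB_BASE` and `GDB_REPO` together identify the repository's SPARQL endpoint. The minimal sketch below uses only the Python standard library and makes three assumptions. First, GraphDB exposes RDF4J-style paths: `/repositories/{repo}` for queries and `/repositories/{repo}/statements` for updates, the same form as the `sparql_endpoint` in the LDES feed example. Second, both variables are visible in the notebook's environment. Third, the helpers `repository_urls` and `select` are hypothetical and not part of K-GAP.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Dict, List, Optional


def repository_urls(base: str, repo: str) -> Dict[str, str]:
    """Build GraphDB (RDF4J-style) repository URLs from GDB_BASE and GDB_REPO.

    SPARQL queries go to /repositories/{repo}; updates and statement
    uploads go to /repositories/{repo}/statements.
    """
    root = base.rstrip("/") + "/repositories/" + urllib.parse.quote(repo, safe="")
    return {"query": root, "update": root + "/statements"}


def select(query: str, base: Optional[str] = None, repo: Optional[str] = None) -> List[dict]:
    """Run a SPARQL SELECT using a form-encoded POST (SPARQL 1.1 Protocol)."""
    # Fall back to the defaults listed in the table above.
    base = base or os.environ.get("GDB_BASE", "http://graphdb:7200/")
    repo = repo or os.environ.get("GDB_REPO", "kgap")
    request = urllib.request.Request(
        repository_urls(base, repo)["query"],
        data=urllib.parse.urlencode({"query": query}).encode("utf-8"),
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]
```

In a notebook, `select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")` returns the result bindings as a list of dicts. Stripping the trailing slash before joining means `GDB_BASE` works with or without one.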
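### Connecting to Another GraphDB Instance

Point `GDB_BASE` at the GraphDB server that Jupyter should use. It can be another service on the Docker Compose network or an external host:

```bash
# Another service within the Docker Compose network
GDB_BASE=http://graphdb-server:7200/
# Remote server (DNS name)
GDB_BASE=http://graphdb.example.org:7200/
# Remote server (IP address)
GDB_BASE=http://192.168.1.100:7200/
```

### Installing Python Packages

To add dependencies permanently: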
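```bash
# 1. Add pinned packages to the requirements file
echo "rdflib==6.1.1" >> jupyter/kgap/requirements.txt
echo "networkx==3.0" >> jupyter/kgap/requirements.txt
# 2. Rebuild and restart the Jupyter service
docker compose build jupyter
docker compose up -d jupyter
```

Alternatively, install packages from inside a notebook. This is quicker, but the packages are lost when the container is recreated:

```python
%pip install rdflib networkx
```

## Sembench Configuration

| Variable | Default | Purpose |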
|---|---|---|
| `SEMBENCH_INPUT_PATH` | `/data` | Input data directory |
| `SEMBENCH_OUTPUT_PATH` | `/data` | Output data directory |
| `SEMBENCH_HOME_PATH` | `/data` | Runtime files directory |
| `SEMBENCH_CONFIG_PATH` | `/data/sembench.yaml` | Configuration file path |
| `SCHEDULER_INTERVAL_SECONDS` | `86400` | Interval between checks for scheduled tasks (seconds) |
| `LOG_LEVEL` | `INFO` | Logging level |
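This worked example assumes Sembench runs a simple fixed-interval loop: it checks at startup and then every `SCHEDULER_INTERVAL_SECONDS`. That is an assumption about its internals, not something this guide documents. Under that model, a task that becomes due just after a check waits until the next one. The sketch computes that delay with a hypothetical helper, `first_check_at_or_after`. It shows why the default of 86400 seconds can postpone a task by almost a day.

```python
def first_check_at_or_after(due: int, start: int, interval: int) -> int:
    """Time of the first scheduler check at or after `due` (all in seconds).

    Assumes checks run at start, start + interval, start + 2 * interval, ...
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if due <= start:
        return start
    steps = -(-(due - start) // interval)  # ceiling division on integers
    return start + steps * interval


if __name__ == "__main__":
    due = 1  # a task that becomes due one second after the scheduler starts
    for interval in (30, 600, 3600, 86400):
        delay = first_check_at_or_after(due, 0, interval) - due
        print(f"SCHEDULER_INTERVAL_SECONDS={interval}: task starts {delay} s late")
```

Under this model, the worst-case delay is just under one full interval.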
### Scheduler Interval

`SCHEDULER_INTERVAL_SECONDS` controls how often Sembench checks for scheduled tasks. Set it no longer than the shortest schedule among your workflows; otherwise tasks start late:

```bash
# Check every hour (suits hourly or daily tasks)
SCHEDULER_INTERVAL_SECONDS=3600
# Check every 10 minutes (for frequently scheduled tasks)
SCHEDULER_INTERVAL_SECONDS=600
# Check once per day (default)
SCHEDULER_INTERVAL_SECONDS=86400
# Check every 30 seconds (aggressive; for testing)
SCHEDULER_INTERVAL_SECONDS=30
```

If you are not using Sembench yet, create an empty configuration:
```bash
# Create an empty Sembench configuration
echo "workflows: []" > ./data/sembench.yaml
```

## LDES Consumer Configuration

| Variable | Default | Purpose |
|---|---|---|
| `LDES_CONFIG_FILE` | `/data/ldes-feeds.yaml` | Feed configuration path |
| `LDES2SPARQL_IMAGE` | `ghcr.io/maregraph-eu/ldes2sparql:latest` | Container image used for the feed containers |
| `LDES_LOG_LEVEL` | `INFO` | Logging level for LDES containers (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
| `LOG_LEVEL` | `INFO` | General logging level |
| `COMPOSE_PROJECT_NAME` | `kgap` | Docker Compose project name |
| `DEFAULT_SPARQL_ENDPOINT` | `http://graphdb:7200/...` | Default ingest endpoint for feeds that do not set `sparql_endpoint` |
| `REMOVE_ORPHANS` | `false` | Remove feed containers that are no longer listed in the configuration |
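A feed can name its own `sparql_endpoint`. Feeds that do not fall back to `DEFAULT_SPARQL_ENDPOINT`. `REMOVE_ORPHANS` concerns feeds that are still running but no longer listed. The sketch below illustrates that resolution logic on the parsed contents of `ldes-feeds.yaml`, a plain dict like the one `yaml.safe_load` returns. It is a hypothetical illustration of the documented behaviour, not the LDES consumer's actual code. The function names and the example default endpoint are placeholders.

```python
from typing import Any, Dict, Set


def effective_endpoints(config: Dict[str, Any], default_endpoint: str) -> Dict[str, str]:
    """Map each feed to the SPARQL endpoint it writes to.

    A feed's own `sparql_endpoint` wins; otherwise the default
    (the role DEFAULT_SPARQL_ENDPOINT plays) is used.
    """
    feeds = config.get("feeds") or {}
    return {
        name: (spec or {}).get("sparql_endpoint", default_endpoint)
        for name, spec in feeds.items()
    }


def orphaned_feeds(config: Dict[str, Any], running: Set[str]) -> Set[str]:
    """Running feeds that are no longer configured (what REMOVE_ORPHANS targets)."""
    configured = set((config.get("feeds") or {}).keys())
    return running - configured


if __name__ == "__main__":
    # Same shape as the feed definitions example in this guide, after parsing.
    config = {
        "feeds": {
            "feed-name-1": {"url": "https://example.org/ldes/feed1"},
            "feed-name-2": {
                "url": "https://example.org/ldes/feed2",
                "sparql_endpoint": "http://graphdb:7200/repositories/custom/statements",
            },
        }
    }
    default = "https://sparql.example.org/default"  # hypothetical placeholder
    print(effective_endpoints(config, default))
    print(orphaned_feeds(config, {"feed-name-1", "retired-feed"}))
```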
### Feed Definitions

Create `data/ldes-feeds.yaml` with your feed definitions:

```yaml
feeds:
  feed-name-1:
    url: https://example.org/ldes/feed1
    environment:
      POLLING_FREQUENCY: 300000  # 5 minutes in milliseconds
      MATERIALIZE: "false"
      RESTART: "unless-stopped"
  feed-name-2:
    url: https://example.org/ldes/feed2
    sparql_endpoint: http://graphdb:7200/repositories/custom/statements
    environment:
      POLLING_FREQUENCY: 600000  # 10 minutes in milliseconds
      MATERIALIZE: "true"
```

### Polling Frequency

Common polling frequencies (in milliseconds):
| Frequency | Milliseconds | Use Case |
|---|---|---|
| Every 1 minute | 60000 | High-frequency data feeds |
| Every 5 minutes | 300000 | Active data feeds |
| Every 10 minutes | 600000 | Standard feeds (recommended) |
| Every 30 minutes | 1800000 | Lower-priority feeds |
| Every hour | 3600000 | Bulk data feeds |
| Every 6 hours | 21600000 | Slow-changing data |
| Every 24 hours | 86400000 | Daily snapshot data |
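`POLLING_FREQUENCY` is in milliseconds while `SCHEDULER_INTERVAL_SECONDS` is in seconds, so it is easy to drop a factor of 1000. The small sketch below converts a readable duration into milliseconds and back using only `datetime.timedelta`. The names `polling_ms` and `describe` are hypothetical helpers.

```python
from datetime import timedelta


def polling_ms(**duration: float) -> int:
    """Convert a duration such as minutes=5 into whole milliseconds."""
    # Dividing two timedeltas with // yields an exact integer count.
    return timedelta(**duration) // timedelta(milliseconds=1)


def describe(ms: int) -> str:
    """Render a POLLING_FREQUENCY value (milliseconds) as a timedelta string."""
    return str(timedelta(milliseconds=ms))


if __name__ == "__main__":
    print(polling_ms(minutes=10))
    for value in (60000, 300000, 600000, 1800000, 3600000, 21600000, 86400000):
        print(f"{value:>9} ms = {describe(value)}")
```

Running the loop reproduces every row of the table above, which is a quick way to check a value before putting it in `ldes-feeds.yaml`.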
### Restart Policy

Control how feed containers restart:

```yaml
# Restart unless explicitly stopped (production default)
RESTART: "unless-stopped"
# Always restart, even after a manual stop once the Docker daemon restarts
RESTART: "always"
# Restart only after a non-zero exit code
RESTART: "on-failure"
# Never restart (experimental/testing)
RESTART: "no"
```

## Complete `.env` Example

```bash
# ============================================================================
# Docker Compose Configuration
# ============================================================================
COMPOSE_PROJECT_NAME=kgap
BUILD_TAG=latest
LOG_LEVEL=INFO
# ============================================================================
# GraphDB Configuration
# ============================================================================
# Repository identity
GDB_REPO=kgap
REPOLABEL="K-GAP Knowledge Graph Repository"
# Storage (optional; leave commented out to keep data inside the container)
# GDB_HOME_FOLDER=/data/graphdb
# Performance tuning
GDB_MAX_HEADER=65536
GDB_JAVA_OPTS="-Xms8g -Xmx16g -Dcom.ontotext.graphdb.monitoring.jmx=true"
# ============================================================================
# Jupyter Configuration
# ============================================================================
# Connect to GraphDB
GDB_BASE=http://graphdb:7200/
# Notebook startup options (token authentication disabled)
NOTEBOOK_ARGS="--NotebookApp.token=''"
SRC_FOLDER=/kgap/notebooks
# ============================================================================
# Sembench Configuration
# ============================================================================
SEMBENCH_INPUT_PATH=/data
SEMBENCH_OUTPUT_PATH=/data
SEMBENCH_HOME_PATH=/data
SEMBENCH_CONFIG_PATH=/data/sembench.yaml
# How often to check for scheduled tasks (seconds)
# 86400 = once per day, 3600 = once per hour
SCHEDULER_INTERVAL_SECONDS=86400
# ============================================================================
# LDES Consumer Configuration
# ============================================================================
# Feed configuration file
LDES_CONFIG_FILE=/data/ldes-feeds.yaml
# (Optional) Custom ldes2sparql image
# LDES2SPARQL_IMAGE=ghcr.io/maregraph-eu/ldes2sparql:latest
# Logging for LDES containers
LDES_LOG_LEVEL=INFO
# Docker network (auto-detected, usually no change needed)
# DOCKER_NETWORK=kgap_default
# Whether to remove containers not listed in the configuration
# REMOVE_ORPHANS=false
```

## Environment Presets

### Development

Minimal resources, debug logging:
```bash
COMPOSE_PROJECT_NAME=kgap-dev
LOG_LEVEL=DEBUG
GDB_JAVA_OPTS="-Xms2g -Xmx4g"
LDES_LOG_LEVEL=DEBUG
SCHEDULER_INTERVAL_SECONDS=3600
```

### Testing

Medium resources, INFO logging:
```bash
COMPOSE_PROJECT_NAME=kgap-test
LOG_LEVEL=INFO
GDB_JAVA_OPTS="-Xms4g -Xmx8g"
GDB_HOME_FOLDER=/data/graphdb
# Check for scheduled tasks every 30 minutes
SCHEDULER_INTERVAL_SECONDS=1800
```

### Production

Full resources, WARNING logging, persistence:
```bash
COMPOSE_PROJECT_NAME=kgap-prod
LOG_LEVEL=WARNING
GDB_JAVA_OPTS="-Xms32g -Xmx64g -Dcom.ontotext.graphdb.monitoring.jmx=true -XX:+UseG1GC"
GDB_HOME_FOLDER=/data/graphdb
LDES_LOG_LEVEL=INFO
SCHEDULER_INTERVAL_SECONDS=86400
LDES_CLEAR_STATE=0
```

## Validating the Configuration

After creating `.env`, verify the configuration:
```bash
# 1. Check that .env exists and is readable
test -r .env && echo "✓ .env exists"
# 2. Verify required variables (ignores commented-out lines)
grep -q "^GDB_REPO=" .env && echo "✓ GDB_REPO set"
grep -q "^LDES_CONFIG_FILE=" .env && echo "✓ LDES_CONFIG_FILE set"
# 3. Check that configuration files exist
test -f ./data/ldes-feeds.yaml && echo "✓ ldes-feeds.yaml exists"
test -f ./data/sembench.yaml && echo "✓ sembench.yaml exists"
# 4. Validate YAML syntax (requires PyYAML)
python -c "import sys, yaml; yaml.safe_load(open(sys.argv[1]))" ./data/ldes-feeds.yaml && echo "✓ ldes-feeds.yaml valid"
python -c "import sys, yaml; yaml.safe_load(open(sys.argv[1]))" ./data/sembench.yaml && echo "✓ sembench.yaml valid"
# 5. Start services and scan the logs for errors
docker compose up -d
sleep 5
docker compose logs --no-color | grep "ERROR" && echo "⚠ Check errors above"
```

## Applying Changes

After modifying `.env`:
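```bash
# Option 1: Recreate only the affected services.
# `docker compose restart` reuses the existing containers and does not pick up
# .env changes; `up -d` recreates services whose configuration changed.
docker compose up -d graphdb
docker compose up -d jupyter
docker compose up -d sembench
# Option 2: Recreate all services
docker compose down
docker compose up -d
# Option 3: Rebuild images and restart (for image changes)
docker compose build
docker compose up -d
```

## Troubleshooting

### Resetting `.env`

```bash
# Back up the current file, then restore the original template
cp .env .env.bak
cp dotenv-example .env
# Or restore a single variable (delete its existing line from .env first)
grep "^GDB_REPO" dotenv-example >> .env
```

### Changes Not Taking Effect

```bash
# 1. Verify .env is being read
docker compose config | grep "GDB_REPO"
# 2. Check the container environment
docker compose exec graphdb env | grep GDB
# 3. Recreate the services
docker compose down
docker compose up -d
```

### YAML Syntax Errors

```bash
# Check the LDES configuration
python -c "import yaml; yaml.safe_load(open('./data/ldes-feeds.yaml'))"
# Check the Sembench configuration
python -c "import yaml; yaml.safe_load(open('./data/sembench.yaml'))"
```

### GraphDB Memory Issues

```bash
# Check available system memory
free -h
# Show the JVM arguments GraphDB was started with
# (needs a JDK in the image; jps cannot see JVMs started with -XX:+PerfDisableSharedMem)
docker compose exec graphdb jps -lv
# Adjust GDB_JAVA_OPTS in .env, then recreate the service
docker compose up -d graphdb
```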