| title | Quick Reference |
|---|---|
| nav_order | 4 |
Quick reference guide for common K-GAP operations.
# Clone and start
git clone https://github.com/vliz-be-opsci/k-gap.git
cd k-gap
cp dotenv-example .env
mkdir -p ./data ./notebooks
docker compose up -d
# Access
# GraphDB: http://localhost:7200
# Jupyter: http://localhost:8889
# YASGUI: http://localhost:8080

# Start all services
docker compose up -d
# Stop all services
docker compose down
# Restart a specific service
docker compose restart graphdb
docker compose restart jupyter
docker compose restart sembench
docker compose restart ldes-consumer
# View running containers
docker compose ps

# Follow all logs
docker compose logs -f
# Specific service logs
docker compose logs -f graphdb
docker compose logs -f jupyter
# LDES feed container logs
docker logs ldes-consumer-{feed-name}

# Rebuild all images
make docker-build
# Rebuild specific service
docker compose build graphdb
docker compose build jupyter
# Rebuild and restart
docker compose up -d --build

# Copy and start
cp dotenv-example .env
echo "LOG_LEVEL=DEBUG" >> .env
echo "GDB_JAVA_OPTS=\"-Xms2g -Xmx4g\"" >> .env

COMPOSE_PROJECT_NAME=kgap
LOG_LEVEL=INFO
GDB_REPO=kgap
REPOLABEL=K-GAP Production
GDB_HOME_FOLDER=/data/graphdb
GDB_MAX_HEADER=65536
GDB_JAVA_OPTS="-Xms8g -Xmx16g -Dcom.ontotext.graphdb.monitoring.jmx=true"
SEMBENCH_CONFIG_PATH=/data/sembench.yaml
SCHEDULER_INTERVAL_SECONDS=86400
LDES_CONFIG_FILE=/data/ldes-feeds.yaml
LDES_LOG_LEVEL=INFO

# For large knowledge graphs (64GB+ systems)
GDB_HOME_FOLDER=/data/graphdb
GDB_JAVA_OPTS="-Xms32g -Xmx64g -Dcom.ontotext.graphdb.monitoring.jmx=true -XX:+UseG1GC"
LOG_LEVEL=WARNING
SCHEDULER_INTERVAL_SECONDS=86400

For complete documentation, see Configuration Guide.
| Component | Variable | Default | Common Values |
|---|---|---|---|
| Compose | COMPOSE_PROJECT_NAME | kgap | kgap, kgap-prod, kgap-dev |
| Compose | BUILD_TAG | latest | latest, v1.0.0, main |
| Logging | LOG_LEVEL | INFO | DEBUG, INFO, WARNING, ERROR |
| GraphDB | GDB_REPO | kgap | Any alphanumeric string |
| GraphDB | REPOLABEL | (empty) | Description of repository |
| GraphDB | GDB_HOME_FOLDER | /opt/graphdb/home | /data/graphdb (for persistence) |
| GraphDB | GDB_MAX_HEADER | 65536 | 65536 (dev), 131072 (prod) |
| GraphDB | GDB_JAVA_OPTS | -Xms8g... | See Configuration Guide |
| Jupyter | GDB_BASE | http://graphdb:7200/ | http://hostname:7200/ |
| Jupyter | NOTEBOOK_ARGS | --NotebookApp.token='' | Usually unchanged |
| Sembench | SEMBENCH_CONFIG_PATH | /data/sembench.yaml | Path to config file |
| Sembench | SCHEDULER_INTERVAL_SECONDS | 86400 | 86400 (daily), 3600 (hourly) |
| LDES | LDES_CONFIG_FILE | /data/ldes-feeds.yaml | Path to config file |
| LDES | LDES_LOG_LEVEL | INFO | DEBUG, INFO, WARNING |
| LDES | LDES2SPARQL_IMAGE | ghcr.io/... | Usually unchanged |
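
Several recipes in this guide append settings to `.env` with `echo ... >> .env`, so the same variable can end up defined more than once. The sketch below parses dotenv-style text and reports the effective values. It assumes the last assignment wins, which is how dotenv parsers commonly behave; confirm this against your Docker Compose version. `parse_env` is a hypothetical helper, not part of K-GAP, and the sample text is illustrative rather than the real content of `dotenv-example`.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; later assignments override earlier ones (assumption)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Illustrative content only: a base file plus two appended overrides.
example = """
LOG_LEVEL=INFO
GDB_REPO=kgap
# appended later with: echo ... >> .env
LOG_LEVEL=DEBUG
GDB_JAVA_OPTS="-Xms2g -Xmx4g"
"""

settings = parse_env(example)
for name in sorted(settings):
    print(f"{name}={settings[name]}")
```

Running a check like this before `docker compose up -d` makes it easier to spot an override that was appended twice with different values.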
feeds:
  # Minimal feed
  my-feed:
    url: https://example.com/ldes
  # Full feed example
  advanced-feed:
    url: https://example.org/ldes/data
    sparql_endpoint: http://graphdb:7200/repositories/kgap/statements
    target_graph: urn:kgap:my-feed
    environment:
      POLLING_FREQUENCY: 300000  # Every 5 minutes (milliseconds)
      MATERIALIZE: "false"
      RESTART: "unless-stopped"
      MEMBER_BATCH_SIZE: "5000"

Polling Frequencies:

- 60000 = 1 minute (real-time feeds)
- 300000 = 5 minutes (active feeds)
- 600000 = 10 minutes (default)
- 3600000 = 1 hour (bulk data)
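
Polling frequencies are given in milliseconds, which is easy to get wrong by a factor of 1000. A minimal sketch that derives the values listed above from readable intervals (`polling_frequency_ms` is a hypothetical helper name):

```python
from datetime import timedelta


def polling_frequency_ms(interval: timedelta) -> int:
    """Convert a time interval to whole milliseconds."""
    # Dividing one timedelta by another with // yields an exact integer.
    return interval // timedelta(milliseconds=1)


for label, interval in [
    ("real-time feeds", timedelta(minutes=1)),
    ("active feeds", timedelta(minutes=5)),
    ("default", timedelta(minutes=10)),
    ("bulk data", timedelta(hours=1)),
]:
    print(f"{polling_frequency_ms(interval):>8}  {label}")
```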
Quickly set up common deployments:
# Development
cp dotenv-example .env
echo "GDB_JAVA_OPTS=\"-Xms2g -Xmx4g\"" >> .env
echo "LOG_LEVEL=DEBUG" >> .env
echo "SCHEDULER_INTERVAL_SECONDS=3600" >> .env
# Production (persistence + monitoring)
cp dotenv-example .env
echo "GDB_HOME_FOLDER=/data/graphdb" >> .env
echo "GDB_JAVA_OPTS=\"-Xms16g -Xmx32g -Dcom.ontotext.graphdb.monitoring.jmx=true\"" >> .env
echo "LOG_LEVEL=WARNING" >> .env

# Count all triples
SELECT (COUNT(*) as ?count)
WHERE { ?s ?p ?o }
# List all types
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?type (COUNT(?s) as ?count)
WHERE { ?s rdf:type ?type }
GROUP BY ?type
ORDER BY DESC(?count)
# List all predicates
SELECT DISTINCT ?p (COUNT(*) as ?count)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?count)

# Get entities with labels
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label
WHERE {
?entity rdfs:label ?label .
}
LIMIT 100
# Full-text search
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?entity ?score
WHERE {
  ?entity luc:searchIndex "marine" ;
          luc:score ?score .
}
ORDER BY DESC(?score)

# Insert data
PREFIX ex: <http://example.org/>
INSERT DATA {
ex:entity1 ex:property "value" .
}
# Delete data
PREFIX ex: <http://example.org/>
DELETE DATA {
ex:entity1 ex:property "value" .
}
# Update (delete + insert)
PREFIX ex: <http://example.org/>
DELETE { ?s ex:oldProp ?o }
INSERT { ?s ex:newProp ?o }
WHERE { ?s ex:oldProp ?o }

from kgap_tools import execute_to_df, GDB
# Using templates
df = execute_to_df('my_query', param1='value')
# Direct SPARQL
sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
result = GDB.query(sparql=sparql)
df = result.to_dataframe()

import pandas as pd
from kgap_tools import execute_to_df
# Read data
df = pd.read_csv('/data/input.csv')
# Process and query
for idx, row in df.iterrows():
    # Query GraphDB based on row data
    results = execute_to_df('template', value=row['column'])
    # Process results

# Write results
df.to_csv('/data/output.csv', index=False)

# 1. Edit config
nano data/ldes-feeds.yaml
# 2. Add feed entry
# feeds:
# - name: new-feed
# url: https://example.com/ldes
# sparql_endpoint: http://graphdb:7200/repositories/kgap/statements
# polling_interval: 300
# 3. Restart consumer
docker compose restart ldes-consumer

# Export all data to TTL
curl 'http://localhost:7200/repositories/kgap/statements' \
-H 'Accept: text/turtle' \
> export.ttl
# Export specific graph
curl 'http://localhost:7200/repositories/kgap/statements?context=%3Chttp://example.org/graph%3E' \
  -H 'Accept: text/turtle' \
  > graph-export.ttl

# Import TTL file
curl -X POST \
http://localhost:7200/repositories/kgap/statements \
-H 'Content-Type: text/turtle' \
--data-binary '@import.ttl'
# Import to named graph
curl -X POST \
'http://localhost:7200/repositories/kgap/statements?context=%3Chttp://example.org/graph%3E' \
-H 'Content-Type: text/turtle' \
--data-binary '@import.ttl'

# Clear all data
curl -X DELETE http://localhost:7200/repositories/kgap/statements
# Clear specific graph
curl -X DELETE 'http://localhost:7200/repositories/kgap/statements?context=%3Chttp://example.org/graph%3E'

# Check logs
docker compose logs graphdb
# Common fixes
# 1. Increase memory in .env:
# GDB_JAVA_OPTS="-Xms16g -Xmx32g"
# 2. Check port 7200 not in use:
# lsof -i :7200
# 3. Remove and recreate (caution: docker volume prune deletes unused Docker volumes on this host):
# docker compose down
# docker volume prune
# docker compose up -d

# Test connection
import os
from pykg2tbl import KGSource
endpoint = f"{os.getenv('GDB_BASE')}repositories/{os.getenv('GDB_REPO')}"
print(f"Testing: {endpoint}")
try:
    kg = KGSource.build(endpoint)
    result = kg.query("ASK { ?s ?p ?o }")
    print("✓ Connection successful")
except Exception as e:
    print(f"✗ Connection failed: {e}")

# Check feed container
docker ps | grep ldes-consumer
docker logs ldes-consumer-{feed-name}
# Test feed URL
curl -I {feed-url}
# Check GraphDB endpoint
curl http://localhost:7200/repositories/kgap/statements
# Restart feed
docker stop ldes-consumer-{feed-name}
docker rm ldes-consumer-{feed-name}
docker compose restart ldes-consumer

# Increase limits in docker-compose.yml
services:
  graphdb:
    environment:
      GDB_JAVA_OPTS: "-Xms16g -Xmx32g"
    deploy:
      resources:
        limits:
          memory: 40G

# Repository info
curl http://localhost:7200/rest/repositories/kgap
# Repository size
curl http://localhost:7200/rest/repositories/kgap/size
# Namespaces
curl http://localhost:7200/repositories/kgap/namespaces
# Contexts (graphs)
curl http://localhost:7200/repositories/kgap/contexts

# GraphDB health
curl http://localhost:7200/
# Jupyter health
curl http://localhost:8889/
# Check all services
docker compose ps

k-gap/
├── data/ # Shared data volume
│ ├── ldes-feeds.yaml # LDES configuration
│ ├── sembench.yaml # Sembench configuration
│ └── *.ttl, *.csv, etc. # Data files
├── notebooks/ # Jupyter notebooks
│ └── queries/ # SPARQL query templates
├── .env # Environment configuration
└── docker-compose.yml # Service definitions
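
Before starting the stack, it can help to confirm that the layout above is in place. The following is a minimal sketch, not a K-GAP tool: `missing_paths` and the `EXPECTED` list are hypothetical, and which entries are strictly required is an assumption (the config files under `data/` may be optional in your setup). The demo runs in a throwaway directory so it does not touch a real checkout.

```python
import tempfile
from pathlib import Path

# Assumed minimum layout, taken from the tree above.
EXPECTED = ["data", "notebooks", ".env", "docker-compose.yml"]


def missing_paths(root: Path) -> list[str]:
    """Return the expected entries that do not exist under root."""
    return [name for name in EXPECTED if not (root / name).exists()]


# Demo: create the directories first, then the files, checking in between.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "notebooks").mkdir()
    before = missing_paths(root)
    (root / ".env").touch()
    (root / "docker-compose.yml").touch()
    after = missing_paths(root)

print("missing before:", before)
print("missing after:", after)
```

In a real checkout, `missing_paths(Path("."))` run from the `k-gap/` directory would perform the same check.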
| Service | Port | URL |
|---|---|---|
| GraphDB | 7200 | http://localhost:7200 |
| Jupyter | 8889 | http://localhost:8889 |
| YASGUI | 8080 | http://localhost:8080 |
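
The GraphDB URL in the table above is the base for the repository endpoints used in the curl examples in this guide. As a sketch of how those URLs are assembled, including the percent-encoded `context` parameter for a named graph, the helper below builds them from a base URL and repository name. `statements_url` is a hypothetical function, and the repository name `kgap` is the default listed in this guide.

```python
from typing import Optional
from urllib.parse import quote

GRAPHDB_URL = "http://localhost:7200"  # from the service table


def statements_url(repo: str, graph: Optional[str] = None,
                   base: str = GRAPHDB_URL) -> str:
    """Build the statements endpoint URL, optionally scoped to a named graph."""
    # Tolerate a trailing slash on the base, as in GDB_BASE=http://graphdb:7200/
    url = f"{base.rstrip('/')}/repositories/{repo}/statements"
    if graph is not None:
        # The context is passed as <iri>; the angle brackets must be
        # percent-encoded (%3C and %3E). ':' and '/' are left readable.
        url += "?context=" + quote(f"<{graph}>", safe=":/")
    return url


print(statements_url("kgap"))
print(statements_url("kgap", graph="http://example.org/graph"))
```

Building the URL in code avoids hand-encoding the angle brackets each time a different graph is exported, imported, or cleared.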
| Service | CPU | Memory |
|---|---|---|
| GraphDB | 4 cores | 8-16GB (configurable) |
| Jupyter | unlimited | unlimited |
| Sembench | unlimited | unlimited |
| LDES Consumer | unlimited | unlimited |
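
GraphDB's memory use is set in two places in this guide: the JVM heap in `GDB_JAVA_OPTS` and the container memory limit in `docker-compose.yml`. The JVM needs memory beyond its heap, so `-Xmx` should stay below the container limit. The sketch below only checks that the heap is strictly smaller than the limit; how much headroom GraphDB actually needs is not stated here and is left as an assumption to tune. The helper names are hypothetical.

```python
import re

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def to_bytes(size: str) -> int:
    """Convert sizes such as '32g', '40G' or '512m' to bytes (binary units)."""
    match = re.fullmatch(r"(\d+)([kmg])b?", size.strip().lower())
    if not match:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def max_heap(java_opts: str) -> int:
    """Return the -Xmx value, in bytes, from a JVM options string."""
    match = re.search(r"-Xmx(\d+[kmgKMG])", java_opts)
    if not match:
        raise ValueError("no -Xmx option found")
    return to_bytes(match.group(1))


def heap_fits(java_opts: str, container_limit: str) -> bool:
    """True if the maximum heap is strictly below the container limit."""
    return max_heap(java_opts) < to_bytes(container_limit)


# Values taken from the examples in this guide.
print(heap_fits("-Xms16g -Xmx32g", "40G"))
print(heap_fits("-Xms32g -Xmx64g -XX:+UseG1GC", "40G"))
```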