title Quick Reference
nav_order 4

K-GAP Quick Reference

Quick reference guide for common K-GAP operations.

Quick Start

# Clone and start
git clone https://github.com/vliz-be-opsci/k-gap.git
cd k-gap
cp dotenv-example .env
mkdir -p ./data ./notebooks
docker compose up -d

# Access
# GraphDB: http://localhost:7200
# Jupyter: http://localhost:8889
# YASGUI:  http://localhost:8080

Docker Commands

Start/Stop Services

# Start all services
docker compose up -d

# Stop all services
docker compose down

# Restart a specific service
docker compose restart graphdb
docker compose restart jupyter
docker compose restart sembench
docker compose restart ldes-consumer

# View running containers
docker compose ps

Logs

# Follow all logs
docker compose logs -f

# Specific service logs
docker compose logs -f graphdb
docker compose logs -f jupyter

# LDES feed container logs
docker logs ldes-consumer-{feed-name}

Rebuilding

# Rebuild all images
make docker-build

# Rebuild specific service
docker compose build graphdb
docker compose build jupyter

# Rebuild and restart
docker compose up -d --build

Configuration Quick Reference

Environment Variables (.env)

Minimal Development Setup

# Copy the example config and append development overrides
cp dotenv-example .env
echo "LOG_LEVEL=DEBUG" >> .env
echo "GDB_JAVA_OPTS=\"-Xms2g -Xmx4g\"" >> .env

Standard Production Setup

COMPOSE_PROJECT_NAME=kgap
LOG_LEVEL=INFO
GDB_REPO=kgap
REPOLABEL=K-GAP Production
GDB_HOME_FOLDER=/data/graphdb
GDB_MAX_HEADER=65536
GDB_JAVA_OPTS="-Xms8g -Xmx16g -Dcom.ontotext.graphdb.monitoring.jmx=true"
SEMBENCH_CONFIG_PATH=/data/sembench.yaml
SCHEDULER_INTERVAL_SECONDS=86400
LDES_CONFIG_FILE=/data/ldes-feeds.yaml
LDES_LOG_LEVEL=INFO

High-Performance Setup

# For large knowledge graphs (64GB+ systems)
GDB_HOME_FOLDER=/data/graphdb
GDB_JAVA_OPTS="-Xms32g -Xmx64g -Dcom.ontotext.graphdb.monitoring.jmx=true -XX:+UseG1GC"
LOG_LEVEL=WARNING
SCHEDULER_INTERVAL_SECONDS=86400

Complete Environment Reference

For complete documentation, see Configuration Guide.

| Component | Variable | Default | Common Values |
|-----------|----------|---------|---------------|
| Compose | COMPOSE_PROJECT_NAME | kgap | kgap, kgap-prod, kgap-dev |
| Compose | BUILD_TAG | latest | latest, v1.0.0, main |
| Logging | LOG_LEVEL | INFO | DEBUG, INFO, WARNING, ERROR |
| GraphDB | GDB_REPO | kgap | Any alphanumeric string |
| GraphDB | REPOLABEL | (empty) | Description of repository |
| GraphDB | GDB_HOME_FOLDER | /opt/graphdb/home | /data/graphdb (for persistence) |
| GraphDB | GDB_MAX_HEADER | 65536 | 65536 (dev), 131072 (prod) |
| GraphDB | GDB_JAVA_OPTS | -Xms8g... | See Configuration Guide |
| Jupyter | GDB_BASE | http://graphdb:7200/ | http://hostname:7200/ |
| Jupyter | NOTEBOOK_ARGS | --NotebookApp.token='' | Usually unchanged |
| Sembench | SEMBENCH_CONFIG_PATH | /data/sembench.yaml | Path to config file |
| Sembench | SCHEDULER_INTERVAL_SECONDS | 86400 | 86400 (daily), 3600 (hourly) |
| LDES | LDES_CONFIG_FILE | /data/ldes-feeds.yaml | Path to config file |
| LDES | LDES_LOG_LEVEL | INFO | DEBUG, INFO, WARNING |
| LDES | LDES2SPARQL_IMAGE | ghcr.io/... | Usually unchanged |
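
One way to sanity-check a `.env` before running `docker compose up` is to parse it and list what is missing. The sketch below is illustrative only: `parse_env` and `missing_keys` are hypothetical helpers, not part of K-GAP, and the rule that a later assignment replaces an earlier one is an assumption about how duplicate keys (for example those appended with `echo ... >> .env`) are resolved.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, as in GDB_JAVA_OPTS="...".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        # Assumption: a later assignment replaces an earlier one.
        values[key.strip()] = value
    return values


def missing_keys(values: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]


sample = """
# appended by the development profile
LOG_LEVEL=INFO
GDB_REPO=kgap
GDB_JAVA_OPTS="-Xms2g -Xmx4g"
NOTEBOOK_ARGS=--NotebookApp.token=''
LOG_LEVEL=DEBUG
"""
env = parse_env(sample)
print(env["LOG_LEVEL"], "|", env["GDB_JAVA_OPTS"])
print(missing_keys(env, ["GDB_REPO", "LDES_CONFIG_FILE"]))
```

In a notebook, the same function can be fed the real file with `parse_env(open(".env").read())`.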

LDES Feeds (data/ldes-feeds.yaml)

feeds:
  # Minimal feed
  my-feed:
    url: https://example.com/ldes

  # Full feed example
  advanced-feed:
    url: https://example.org/ldes/data
    sparql_endpoint: http://graphdb:7200/repositories/kgap/statements
    target_graph: urn:kgap:my-feed
    environment:
      POLLING_FREQUENCY: 300000      # Every 5 minutes (milliseconds)
      MATERIALIZE: "false"
      RESTART: "unless-stopped"
      MEMBER_BATCH_SIZE: "5000"
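
A feeds file can be checked before the consumer is restarted. The sketch below works on the dictionary a YAML loader (for example `yaml.safe_load`) would return for the file above. `check_feeds` is a hypothetical helper, and its rules are assumptions drawn only from the two examples shown: `url` is treated as the one required key, and `POLLING_FREQUENCY` as whole milliseconds.

```python
from typing import Any, Dict, List


def check_feeds(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed feeds mapping."""
    feeds = config.get("feeds")
    if not isinstance(feeds, dict) or not feeds:
        return ["top-level 'feeds' must be a non-empty mapping"]
    problems: List[str] = []
    for name, feed in feeds.items():
        if not isinstance(feed, dict):
            problems.append(f"{name}: entry must be a mapping")
            continue
        # Assumption: 'url' is the only required key (the minimal feed has nothing else).
        if not str(feed.get("url", "")).startswith(("http://", "https://")):
            problems.append(f"{name}: 'url' must be an http(s) URL")
        env = feed.get("environment", {})
        freq = env.get("POLLING_FREQUENCY") if isinstance(env, dict) else None
        if freq is not None and not str(freq).isdigit():
            problems.append(f"{name}: POLLING_FREQUENCY must be whole milliseconds")
    return problems


config = {
    "feeds": {
        "my-feed": {"url": "https://example.com/ldes"},
        "advanced-feed": {
            "url": "https://example.org/ldes/data",
            "target_graph": "urn:kgap:my-feed",
            "environment": {"POLLING_FREQUENCY": 300000, "MATERIALIZE": "false"},
        },
    }
}
print(check_feeds(config))
```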

Polling Frequencies:

  • 60000 = 1 minute (real-time feeds)
  • 300000 = 5 minutes (active feeds)
  • 600000 = 10 minutes (default)
  • 3600000 = 1 hour (bulk data)
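
The values above are plain minute-to-millisecond conversions. A small helper (hypothetical, named `polling_ms` here) avoids counting zeros by hand:

```python
def polling_ms(minutes: float) -> int:
    """Convert a polling interval in minutes to milliseconds."""
    if minutes <= 0:
        raise ValueError("polling interval must be positive")
    return int(round(minutes * 60_000))


for label, minutes in [("real-time", 1), ("active", 5), ("default", 10), ("bulk", 60)]:
    print(f"{label:10} POLLING_FREQUENCY: {polling_ms(minutes)}")
```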

Prebuilt Setup Profiles

Quickly set up common deployments:

# Development
cp dotenv-example .env
echo "GDB_JAVA_OPTS=\"-Xms2g -Xmx4g\"" >> .env
echo "LOG_LEVEL=DEBUG" >> .env
echo "SCHEDULER_INTERVAL_SECONDS=3600" >> .env

# Production (persistence + monitoring)
cp dotenv-example .env
echo "GDB_HOME_FOLDER=/data/graphdb" >> .env
echo "GDB_JAVA_OPTS=\"-Xms16g -Xmx32g -Dcom.ontotext.graphdb.monitoring.jmx=true\"" >> .env
echo "LOG_LEVEL=WARNING" >> .env

SPARQL Queries

Basic Queries

# Count all triples
SELECT (COUNT(*) as ?count)
WHERE { ?s ?p ?o }

# List all types
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?type (COUNT(?s) as ?count)
WHERE { ?s rdf:type ?type }
GROUP BY ?type
ORDER BY DESC(?count)

# List all predicates
SELECT DISTINCT ?p (COUNT(*) as ?count)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?count)

Data Queries

# Get entities with labels
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label
WHERE {
  ?entity rdfs:label ?label .
}
LIMIT 100

# Full-text search
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?entity ?score
WHERE {
  ?entity luc:searchIndex "marine" ;
          luc:score ?score .
}
ORDER BY DESC(?score)

Updates

# Insert data
PREFIX ex: <http://example.org/>
INSERT DATA {
  ex:entity1 ex:property "value" .
}

# Delete data
PREFIX ex: <http://example.org/>
DELETE DATA {
  ex:entity1 ex:property "value" .
}

# Update (delete + insert)
PREFIX ex: <http://example.org/>
DELETE { ?s ex:oldProp ?o }
INSERT { ?s ex:newProp ?o }
WHERE { ?s ex:oldProp ?o }
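
Updates like these can also be sent from a script. The sketch below builds a SPARQL 1.1 Protocol update request (a form-encoded `update` parameter sent with POST) against the repository's `/statements` endpoint, using only the standard library. `build_update_request` is a hypothetical helper. The host and repository name are the defaults used elsewhere in this guide and may differ in your setup.

```python
import urllib.parse
import urllib.request

RENAME = """PREFIX ex: <http://example.org/>
DELETE { ?s ex:oldProp ?o }
INSERT { ?s ex:newProp ?o }
WHERE  { ?s ex:oldProp ?o }"""


def build_update_request(base_url: str, repo: str, update: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded SPARQL update request."""
    url = f"{base_url.rstrip('/')}/repositories/{repo}/statements"
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_update_request("http://localhost:7200", "kgap", RENAME)
print(req.get_method(), req.full_url)
# To execute it against a running instance:
# urllib.request.urlopen(req, timeout=30)
```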

Jupyter Notebook Commands

Query GraphDB

from kgap_tools import execute_to_df, GDB

# Using templates
df = execute_to_df('my_query', param1='value')

# Direct SPARQL
sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
result = GDB.query(sparql=sparql)
df = result.to_dataframe()
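
For readers who want to see what sits underneath a result table: a SPARQL endpoint can return SELECT results as `application/sparql-results+json`, which flattens easily into rows. This is a sketch of that idea, not a description of how `kgap_tools` works internally. `bindings_to_rows` is a hypothetical helper and the sample payload is made up.

```python
import json
from typing import Dict, List, Optional


def bindings_to_rows(payload: dict) -> List[Dict[str, Optional[str]]]:
    """Flatten SPARQL 1.1 JSON results into one dict per solution."""
    columns = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to None.
        rows.append({col: binding.get(col, {}).get("value") for col in columns})
    return rows


sample = json.loads("""
{
  "head": {"vars": ["s", "label"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/entity1"},
     "label": {"type": "literal", "value": "Entity one"}},
    {"s": {"type": "uri", "value": "http://example.org/entity2"}}
  ]}
}
""")
rows = bindings_to_rows(sample)
print(rows)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)`.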

Working with Data

import pandas as pd

# Read data
df = pd.read_csv('/data/input.csv')

# Process and query
for idx, row in df.iterrows():
    # Query GraphDB based on row data
    results = execute_to_df('template', value=row['column'])
    # Process results

# Write results
df.to_csv('/data/output.csv', index=False)

Common Patterns

Add LDES Feed

# 1. Edit config
nano data/ldes-feeds.yaml

# 2. Add a feed entry (same schema as in "LDES Feeds" above)
# feeds:
#   new-feed:
#     url: https://example.com/ldes
#     sparql_endpoint: http://graphdb:7200/repositories/kgap/statements
#     environment:
#       POLLING_FREQUENCY: 300000   # milliseconds (5 minutes)

# 3. Restart consumer
docker compose restart ldes-consumer

Export Data

# Export all data to TTL
curl 'http://localhost:7200/repositories/kgap/statements' \
  -H 'Accept: text/turtle' \
  > export.ttl

# Export specific graph
curl 'http://localhost:7200/repositories/kgap/statements?context=%3Chttp://example.org/graph%3E' \
  -H 'Accept: text/turtle' \
  > graph-export.ttl

Import Data

# Import TTL file
curl -X POST \
  http://localhost:7200/repositories/kgap/statements \
  -H 'Content-Type: text/turtle' \
  --data-binary '@import.ttl'

# Import to named graph
curl -X POST \
  'http://localhost:7200/repositories/kgap/statements?context=%3Chttp://example.org/graph%3E' \
  -H 'Content-Type: text/turtle' \
  --data-binary '@import.ttl'

Clear Repository

# Clear all data
curl -X DELETE http://localhost:7200/repositories/kgap/statements

# Clear specific graph
curl -X DELETE 'http://localhost:7200/repositories/kgap/statements?context=%3Chttp://example.org/graph%3E'
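
The export, import, and clear commands all use the same URL shape: the repository's `/statements` endpoint, optionally with a `context` parameter holding the graph IRI in angle brackets, percent-encoded. The helper below (hypothetical, named `statements_url`) builds that URL so the encoding does not have to be typed by hand.

```python
from typing import Optional
from urllib.parse import quote


def statements_url(base_url: str, repo: str, graph: Optional[str] = None) -> str:
    """Return the /statements URL, scoped to a named graph when one is given."""
    url = f"{base_url.rstrip('/')}/repositories/{repo}/statements"
    if graph is not None:
        # Wrap the IRI in <...>, then percent-encode everything except ':' and '/'.
        url += "?context=" + quote(f"<{graph}>", safe=":/")
    return url


print(statements_url("http://localhost:7200", "kgap"))
print(statements_url("http://localhost:7200", "kgap", "http://example.org/graph"))
```

Characters such as `#` in a graph IRI are encoded as well, which matters because an unencoded `#` would be treated as a URL fragment and never reach the server.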

Troubleshooting

GraphDB Won't Start

# Check logs
docker compose logs graphdb

# Common fixes
# 1. Increase memory in .env:
#    GDB_JAVA_OPTS="-Xms16g -Xmx32g"
# 2. Check port 7200 not in use:
#    lsof -i :7200
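# 3. Remove and recreate (destructive: `docker volume prune` removes unused
#    volumes for the whole Docker host, not only K-GAP's; back up data first):
#    docker compose down
#    docker volume prune
#    docker compose up -d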
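
Where `lsof` is not available, the port check can be done from Python. This is a minimal sketch using a plain TCP connect. `port_in_use` is a hypothetical helper, and the port list is the one from this guide.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


for name, port in [("GraphDB", 7200), ("Jupyter", 8889), ("YASGUI", 8080)]:
    state = "in use" if port_in_use(port) else "free"
    print(f"{name:8} {port}: {state}")
```

Before the stack is started, all three ports should report `free`. A port that is already `in use` points to another process holding it.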

Jupyter Can't Connect to GraphDB

# Test connection
import os
from pykg2tbl import KGSource

endpoint = f"{os.getenv('GDB_BASE')}repositories/{os.getenv('GDB_REPO')}"
print(f"Testing: {endpoint}")

try:
    kg = KGSource.build(endpoint)
    result = kg.query("ASK { ?s ?p ?o }")
    print("✓ Connection successful")
except Exception as e:
    print(f"✗ Connection failed: {e}")

LDES Feed Not Working

# Check feed container
docker ps | grep ldes-consumer
docker logs ldes-consumer-{feed-name}

# Test feed URL
curl -I {feed-url}

# Check GraphDB endpoint (returns the statement count; /statements would dump every triple)
curl http://localhost:7200/repositories/kgap/size

# Restart feed
docker stop ldes-consumer-{feed-name}
docker rm ldes-consumer-{feed-name}
docker compose restart ldes-consumer

Out of Memory

# Increase limits in docker-compose.yml
services:
  graphdb:
    environment:
      GDB_JAVA_OPTS: "-Xms16g -Xmx32g"
    deploy:
      resources:
        limits:
          memory: 40G
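
The container limit has to sit above `-Xmx`, because the JVM also uses memory outside the heap (metaspace, thread stacks, off-heap buffers). The example above leaves 8 GB between a 32g heap and a 40G limit. How much headroom GraphDB actually needs is not stated in this guide, so the sketch below only computes the gap. `max_heap_gb` and `headroom_gb` are hypothetical helpers.

```python
import re
from typing import Optional

# Multipliers that convert a JVM size suffix to GiB.
_TO_GIB = {"k": 1 / (1024 * 1024), "m": 1 / 1024, "g": 1.0}


def max_heap_gb(java_opts: str) -> Optional[float]:
    """Return the -Xmx value in GiB, or None when the flag is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        return None
    return int(match.group(1)) * _TO_GIB[match.group(2).lower()]


def headroom_gb(java_opts: str, container_limit_gb: float) -> Optional[float]:
    """Return the container limit minus the max heap, or None without -Xmx."""
    heap = max_heap_gb(java_opts)
    return None if heap is None else container_limit_gb - heap


opts = "-Xms16g -Xmx32g"
print(max_heap_gb(opts), headroom_gb(opts, 40))
```

A zero or negative result means the container can be killed before the JVM ever reaches its configured heap ceiling.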

Useful Endpoints

GraphDB REST API

# Repository info
curl http://localhost:7200/rest/repositories/kgap

# Repository size
curl http://localhost:7200/rest/repositories/kgap/size

# Namespaces
curl http://localhost:7200/repositories/kgap/namespaces

# Contexts (graphs)
curl http://localhost:7200/repositories/kgap/contexts

Health Checks

# GraphDB health
curl http://localhost:7200/

# Jupyter health
curl http://localhost:8889/

# Check all services
docker compose ps
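
The same checks can be scripted, for instance from a notebook or a cron job. This sketch treats any HTTP answer below 500 as "up", which is an assumption about what counts as healthy here. `is_up` is a hypothetical helper, and the URLs are the defaults from this guide.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # A 4xx answer still proves that a server is listening.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


SERVICES = [
    ("GraphDB", "http://localhost:7200/"),
    ("Jupyter", "http://localhost:8889/"),
    ("YASGUI", "http://localhost:8080/"),
]
for name, url in SERVICES:
    print(f"{name:8} {'up' if is_up(url) else 'down'}")
```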

File Locations

k-gap/
├── data/                    # Shared data volume
│   ├── ldes-feeds.yaml     # LDES configuration
│   ├── sembench.yaml       # Sembench configuration
│   └── *.ttl, *.csv, etc.  # Data files
├── notebooks/              # Jupyter notebooks
│   └── queries/            # SPARQL query templates
├── .env                    # Environment configuration
└── docker-compose.yml      # Service definitions

Port Reference

| Service | Port | URL |
|---------|------|-----|
| GraphDB | 7200 | http://localhost:7200 |
| Jupyter | 8889 | http://localhost:8889 |
| YASGUI | 8080 | http://localhost:8080 |

Resource Allocation Defaults

| Service | CPU | Memory |
|---------|-----|--------|
| GraphDB | 4 cores | 8-16GB (configurable) |
| Jupyter | unlimited | unlimited |
| Sembench | unlimited | unlimited |
| LDES Consumer | unlimited | unlimited |

Links