Knowledge Graph (GraphRAG) Module

A modular component for Graph-based Retrieval Augmented Generation (GraphRAG) using the Neo4j graph database and MinIO object storage.

Features

  • Graph-based Retrieval: Leverage graph structures for enhanced RAG with entity and relationship awareness
  • Neo4j Integration: Efficient connection pooling with configurable connection management
  • MinIO Storage: Store graph embeddings, community reports, and intermediate results in object storage
  • Complete CRUD Operations: Create, Read, Update, Delete for nodes and relationships
  • Advanced Search Methods:
    • Community/Global Search
    • Integrated Hybrid Search (Global + Parallel Retrieval + RRF + Expansion + Reranking)
  • Clustering Algorithms: Louvain, Leiden, Hierarchical clustering for community detection
  • RESTful API: FastAPI endpoints with OpenAPI/Swagger documentation
  • Modular Architecture: Can be enabled/disabled via configuration
  • Type-safe: Pydantic models for automatic validation
  • Error Handling: Comprehensive error handling with appropriate HTTP status codes

Module Activation

The Knowledge Graph module is part of Tiledesk LLM's modular architecture. To enable it:

1. Configuration

Edit service_conf.yaml:

services:
  graphrag: true  # Enable Knowledge Graph module

# Required dependencies configuration
minio:
  endpoint: "localhost:9000"
  access_key: "minioadmin"
  secret_key: "minioadmin"
  secure: false

neo4j:
  uri: "neo4j://localhost:7687"
  user: "neo4j"
  password: "password"
  database: "neo4j"

2. Install Optional Dependencies

# Install with Poetry extras
poetry install --extras "graph"

# Or install all modules
poetry install --extras "all"

3. Docker Deployment

Use the GraphRAG Docker profile:

docker-compose --profile app-graph up --build

Dependencies

Required Services

  • Neo4j: Graph database (version 5.x)
  • MinIO: Object storage for embeddings and reports
  • Redis: For caching and job queues (shared with main application)

Python Dependencies

  • neo4j: Neo4j Python driver
  • minio: MinIO Python SDK
  • langchain-aws: AWS integrations (for MinIO)
  • igraph: Graph analysis library
  • pandas: Data processing for community reports

Configuration

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| NEO4J_URI | Neo4j connection URI | neo4j://localhost:7687 |
| NEO4J_USER | Neo4j username | neo4j |
| NEO4J_PASSWORD | Neo4j password | password |
| NEO4J_DATABASE | Neo4j database name | neo4j |
| MINIO_ENDPOINT | MinIO endpoint | localhost:9000 |
| MINIO_ACCESS_KEY | MinIO access key | minioadmin |
| MINIO_SECRET_KEY | MinIO secret key | minioadmin |
| MINIO_SECURE | Use HTTPS | false |
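
As a rough illustration of how these variables and their defaults could be resolved, here is a minimal sketch. The helper name load_settings and the way MINIO_SECURE is parsed into a boolean are assumptions made for this example; the module's own loader may work differently.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Defaults copied from the list of variables above.
DEFAULTS = {
    "NEO4J_URI": "neo4j://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "password",
    "NEO4J_DATABASE": "neo4j",
    "MINIO_ENDPOINT": "localhost:9000",
    "MINIO_ACCESS_KEY": "minioadmin",
    "MINIO_SECRET_KEY": "minioadmin",
    "MINIO_SECURE": "false",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: read each variable, falling back to its default."""
    source = os.environ if env is None else env
    settings: Dict[str, Any] = {
        name: source.get(name, default) for name, default in DEFAULTS.items()
    }
    # Assumption: MINIO_SECURE is treated as a case-insensitive boolean string.
    settings["MINIO_SECURE"] = str(settings["MINIO_SECURE"]).strip().lower() in (
        "1",
        "true",
        "yes",
    )
    return settings
```

Passing an explicit mapping instead of relying on os.environ keeps the helper easy to test; the default credentials shown are only suitable for local development.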

Service Configuration

Configuration is centralized in service_conf.yaml. See service_conf.yaml.template for complete options.

API Endpoints

Utility

  • GET /api/kg/health - Check Neo4j connection health
  • GET /api/kg/stats - Get database statistics (node count, relationship count, etc.)

Node Management

  • POST /api/kg/nodes - Create a new node
  • GET /api/kg/nodes/{node_id} - Read node by ID
  • GET /api/kg/nodes?label=... - List nodes by label
  • GET /api/kg/nodes/search?label=...&property_key=...&property_value=... - Search nodes by property
  • PUT /api/kg/nodes/{node_id} - Update node
  • PATCH /api/kg/nodes/{node_id} - Partially update node
  • DELETE /api/kg/nodes/{node_id} - Delete node

Relationship Management

  • POST /api/kg/relationships - Create relationship between nodes
  • GET /api/kg/relationships/{relationship_id} - Read relationship by ID
  • GET /api/kg/nodes/{node_id}/relationships?direction=... - List relationships for a node
  • PUT /api/kg/relationships/{relationship_id} - Update relationship
  • PATCH /api/kg/relationships/{relationship_id} - Partially update relationship
  • DELETE /api/kg/relationships/{relationship_id} - Delete relationship

Graph Operations

  • POST /api/kg/create - Create/import knowledge graph from vector store namespace
  • POST /api/kg/add-document - Add a single document to existing knowledge graph and update community reports
  • POST /api/kg/louvein-cluster - Perform Louvain clustering with MinIO storage
  • POST /api/kg/leiden-cluster - Perform Leiden clustering
  • POST /api/kg/hierarchical - Perform hierarchical clustering

Search & QA

  • POST /api/kg/hybrid - Primary endpoint: Integrated hybrid search (Global + Parallel Retrieval + RRF + Expansion + Reranking)
  • POST /api/kg/qa - Community/Global search on community reports
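
The RRF step named in the hybrid endpoint refers to Reciprocal Rank Fusion, which merges several ranked result lists into one. The sketch below shows the general technique only; the function name, the constant k = 60 (a commonly used value), and the tie-breaking rule are assumptions, not the module's confirmed implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Fuse ranked lists of document ids: score(d) = sum of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep output stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A document that appears near the top of several lists outranks one that tops only a single list, which is why RRF is a common choice for combining retrievers whose raw scores are not comparable.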

Usage Examples

1. Health Check

curl http://localhost:8000/api/kg/health

2. Create a Node

curl -X POST http://localhost:8000/api/kg/nodes \
  -H "Content-Type: application/json" \
  -d '{
    "label": "Document",
    "properties": {
      "title": "Introduction to RAG",
      "content": "RAG stands for Retrieval Augmented Generation...",
      "embedding": [0.1, 0.2, 0.3]
    }
  }'

3. Create a Relationship

curl -X POST http://localhost:8000/api/kg/relationships \
  -H "Content-Type: application/json" \
  -d '{
    "source_id": "123",
    "target_id": "456",
    "type": "REFERENCES",
    "properties": {
      "weight": 0.8,
      "context": "citation"
    }
  }'

4. List Nodes by Label

curl "http://localhost:8000/api/kg/nodes?label=Document&limit=10"

5. Create Graph from Vector Store

curl -X POST http://localhost:8000/api/kg/create \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "my-documents",
    "engine": {
      "name": "pinecone",
      "type": "serverless",
      "apikey": "your-api-key",
      "vector_size": 1536,
      "index_name": "tilellm"
    }
  }'

6. Add Document to Graph (Incremental Update)

curl -X POST http://localhost:8000/api/kg/add-document \
  -H "Content-Type: application/json" \
  -d '{
    "metadata_id": "doc_12345_uuid",
    "namespace": "my-documents",
    "engine": {
      "name": "pinecone",
      "type": "serverless",
      "apikey": "your-api-key",
      "index_name": "tilellm"
    },
    "deduplicate_entities": true,
    "sparse_encoder": "splade",
    "llm_key": "my-llm-key",
    "model": "gpt-4"
  }'

7. Hybrid Search

curl -X POST http://localhost:8000/api/kg/hybrid \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the relationship between AI and machine learning?",
    "namespace": "my-documents",
    "engine": {
      "name": "pinecone",
      "type": "serverless",
      "apikey": "your-api-key",
      "vector_size": 1536,
      "index_name": "tilellm"
    }
  }'
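
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; build_request and call are hypothetical helper names, the base URL is taken from the examples, and the assumption that responses are JSON is not confirmed by this document. If your deployment enforces JWT authentication, an Authorization header would also be needed.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_request(
    method: str, path: str, payload: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Hypothetical helper: prepare a request for a /api/kg/... endpoint."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if payload is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Equivalent of the "Create a Node" example; nothing is sent until call() runs.
create_node_request = build_request(
    "POST",
    "/api/kg/nodes",
    {"label": "Document", "properties": {"title": "Introduction to RAG"}},
)
```

Separating request construction from sending makes it possible to inspect or log a request before it reaches the service.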

Data Models

Node

{
  "id": "string",           # Auto-generated by Neo4j
  "label": "string",        # Node type (e.g., Document, Person)
  "properties": {           # Custom properties
    "key": "value"
  }
}

Relationship

{
  "id": "string",           # Auto-generated by Neo4j
  "source_id": "string",    # Source node ID
  "target_id": "string",    # Target node ID
  "type": "string",         # Relationship type (e.g., REFERENCES)
  "properties": {           # Custom properties
    "key": "value"
  }
}
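
For client code, the two shapes above could be mirrored as typed containers. The module itself uses Pydantic models according to the Features list; this sketch uses standard-library dataclasses instead to stay dependency-free, and the class and helper names are illustrative assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional, Union


@dataclass
class Node:
    label: str  # node type, e.g. Document or Person
    properties: Dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None  # assigned by Neo4j, unknown before creation


@dataclass
class Relationship:
    source_id: str
    target_id: str
    type: str  # relationship type, e.g. REFERENCES
    properties: Dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None  # assigned by Neo4j, unknown before creation


def to_create_payload(model: Union[Node, Relationship]) -> Dict[str, Any]:
    """Hypothetical helper: drop the server-generated id before a create call."""
    payload = asdict(model)
    payload.pop("id", None)
    return payload
```

Keeping id optional reflects that it is generated by the database, so a create payload never carries it.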

Neo4j Conventions

  • Node Labels: Use PascalCase (e.g., Document, Person, Organization)
  • Relationship Types: Use UPPER_SNAKE_CASE (e.g., RELATES_TO, REFERENCES, CITES)
  • Properties: Use snake_case (e.g., created_at, document_id, embedding_vector)
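
These conventions can be checked before data is written. The patterns below are one possible interpretation of the three naming styles and the function names are hypothetical; the module is not stated to enforce them.

```python
import re

# One or more capitalized words with no separators, e.g. Document.
PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]+)+$")
# Upper-case words joined by single underscores, e.g. RELATES_TO.
UPPER_SNAKE_CASE = re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")
# Lower-case words joined by single underscores, e.g. created_at.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


def is_valid_label(name: str) -> bool:
    return PASCAL_CASE.match(name) is not None


def is_valid_relationship_type(name: str) -> bool:
    return UPPER_SNAKE_CASE.match(name) is not None


def is_valid_property_key(name: str) -> bool:
    return SNAKE_CASE.match(name) is not None
```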

Connection Pooling

The module uses a connection pool with the following settings:

  • Max pool size: 50 connections (configurable)
  • Acquisition timeout: 60 seconds
  • Connection lifetime: 1 hour

The pool is initialized once and reused for all requests, so individual requests do not pay the cost of opening a new Neo4j connection.
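
With the official Neo4j Python driver, the settings listed above correspond to the max_connection_pool_size, connection_acquisition_timeout, and max_connection_lifetime driver options. The sketch below shows how they could be passed; the helper names are hypothetical and the module's actual wiring may differ.

```python
from typing import Any, Dict


def pool_settings(max_pool_size: int = 50) -> Dict[str, Any]:
    """Driver keyword arguments matching the pool settings listed above."""
    return {
        "max_connection_pool_size": max_pool_size,
        "connection_acquisition_timeout": 60,  # seconds
        "max_connection_lifetime": 3600,  # seconds (1 hour)
    }


def create_driver(uri: str, user: str, password: str, **overrides: Any):
    """Hypothetical factory: build one driver and share it across requests."""
    # Imported lazily so pool_settings() works without the driver installed.
    from neo4j import GraphDatabase

    options = {**pool_settings(), **overrides}
    return GraphDatabase.driver(uri, auth=(user, password), **options)
```

A driver created this way is meant to be long-lived: create it once at startup, open short sessions per request, and close it on shutdown.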

MinIO Storage Structure

GraphRAG uses MinIO for storing:

  • Community reports (Parquet format): community-reports/
  • Graph embeddings: embeddings/
  • Intermediate processing results: intermediate/

Bucket naming follows the pattern: graphrag-{namespace}.
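
The naming scheme above can be expressed as two small helpers. This is a sketch only: the function names and the example file name are hypothetical, and no validation of the namespace against MinIO bucket-name rules is attempted.

```python
# Object prefixes taken from the list above.
PREFIXES = {
    "community_reports": "community-reports/",
    "embeddings": "embeddings/",
    "intermediate": "intermediate/",
}


def bucket_name(namespace: str) -> str:
    """Bucket for a namespace, following the graphrag-{namespace} pattern."""
    return f"graphrag-{namespace}"


def object_key(kind: str, filename: str) -> str:
    """Object key under the prefix for the given kind of artifact."""
    return PREFIXES[kind] + filename.lstrip("/")


# Example with a hypothetical file name.
example_bucket = bucket_name("my-documents")
example_key = object_key("community_reports", "reports.parquet")
```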

Error Handling

The module handles the following errors:

  • 400 Bad Request: Validation failed, invalid data
  • 404 Not Found: Resource not found
  • 500 Internal Server Error: Database or internal errors
  • 503 Service Unavailable: Neo4j or MinIO unavailable
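
On the client side, these status codes suggest different reactions: fix the request on 400, treat 404 as a missing resource, and retry later on 503. The sketch below is one way a caller might encode that; the category names and the backoff schedule are assumptions for illustration, not behavior defined by the module.

```python
from typing import List


def classify_status(status: int) -> str:
    """Map an HTTP status code to a suggested client-side reaction."""
    if 200 <= status < 300:
        return "ok"
    if status == 400:
        return "invalid_request"  # fix the payload before retrying
    if status == 404:
        return "not_found"
    if status == 503:
        return "retry_later"  # Neo4j or MinIO is temporarily unavailable
    if status == 500:
        return "server_error"
    return "unexpected"


def backoff_delays(attempts: int, base: float = 0.5) -> List[float]:
    """Exponential delays in seconds between retries of a 503 response."""
    return [base * (2 ** attempt) for attempt in range(attempts)]
```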

Testing

Access interactive API documentation:

http://localhost:8000/docs

All endpoints are documented with interactive examples.

Best Practices for GraphRAG

1. Store Documents with Embeddings

{
  "label": "Document",
  "properties": {
    "content": "...",
    "embedding": [...],  # Vector embedding
    "metadata": {...}
  }
}

2. Create Semantic Relationships

{
  "source_id": "doc1",
  "target_id": "doc2",
  "type": "SIMILAR_TO",
  "properties": {
    "similarity_score": 0.85
  }
}

3. Model Knowledge Hierarchy

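# Document -> Sections -> Paragraphs
# create_node / create_relationship are illustrative helpers, e.g. thin wrappers
# around POST /api/kg/nodes and POST /api/kg/relationships.
doc = create_node(label="Document", properties={...})
section = create_node(label="Section", properties={...})
paragraph = create_node(label="Paragraph", properties={...})
create_relationship(doc.id, section.id, "CONTAINS")
create_relationship(section.id, paragraph.id, "CONTAINS")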

4. Community Detection

Use the clustering endpoints (/api/kg/louvein-cluster, /api/kg/leiden-cluster, /api/kg/hierarchical) to automatically detect and organize related content into communities.
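
Louvain and Leiden both search for a partition that maximizes modularity, a score that compares the number of edges inside communities with what a random graph of the same degrees would give. The sketch below computes that score for a given partition of an unweighted, undirected graph; it illustrates the objective only and is not the module's clustering code.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable]], community: Dict[Hashable, int]
) -> float:
    """Q = sum over communities c of L_c / m - (d_c / (2 * m)) ** 2.

    m is the number of edges, L_c the number of edges inside community c,
    and d_c the sum of the degrees of the nodes in c.
    """
    edge_list = list(edges)
    m = len(edge_list)
    if m == 0:
        return 0.0
    intra = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edge_list:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge, split into their natural groups.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
score = modularity(edges, two_groups)
```

Putting every node into a single community yields a score of zero, so a positive value indicates that the partition captures more internal structure than chance.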

Integration with Main Application

The Knowledge Graph module integrates seamlessly with Tiledesk LLM:

  • Authentication: Uses the same JWT token system
  • Vector Stores: Compatible with Pinecone and Qdrant
  • Configuration: Centralized via service_conf.yaml
  • Docker: Available via app-graph profile

Resources

Technical Documentation

For detailed technical documentation on how the Knowledge Graph module works, including:

  • Creation process (/api/kg/create)
  • Global Search (/api/kg/qa)
  • Integrated Hybrid Search (/api/kg/hybrid)
  • Role of LLMs, embeddings, reranking, and adaptive graph expansion

See the following reports:

Support

For issues and questions, refer to the main project repository: https://github.com/Tiledesk/tiledesk-llm


Module Status: Active
Last Updated: December 2025