Knowledge Graph Protection - Phase 1 Implementation Status

Date: January 7, 2026
Status: In Progress
Commit: c11cc2e


Phase 1 Objectives

Implement configuration-only enhancements for knowledge graph protection without requiring new features. Timeline: 2 weeks.

Tasks

1. Enhanced Audit Events ✅ COMPLETE

Status: ✅ Implemented and committed (c11cc2e)

Changes Made:

  • Added 7 new SecurityEventType entries in include/utils/audit_logger.h:

    GRAPH_TRAVERSAL,        // BFS/DFS traversal operations
    BULK_NODE_ACCESS,       // Large-scale node queries
    BULK_EDGE_ACCESS,       // Large-scale edge queries
    EMBEDDING_QUERY,        // Vector embedding queries
    EMBEDDING_EXPORT,       // Vector embedding downloads
    GRAPH_EXPORT,           // Full graph exports
    TEMPORAL_QUERY,         // Historical graph queries
  • Updated src/utils/audit_logger.cpp to handle new event types in securityEventTypeToString()
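The mapping for the new entries might look roughly like the sketch below. It is illustrative only: it assumes SecurityEventType is a scoped enum and that the function returns std::string, neither of which is confirmed here, and it lists only the seven new entries rather than the full enum in audit_logger.h.

```cpp
#include <string>

// Sketch only: the real SecurityEventType in include/utils/audit_logger.h
// contains other entries besides these seven, and may not be an enum class.
enum class SecurityEventType {
    GRAPH_TRAVERSAL,
    BULK_NODE_ACCESS,
    BULK_EDGE_ACCESS,
    EMBEDDING_QUERY,
    EMBEDDING_EXPORT,
    GRAPH_EXPORT,
    TEMPORAL_QUERY
};

// Maps each event type to a stable name for log output.
// The return type is an assumption; the project's function may differ.
std::string securityEventTypeToString(SecurityEventType type) {
    switch (type) {
        case SecurityEventType::GRAPH_TRAVERSAL:  return "GRAPH_TRAVERSAL";
        case SecurityEventType::BULK_NODE_ACCESS: return "BULK_NODE_ACCESS";
        case SecurityEventType::BULK_EDGE_ACCESS: return "BULK_EDGE_ACCESS";
        case SecurityEventType::EMBEDDING_QUERY:  return "EMBEDDING_QUERY";
        case SecurityEventType::EMBEDDING_EXPORT: return "EMBEDDING_EXPORT";
        case SecurityEventType::GRAPH_EXPORT:     return "GRAPH_EXPORT";
        case SecurityEventType::TEMPORAL_QUERY:   return "TEMPORAL_QUERY";
    }
    return "UNKNOWN";  // reached only for values outside the enum
}
```

Omitting a default branch in the switch lets most compilers warn when a later enum entry is added without a matching case.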

Testing:

  • Existing audit logger tests still pass
  • New events can be logged via logSecurityEvent() API

2. Prometheus Monitoring Alerts ✅ COMPLETE

Status: ✅ Implemented and committed (c11cc2e)

Files Created:

  • grafana/alerts/graph_security.yaml (11.3 KB)
    • 14 comprehensive alert rules
    • 4 severity levels (CRITICAL, HIGH, MEDIUM, INFO)
    • Coverage for all threat scenarios

Alert Categories:

  1. Graph Traversal Monitoring (2 alerts)

    • SuspiciousGraphTraversal
    • HighFrequencyGraphTraversal
  2. Bulk Data Access (3 alerts)

    • BulkGraphExport (CRITICAL)
    • BulkNodeAccess
    • BulkEdgeAccess
  3. Vector Embedding Monitoring (2 alerts)

    • EmbeddingTheft (HIGH)
    • HighVolumeEmbeddingExport (CRITICAL)
  4. Temporal Query Monitoring (1 alert)

    • SuspiciousTemporalQuery
  5. Anomaly Detection (2 alerts)

    • GraphAnomalyDetected (HIGH)
    • SystematicEnumeration (CRITICAL)
  6. Rate Limiting & Controls (2 alerts)

    • GraphRateLimitExceeded
    • ExportControlViolation (HIGH)
  7. Data Volume & Timing (2 alerts)

    • ExcessiveDataTransfer (CRITICAL)
    • OffHoursGraphAccess

Deployment:

  • Alert rules ready for Prometheus/Grafana integration
  • Requires Prometheus metrics to be exposed (see task 3)
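As a rough illustration of what exposing such metrics could involve, the sketch below keeps a per-event-type counter and renders it in the Prometheus text exposition format. The class EventCounters and the metric name themis_graph_security_events_total are hypothetical and are not taken from the codebase or from graph_security.yaml. The sketch is not thread-safe and assumes label values contain no quotes, backslashes, or newlines.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-process counter keyed by event type name.
// Not thread-safe; a real implementation would need synchronization.
class EventCounters {
public:
    void increment(const std::string& event_type) { ++counts_[event_type]; }

    // Renders all counters in the Prometheus text exposition format,
    // one sample line per event type, sorted by event type name.
    std::string renderPrometheus(const std::string& metric_name) const {
        std::ostringstream out;
        out << "# TYPE " << metric_name << " counter\n";
        for (const auto& entry : counts_) {
            out << metric_name << "{event_type=\"" << entry.first << "\"} "
                << entry.second << "\n";
        }
        return out.str();
    }

private:
    std::map<std::string, std::uint64_t> counts_;
};
```

An HTTP handler for the scrape endpoint would return the rendered string as its response body. The alert rules would then need to reference whatever metric names the server actually exports.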

3. Integrate Audit Logging into Operations 🚧 IN PROGRESS

Status: 🚧 In progress (next task to implement)

Target Files:

  • src/index/graph_index.cpp - Add logging to:

    • bfs(), dijkstra(), astar() methods (GRAPH_TRAVERSAL)
    • outNeighbors(), inNeighbors() bulk calls (BULK_NODE_ACCESS)
    • Export operations (GRAPH_EXPORT)
    • Temporal query methods (TEMPORAL_QUERY)
  • src/index/vector_index.cpp - Add logging to:

    • searchKnn() method (EMBEDDING_QUERY)
    • Bulk embedding retrieval (EMBEDDING_EXPORT)
    • rebuildFromStorage() (potential bulk access)

Implementation Approach:

  1. Add optional AuditLogger* parameter to GraphIndexManager/VectorIndexManager constructors
  2. Log events at key operation points with metadata:
    if (audit_logger_) {
        nlohmann::json details = {
            {"operation", "bfs"},
            {"start_node", start_pk},
            {"depth", max_depth},
            {"nodes_visited", visited.size()}
        };
        audit_logger_->logSecurityEvent(
            SecurityEventType::GRAPH_TRAVERSAL,
            user_id,
            resource_name,
            details
        );
    }
  3. Ensure minimal performance impact (async logging)
  4. Add user_id context tracking
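The approach above can be sketched end to end as follows. To stay self-contained, the example uses stand-ins rather than the project's types: AuditLogger here is a minimal in-memory class, details are a std::map instead of nlohmann::json, the event type is passed as a string, and GraphIndexSketch is a hypothetical substitute for GraphIndexManager. The real signatures may differ, and this logger is synchronous, so it does not show the async path from step 3.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Stand-in for the project's audit event record (hypothetical layout).
struct AuditEvent {
    std::string type;
    std::string user_id;
    std::string resource;
    std::map<std::string, std::string> details;
};

// Stand-in for the project's AuditLogger; it records events in memory.
class AuditLogger {
public:
    void logSecurityEvent(const std::string& type, const std::string& user_id,
                          const std::string& resource,
                          const std::map<std::string, std::string>& details) {
        events.push_back({type, user_id, resource, details});
    }
    std::vector<AuditEvent> events;
};

// Hypothetical substitute for GraphIndexManager with an optional logger.
class GraphIndexSketch {
public:
    explicit GraphIndexSketch(AuditLogger* audit_logger = nullptr)
        : audit_logger_(audit_logger) {}

    void addEdge(const std::string& from, const std::string& to) {
        adjacency_[from].push_back(to);
    }

    // Breadth-first traversal up to max_depth hops from start_pk.
    // Emits one audit event per call when a logger was supplied.
    std::vector<std::string> bfs(const std::string& start_pk,
                                 std::size_t max_depth,
                                 const std::string& user_id) {
        std::vector<std::string> order;
        std::unordered_set<std::string> visited{start_pk};
        std::queue<std::pair<std::string, std::size_t>> frontier;
        frontier.emplace(start_pk, std::size_t{0});
        while (!frontier.empty()) {
            std::pair<std::string, std::size_t> current = frontier.front();
            frontier.pop();
            order.push_back(current.first);
            if (current.second == max_depth) continue;  // depth limit reached
            auto it = adjacency_.find(current.first);
            if (it == adjacency_.end()) continue;       // no outgoing edges
            for (const std::string& next : it->second) {
                if (visited.insert(next).second) {
                    frontier.emplace(next, current.second + 1);
                }
            }
        }
        if (audit_logger_) {
            audit_logger_->logSecurityEvent(
                "GRAPH_TRAVERSAL", user_id, "graph",
                {{"operation", "bfs"},
                 {"start_node", start_pk},
                 {"depth", std::to_string(max_depth)},
                 {"nodes_visited", std::to_string(visited.size())}});
        }
        return order;
    }

private:
    AuditLogger* audit_logger_;  // may be null; logging is then skipped
    std::unordered_map<std::string, std::vector<std::string>> adjacency_;
};
```

Keeping the logger pointer optional lets existing call sites construct the index without audit logging, which is what makes step 1 a non-breaking change.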

Estimated Effort: 1-2 days

4. Graph-Specific Rate Limits 📋 PLANNED

Status: 📋 Not started

Configuration Integration:

  • Integrate config/graph_protection.yaml with existing rate limiter
  • Add graph-specific limits to RateLimiter or RateLimiterV2
  • Support per-user and per-operation limits

Target Configuration:

graph_protection:
  rate_limits:
    max_traversal_depth: 5
    max_nodes_per_query: 1000
    max_edges_per_query: 10000
    max_embeddings_per_query: 500
    queries_per_minute: 50
    graph_queries_per_minute: 30
    vector_queries_per_minute: 100
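
One way these limits could be enforced is sketched below as a per-user fixed-window counter combined with a depth check. It is not the project's RateLimiter or RateLimiterV2; GraphRateLimits and GraphRateLimiterSketch are hypothetical names, only two of the configured values are modelled, and the current time is passed in explicitly so the behaviour is easy to test.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical struct mirroring two values from graph_protection.rate_limits.
struct GraphRateLimits {
    int max_traversal_depth = 5;
    int graph_queries_per_minute = 30;
};

// Fixed-window limiter: each user gets graph_queries_per_minute traversals
// per one-minute window. Not thread-safe.
class GraphRateLimiterSketch {
public:
    using Clock = std::chrono::steady_clock;

    explicit GraphRateLimiterSketch(GraphRateLimits limits) : limits_(limits) {}

    // Returns true if the traversal may run and counts it against the
    // user's current window; returns false if a limit would be exceeded.
    bool allowTraversal(const std::string& user_id, int requested_depth,
                        Clock::time_point now) {
        if (requested_depth > limits_.max_traversal_depth) return false;
        Window& window = windows_[user_id];
        // Start a new window on first use or once a minute has elapsed.
        if (window.count == 0 || now - window.start >= std::chrono::minutes(1)) {
            window.start = now;
            window.count = 0;
        }
        if (window.count >= limits_.graph_queries_per_minute) return false;
        ++window.count;
        return true;
    }

private:
    struct Window {
        Clock::time_point start{};
        int count = 0;
    };
    GraphRateLimits limits_;
    std::map<std::string, Window> windows_;
};
```

A fixed window admits short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state per user.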

Implementation Files:

  • include/server/rate_limiter.h or rate_limiter_v2.h
  • src/server/rate_limiter.cpp
  • Configuration loader integration

Estimated Effort: 1-2 days

5. Export Controls 📋 PLANNED

Status: 📋 Not started

Features:

  • Approval workflow for large exports
  • Maximum export size limits
  • Watermark embedding for exports (optional)

Configuration:

graph_protection:
  export_controls:
    bulk_export_enabled: false
    require_approval: true
    max_export_size_mb: 500
    max_export_nodes: 100000
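
A check against this configuration might be structured as in the sketch below. The semantics are assumptions, since the feature is not yet implemented: bulk_export_enabled set to false is taken to block every export, size limits are checked before approval, and all type and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical struct mirroring graph_protection.export_controls.
struct ExportControls {
    bool bulk_export_enabled = false;
    bool require_approval = true;
    std::uint64_t max_export_size_mb = 500;
    std::uint64_t max_export_nodes = 100000;
};

// Hypothetical description of a single export request.
struct ExportRequest {
    std::uint64_t size_mb;
    std::uint64_t node_count;
    bool approved;
};

enum class ExportDecision { Allowed, Disabled, TooLarge, NeedsApproval };

// Evaluates a request against the configured controls.
// Order of checks (an assumption): feature switch, hard limits, approval.
ExportDecision checkExport(const ExportControls& cfg, const ExportRequest& req) {
    if (!cfg.bulk_export_enabled) return ExportDecision::Disabled;
    if (req.size_mb > cfg.max_export_size_mb ||
        req.node_count > cfg.max_export_nodes) {
        return ExportDecision::TooLarge;
    }
    if (cfg.require_approval && !req.approved) {
        return ExportDecision::NeedsApproval;
    }
    return ExportDecision::Allowed;
}
```

Returning a distinct decision per failure reason would let the caller attach the reason to the audit record for a rejected export.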

Estimated Effort: 1 day

6. Testing & Validation 📋 PLANNED

Status: 📋 Not started

Test Cases:

  • Unit tests for new audit events
  • Integration tests for rate limiting
  • Alert rule validation with sample data
  • Performance benchmarks (audit logging overhead)

Estimated Effort: 2 days


Timeline

| Task | Status | Duration | Start Date | Completion Date |
|------|--------|----------|------------|-----------------|
| 1. Audit Events | ✅ Complete | 0.5 days | Jan 7 | Jan 7 |
| 2. Prometheus Alerts | ✅ Complete | 0.5 days | Jan 7 | Jan 7 |
| 3. Integrate Logging | 🚧 In Progress | 1-2 days | Jan 7 | Jan 8-9 |
| 4. Rate Limits | 📋 Planned | 1-2 days | Jan 9 | Jan 10-11 |
| 5. Export Controls | 📋 Planned | 1 day | Jan 10 | Jan 11 |
| 6. Testing | 📋 Planned | 2 days | Jan 11 | Jan 13 |
| Total | | 6-8 days | Jan 7 | Jan 13-15 |

Completion Estimate: January 13-15, 2026 (within 2-week target)


Deployment Checklist

Prerequisites

  • Documentation created
  • Configuration example available
  • Audit event types defined
  • Alert rules created

Phase 1 Deployment

  • Audit events committed to codebase
  • Alert rules available in repository
  • Audit logging integrated into operations
  • Rate limits configured and tested
  • Export controls implemented
  • Performance validated (<5% overhead target)
  • Documentation updated with deployment instructions

Production Rollout

  • Enable graph_protection in config
  • Deploy Prometheus alert rules
  • Configure Grafana dashboards
  • Set up alerting channels (email, Slack, PagerDuty)
  • Train operators on alert response procedures
  • Monitor for false positives and tune thresholds

Metrics to Track

Success Criteria

  1. Audit Coverage: 100% of graph/vector operations logged
  2. Alert Response Time: < 1 minute for CRITICAL alerts
  3. False Positive Rate: < 5% for HIGH/CRITICAL alerts
  4. Performance Overhead: < 5% latency impact from audit logging
  5. Detection Rate: > 90% for known attack patterns in testing
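
For the performance criterion, overhead can be read as the relative increase in latency when audit logging is enabled. The helper below is a minimal sketch of that calculation. The function names are hypothetical, and how the two latencies are measured (mean, median, or a percentile) is left open because the document does not specify it.

```cpp
#include <stdexcept>

// Relative latency overhead in percent:
// (with_audit - baseline) / baseline * 100.
double overheadPercent(double baseline_ms, double with_audit_ms) {
    if (baseline_ms <= 0.0) {
        throw std::invalid_argument("baseline latency must be positive");
    }
    return (with_audit_ms - baseline_ms) / baseline_ms * 100.0;
}

// True if the measured overhead is strictly below the target (default 5%).
bool meetsOverheadTarget(double baseline_ms, double with_audit_ms,
                         double target_percent = 5.0) {
    return overheadPercent(baseline_ms, with_audit_ms) < target_percent;
}
```

For example, a baseline of 8.0 ms and 8.25 ms with logging enabled is an overhead of 3.125%, which is within the target.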

Key Performance Indicators (KPIs)

  • Audit events logged per second
  • Alert trigger frequency by severity
  • Average time to detect suspicious activity
  • Average time to respond to alerts
  • Number of false positives per week

Known Issues & Limitations

Current Limitations

  1. No ML-based anomaly detection - Detection relies on rule-based thresholds (addressed in Phase 3)
  2. No watermarking/fingerprinting - Data theft cannot be proven after the fact (addressed in Phase 2)
  3. User context tracking - Requires integration with the authentication system
  4. Async logging - Events may be lost if the system crashes before a flush (accepted tradeoff)

Future Enhancements (Phase 2/3)

  • Graph watermarking for theft detection
  • Embedding fingerprinting
  • ML-based behavioral analysis
  • Differential privacy for aggregations
  • Advanced threat intelligence integration


Last Updated: April 2026
Author: ThemisDB Security Team
Next Review: January 13, 2026