Deep Dive: The Graph Core Service

Role: The "Brain" - Storage Engine, Transaction Manager, and Source of Truth.

1. Executive Summary: The "Managed Graph"

The Graph Core is the central nervous system of ModelKG. It is the only service permitted to write directly to the Neo4j database.

In many graph projects, developers connect their various applications directly to the Graph DB using a generic driver. While fast initially, this leads to chaos at scale:

  1. Inconsistent Logic: One app deletes a node but forgets to delete the connecting edges, leaving "Dangling Pointers."
  2. No Audit Trail: Changes happen without consistent attribution or history.
  3. Fragility: Complicated Cypher queries are scattered across 10 microservices, making schema refactoring a nightmare.
  4. Security Gaps: It is hard to enforce row-level (or node-level) security when every app has root DB access.

ModelKG employs the Managed Graph Pattern. The Graph Core acts as a strictly controlled API facade over the database, ensuring ACID compliance, History Tracking, and Event Emission for every single operation.

        CHAOTIC ACCESS (Bad)                MANAGED ACCESS (ModelKG)
                                        
   [App A] --(Cypher)--> [Neo4j]          [App A] \
                                                   +--> [ GRAPH CORE ]
   [App B] --(Cypher)--> [Neo4j]          [App B] /     Logic & Safety
                                                              |
   [Script]--(Cypher)--> [Neo4j]                          v
                                                       [ NEO4J ]

2. Architecture: The Cypher Abstraction Layer (CAL)

Graph Core does not just pass queries through. It implements an internal Cypher Abstraction Layer (CAL). This Python-based query builder ensures that all generated queries follow best practices.

2.1. Parameterized Queries Only

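CAL forbids string concatenation for query building. All values are passed as parameters ($param).

  • Security: Values never become part of the query text, so Cypher injection (the graph equivalent of SQL injection) through user-supplied values is closed off.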
  • Performance: Neo4j can cache the query plan because the query structure remains static even if values change.
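
A minimal sketch of what a CAL-style builder might look like. The helper name `match_by_property` is hypothetical, since CAL's real interface is not shown in this document. Values travel in a separate parameter map; the label and property key are placed in the query text here, so the sketch checks them against an identifier pattern first.

```python
import re

# Hypothetical helper for illustration; CAL's actual API is not documented here.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def match_by_property(label: str, key: str, value: object) -> tuple:
    """Return (query_text, parameters) for a parameterized MATCH."""
    # The label and key end up in the query text, so reject anything
    # that is not a plain identifier.
    for name in (label, key):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    query = f"MATCH (n:{label}) WHERE n.{key} = $value RETURN n"
    # The value is never interpolated; it is sent alongside the query.
    return query, {"value": value}

# A hostile-looking value stays inert because it is only ever a parameter.
query, params = match_by_property("Asset", "serial", "X7129' OR 1=1 //")
```

Because the query text is identical for every serial number, it is also the kind of statement whose plan can be cached.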

2.2. The Transaction Boundary

Every API call is wrapped in a discrete transaction.

  • Atomic Updates: If you send a batch request to Create 50 Nodes, and the 50th one fails validation, all 50 are rolled back.
  • Isolating Noise: Neo4j's default read-committed isolation prevents dirty reads, so no other request sees a transaction's changes until it commits.
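
The all-or-nothing behavior can be illustrated with a small in-memory stand-in. This is a sketch only: in the real service the rollback is provided by the Neo4j transaction, and `create_batch` and its validation rule are assumptions made for the example.

```python
def create_batch(store: dict, nodes: list) -> None:
    """Apply every node creation in the batch, or none of them."""
    staged = dict(store)  # work on a copy; the real store is untouched until "commit"
    for node in nodes:
        # Illustrative validation rule: every node needs a unique id.
        if "id" not in node or node["id"] in staged:
            raise ValueError(f"invalid node: {node!r}")  # abort, nothing committed
        staged[node["id"]] = node
    store.update(staged)  # "commit": publish all staged changes together

store = {}
batch = [{"id": f"n{i}"} for i in range(49)] + [{"name": "missing id"}]
try:
    create_batch(store, batch)
except ValueError:
    pass  # the 50th node failed validation, so the first 49 were discarded too
```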

3. Data Organization Strategies

ModelKG structures the graph using specific patterns to allow for scale, reusability, and multi-tenancy.

3.1. The Master / Instance Pattern

A fundamental challenge in graph modeling is Data Redundancy. If you manage 50,000 laptops in an enterprise, 95% of their data is identical (Manufacturer: Apple, Model: MacBook Pro M3, Ports: 3x Thunderbolt). Storing these strings 50,000 times is wasteful, and updating a specification means touching every copy.

The Solution: Separation of Definition (Master) and Implementation (Instance).

          [ Master Node: "MacBook Pro M3" ]
           | label: ProductModel
           | cpu: "M3 Max"
           | ports: ["USBC", "HDMI"]
           | maintenance_guide: "url..."
           |
           ^ (Relationship: INSTANCE_OF)
           |
   +-------+--------------------+---------------------+
   |                            |                     |
[ Instance: UserA_Laptop ]   [ Instance: UserB_Laptop ]
| label: Asset               | label: Asset
| serial: X7129              | serial: Y9912
| location: NY               | location: LDN
| owner: "Alice"             | owner: "Bob"

  • Read-Through Logic (The "Virtual Property"): When you query UserA_Laptop, the Graph Core performs a "Resolve Master" operation.
    • GET /nodes/UserA?resolve_master=true
    • The engine fetches the Instance properties.
    • It checks for an INSTANCE_OF relationship.
    • It merges the Master properties under the Instance properties, so an Instance value wins if both define the same key.
    • Result: {serial: X7129, cpu: "M3 Max", maintenance_guide: "url..."}.
    • Benefit: This is "prototypal inheritance" at the database level.
  • Mass Updates: If Apple recalls the battery, you update the Master Node with recall_active: true. Instantly, all 50,000 instances reflect this status in future API calls and analytics reports.
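
The "Resolve Master" merge can be sketched with plain dictionaries. The function name `resolve_master` mirrors the query parameter above but is otherwise an assumption, as is the rule that the Instance wins on conflicting keys.

```python
from typing import Optional

def resolve_master(instance: dict, master: Optional[dict]) -> dict:
    """Merge Master properties under Instance properties."""
    if master is None:  # no INSTANCE_OF relationship found
        return dict(instance)
    # Later keys win in a dict merge, so Instance values override Master values.
    return {**master, **instance}

master = {"cpu": "M3 Max", "ports": ["USBC", "HDMI"], "maintenance_guide": "url..."}
instance = {"serial": "X7129", "location": "NY", "owner": "Alice"}

resolved = resolve_master(instance, master)

# Mass update: one change on the Master shows up on the next read of any Instance.
master["recall_active"] = True
after_recall = resolve_master(instance, master)
```

Nothing is copied onto the Instance at write time, which is why a single Master update is enough.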

3.2. The Meta-Graph Overlay

The database contains "System Nodes" that describe the graph itself. These start with an underscore _ to distinguish them from user data.

  • _Concept nodes: A graph representation of the Ontology.
    • Often, the Ontology lives in Postgres, but we mirror it into Neo4j for fast queries.
    • (Node:Server)-[:IS_A]->(_Concept:Server)
    • Allows for super-fast type queries ("Find all Concepts that inherit from Asset").
  • _Tenant nodes: Used for multi-tenancy.
    • Strict Rule: All data nodes must have a BELONGS_TO path to a _Tenant node.
    • This allows logical separation of each tenant's data on a shared graph.
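
A rough sketch of how the BELONGS_TO rule could be checked. It assumes, for simplicity, that each node has at most one outgoing BELONGS_TO edge; the node names and the `tenant_of` helper are illustrative, not part of the service.

```python
def tenant_of(node_id, belongs_to, tenants):
    """Follow BELONGS_TO edges upward; return the tenant reached, or None."""
    seen = set()
    current = node_id
    while current is not None and current not in seen:
        if current in tenants:
            return current
        seen.add(current)  # guards against cycles in bad data
        current = belongs_to.get(current)
    return None

tenants = {"_Tenant:acme"}
# child -> parent along BELONGS_TO (illustrative data)
belongs_to = {"rack-1": "_Tenant:acme", "server-1": "rack-1", "loop-a": "loop-b", "loop-b": "loop-a"}

owner = tenant_of("server-1", belongs_to, tenants)
orphan = tenant_of("stray-node", belongs_to, tenants)
```

A node for which this returns `None` violates the Strict Rule and should be rejected or flagged.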

4. The "Time Machine" (Versioning & History)

Data is rarely static. In a Knowledge Graph, when something was true is often as important as what is true. ModelKG implements a Head/History versioning pattern (Slowly Changing Dimensions Type 2).

We do not overwrite critical data; we append it.

TIMELINE: T1 (Creation)          TIMELINE: T2 (Update Status)

(Head Pointer)                   (Head Pointer MOVES)
      |                                   |
      v                                   v
[ Node: Task-A ]                 [ Node: Task-A_v2 ] ----[:PREVIOUS_VERSION]---> [ Node: Task-A_v1 ]
| status: TODO |                 | status: DONE    |                             | status: TODO    |
| active: true |                 | active: true    |                             | active: false   |
                                 | modified: T2    |                             | valid_until: T2 |

4.1. Write Logic

When updating a node flagged with the Versioned trait:

  1. Lock: Acquire a write lock on the node.
  2. Clone: Copy the existing node entirely to a new node, suffixing the ID or using a dedicated history UUID.
  3. Update: Update the original (Head) node with the new properties.
  4. Link: Create a PREVIOUS_VERSION edge from Head -> Clone.
  5. Timestamp: Set valid_until = NOW() on the clone.
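
The five steps can be sketched against an in-memory node map. This is an illustration under assumptions: the lock is a plain `threading.Lock` standing in for a database write lock, the `previous` dictionary stands in for PREVIOUS_VERSION edges, and handing the Head's older chain to the new clone is a guess at how repeated updates stay linked.

```python
import copy
import threading
import uuid

_write_lock = threading.Lock()  # stand-in for a database write lock on the node

def versioned_update(nodes, previous, node_id, changes, now):
    """Archive the current Head as a history clone, then update the Head in place."""
    with _write_lock:                                 # 1. Lock
        head = nodes[node_id]
        history_id = f"{node_id}:{uuid.uuid4()}"      # 2. Clone under a history UUID
        clone = copy.deepcopy(head)
        clone["active"] = False
        clone["valid_until"] = now                    # 5. Timestamp
        nodes[history_id] = clone
        if node_id in previous:
            # Assumption: the clone takes over the Head's existing chain.
            previous[history_id] = previous[node_id]
        head.update(changes)                          # 3. Update the Head
        head["modified"] = now
        previous[node_id] = history_id                # 4. Link Head -> Clone
        return history_id

nodes = {"Task-A": {"status": "TODO", "active": True}}
previous = {}
first = versioned_update(nodes, previous, "Task-A", {"status": "DONE"}, now=2)
```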

4.2. Read Logic

  • Standard Read: By default, queries issued through Graph Core match only the Head node. History nodes are filtered out by label (for example, :History).
  • Time Travel Query: "What was the status of Project X last Tuesday?"
    • Graph Core traverses the PREVIOUS_VERSION chain.
    • It looks for the node where created_at <= Tuesday AND valid_until > Tuesday.
    • It effectively reconstructs the state of the world at that moment.
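
A small sketch of the time-travel lookup. It assumes each version stores the time it became current in `created_at`, that the Head has no `valid_until`, and that `previous` maps a node to the target of its PREVIOUS_VERSION edge; the helper name `state_as_of` is hypothetical.

```python
def state_as_of(nodes, previous, head_id, when):
    """Walk the PREVIOUS_VERSION chain and return the version valid at `when`."""
    current = head_id
    while current is not None:
        node = nodes[current]
        # The Head is still current, so treat its validity as open-ended.
        valid_until = node.get("valid_until", float("inf"))
        if node["created_at"] <= when < valid_until:
            return node
        current = previous.get(current)
    return None  # the node did not exist yet at that time

nodes = {
    "Task-A": {"status": "DONE", "created_at": 20},
    "Task-A:v1": {"status": "TODO", "created_at": 10, "valid_until": 20},
}
previous = {"Task-A": "Task-A:v1"}

earlier = state_as_of(nodes, previous, "Task-A", when=15)
latest = state_as_of(nodes, previous, "Task-A", when=25)
```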

5. Connectivity & Traversal Mechanics

The true power of a graph is "the space between the nodes." Graph Core exposes powerful "Graph Algorithms as a Service."

5.1. The Expansion API

A common frontend problem: "I have a User ID. I need their Department, their Projects, and their Manager." In REST, this is 3 calls. In GraphQL, it's one. ModelKG offers a tunable Expansion Endpoint.

Endpoint: GET /api/expand/{node_id}

Parameters:

  • depth (int): How many hops to go? (e.g., depth=2)
  • types (list): Filter on edge types (e.g., types=MEMBER_OF|MANAGES)
  • direction (enum): INCOMING, OUTGOING, or BOTH.

Visual Example:

[ User: Alice ]
      |
      +---(MEMBER_OF)--> [ Team: DevOps ] --(RESPONSIBLE_FOR)--> [ Svc: Payment API ]
      |
      +---(OWNS)--> [ Device: Laptop ] (Ignored if filter excludes OWNS)

Response Format: The API returns a JSON representation of the Subgraph.

{
  "center": "User:Alice",
  "nodes": [ ... list of all found nodes ... ],
  "edges": [ ... list of valid edges ... ]
}

A single request replaces the three separate calls from the REST example, which cuts network round trips.
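
To make the three parameters concrete, here is a sketch of the expansion as a bounded breadth-first walk over an in-memory edge list. The `expand` function and the tuple-based edge representation are assumptions for illustration; the real endpoint runs against Neo4j.

```python
from collections import deque

def expand(edges, center, depth=1, types=None, direction="BOTH"):
    """Collect the subgraph within `depth` hops of `center`.

    edges: list of (source, edge_type, target) tuples.
    """
    nodes = {center}
    found = []
    frontier = deque([(center, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue  # hop budget used up on this branch
        for edge in edges:
            src, etype, dst = edge
            if types is not None and etype not in types:
                continue
            if direction in ("OUTGOING", "BOTH") and src == node:
                neighbor = dst
            elif direction in ("INCOMING", "BOTH") and dst == node:
                neighbor = src
            else:
                continue
            if edge not in found:
                found.append(edge)
            if neighbor not in nodes:
                nodes.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return {"center": center, "nodes": sorted(nodes), "edges": found}

edges = [
    ("User:Alice", "MEMBER_OF", "Team:DevOps"),
    ("Team:DevOps", "RESPONSIBLE_FOR", "Svc:Payment API"),
    ("User:Alice", "OWNS", "Device:Laptop"),
]
subgraph = expand(edges, "User:Alice", depth=2,
                  types={"MEMBER_OF", "RESPONSIBLE_FOR"}, direction="OUTGOING")
```

With the OWNS type excluded, the laptop never enters the result, matching the visual example.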

5.2. Shortest Path & Dependency Tracing

Use Case: A physical Router goes down. What abstract Business Capabilities are affected?

The Graph Core runs a Recursive Dependency Trace:

  1. Start Node: [Asset: Router-X].
  2. Breadth-First Search (BFS): Traverse incoming relationships.
  3. Edge Whitelist: Only follow DEPENDS_ON, HOSTED_ON, POWERED_BY.
  4. Stop Condition: Stop when we hit nodes labeled BusinessCapability.

Result: Router-X <- Server-01 <- Database-Main <- App-Billing <- Capability:Revenue Collection.

  • Conclusion: "Revenue Collection is at Risk."
  • Action: Graph Core returns this chain to the Action Executor to generate an alert.
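
The trace can be sketched as a breadth-first search over incoming edges. The edge types assigned to each hop below are illustrative (the result chain above does not say which type connects which pair), and `trace_impact` is a hypothetical name.

```python
from collections import deque

ALLOWED = {"DEPENDS_ON", "HOSTED_ON", "POWERED_BY"}  # edge whitelist

def trace_impact(edges, labels, start):
    """Return every path from `start` to an affected BusinessCapability."""
    paths = []
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if "BusinessCapability" in labels.get(node, ()):
            paths.append(path)
            continue  # stop condition: do not search past a capability
        for src, etype, dst in edges:
            # Follow incoming edges only: whoever depends on `node` is affected.
            if dst == node and etype in ALLOWED and src not in visited:
                visited.add(src)
                queue.append(path + [src])
    return paths

edges = [
    ("Server-01", "DEPENDS_ON", "Router-X"),
    ("Database-Main", "HOSTED_ON", "Server-01"),
    ("App-Billing", "DEPENDS_ON", "Database-Main"),
    ("Capability:Revenue Collection", "DEPENDS_ON", "App-Billing"),
    ("Laptop-7", "CONNECTED_TO", "Router-X"),  # not whitelisted, so ignored
]
labels = {"Capability:Revenue Collection": {"BusinessCapability"}}

impact = trace_impact(edges, labels, "Router-X")
```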

6. The Event Emission System (The Heartbeat)

Graph Core is not just a database wrapper; it is a Broadcaster. Systems like the Analytics Engine and Action Executor act on changes. They need guaranteed delivery of change events.

6.1. The Transactional Outbox Pattern

If we just "Write to DB" -> "Post to the message broker", we risk dual-write issues. If the broker is down, the DB write succeeds but the event is lost, and downstream systems become desynchronized.

Solution: The Outbox

  1. Transaction Start.
  2. Write Data: Node changes applied to Neo4j.
  3. Write Event: Event JSON payload is written to a special _Outbox node inside Neo4j within the same transaction.
  4. Commit Transaction (Atomic).
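  5. Poll: After the commit, a dedicated background thread (the Event Publisher) polls Neo4j for _Outbox nodes.
  6. Publish: It pushes the payload to Redpanda.
  7. Cleanup: Upon receiving an ACK from Redpanda, it deletes the _Outbox node. If the publisher crashes between publish and cleanup, the event is sent again on the next poll, so delivery is at-least-once and consumers should deduplicate on event_id.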
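
The flow can be sketched with an in-memory stand-in for the database. Everything here is illustrative: `write_with_event` models steps 1 to 4 as one atomic swap, and `send` is a hypothetical callable that returns True when the broker acknowledges the message.

```python
def write_with_event(db, node_id, props, event):
    """Stage the data change and its event, then commit both together."""
    staged_nodes = {**db["nodes"], node_id: props}
    staged_outbox = db["outbox"] + [event]
    # Single "commit": either both changes become visible or neither does.
    db["nodes"], db["outbox"] = staged_nodes, staged_outbox

def publish_pending(db, send):
    """Poll the outbox; delete an entry only after the broker acknowledges it."""
    published = 0
    for event in list(db["outbox"]):
        if not send(event):
            break  # broker unavailable: keep the event (and ordering) for the next poll
        db["outbox"].remove(event)
        published += 1
    return published

db = {"nodes": {}, "outbox": []}
write_with_event(db, "node-555", {"status": "OFFLINE"}, {"event_id": "evt-1"})

# First poll: the broker is down, so the event stays in the outbox.
sent_while_down = publish_pending(db, lambda event: False)

# Second poll: the broker is back and acknowledges the message.
delivered = []
sent_after_recovery = publish_pending(db, lambda event: delivered.append(event) or True)
```

The event survives the outage because it lives in the same store as the data it describes.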

6.2. Event Payload Schema

We adhere to a strict Event Schema to ensure consumers don't break.

Topic: modelkg.graph.changed

{
  "event_id": "evt-88219-abs-221",
  "trigger": "UserUpdateAPI",
  "timestamp": 1704200000,
  "metadata": {
    "user_id": "admin-alice",
    "correlation_id": "req-123"
  },
  "payload": {
    "operation": "UPDATE_PROPERTY",
    "node_id": "node-555",
    "concept": "Server",
    "labels": ["Server", "Asset"],
    "changes": {
      "status": {
        "old": "ONLINE",
        "new": "OFFLINE"
      }
    }
  }
}

This standardized payload allows downstream systems to filter events efficiently (e.g., "Only wake me up for Server changes").
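
As a sketch of consumer-side filtering, the snippet below parses an abbreviated copy of the payload above and decides whether to act on it. The helper names `should_handle` and `summarize` are hypothetical, and the broker subscription itself is left out.

```python
import json

# Abbreviated copy of the example event; field names follow the schema above.
RAW_EVENT = """
{
  "event_id": "evt-88219-abs-221",
  "payload": {
    "operation": "UPDATE_PROPERTY",
    "node_id": "node-555",
    "concept": "Server",
    "changes": {"status": {"old": "ONLINE", "new": "OFFLINE"}}
  }
}
"""

def should_handle(event: dict, concept: str) -> bool:
    """Only wake the consumer for changes to the given concept."""
    return event["payload"]["concept"] == concept

def summarize(event: dict) -> list:
    """Render each changed property as 'name: old -> new'."""
    changes = event["payload"]["changes"]
    return [f"{name}: {delta['old']} -> {delta['new']}" for name, delta in changes.items()]

event = json.loads(RAW_EVENT)
summary = summarize(event) if should_handle(event, "Server") else []
```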


7. Real-World Use Cases

7.1. Rapid Promotion (Draft -> Master)

Scenario: An engineer models a "Perfect Server Setup" manually as a regular node instance (Server-Temp) and wants to make it available as a standard template.

Feature: The Graph Core "Promote" endpoint.

  1. User clicks "Promote to Master".
  2. Graph Core validates the node (strips instance-specific data like Serial Numbers).
  3. Graph Core flips labels from Asset to AssetModel (Master).
  4. The node is now available in the catalog for instantiation.
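
A minimal sketch of steps 2 and 3, assuming a fixed set of instance-only property names. The set `INSTANCE_ONLY`, the function name, and the sample properties are illustrative; the real endpoint's validation rules are not described here.

```python
# Assumed list of properties that only make sense on a concrete instance.
INSTANCE_ONLY = {"serial", "location", "owner"}

def promote_to_master(node: dict) -> dict:
    """Strip instance-specific data and flip the Asset label to AssetModel."""
    properties = {k: v for k, v in node["properties"].items() if k not in INSTANCE_ONLY}
    labels = ["AssetModel" if label == "Asset" else label for label in node["labels"]]
    return {"labels": labels, "properties": properties}

server_temp = {
    "labels": ["Server", "Asset"],
    "properties": {"serial": "SRV-0001", "cpu_cores": 32, "ram_gb": 256},
}
template = promote_to_master(server_temp)
```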

7.2. The "De-Duplication" Merge

Scenario: Ingestion brings in "Oracle DB" and "Oracle Database" as two separate nodes, splitting the graph.

Feature: Graph Core "Merge" endpoint.

  1. User selects Winner (Oracle DB) and Loser (Oracle Database).
  2. Graph Core Rewires all edges: Anything pointing to Loser is moved to point to Winner.
  3. Graph Core copies unique properties from Loser to Winner (merging history).
  4. Graph Core tags Loser as _Tombstone (soft delete) or deletes it.
  5. Result: The topology is healed without manual edge-by-edge editing.
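
The rewiring can be sketched over an in-memory edge list. Several details are assumptions made for the example: edges that would become self-loops or duplicates after the merge are dropped, the Winner keeps its own value when both nodes define a property, and the surrounding node names and properties are invented sample data.

```python
def merge_nodes(nodes, edges, winner, loser):
    """Point Loser's edges at Winner, copy missing properties, tombstone Loser."""
    rewired = []
    for src, etype, dst in edges:
        src = winner if src == loser else src
        dst = winner if dst == loser else dst
        edge = (src, etype, dst)
        # Assumption: drop self-loops and duplicates created by the merge.
        if src != dst and edge not in rewired:
            rewired.append(edge)
    for key, value in nodes[loser]["properties"].items():
        nodes[winner]["properties"].setdefault(key, value)  # Winner's values win
    nodes[loser]["labels"].append("_Tombstone")  # soft delete
    return rewired

nodes = {
    "Oracle DB": {"labels": ["Software"], "properties": {"vendor": "Oracle"}},
    "Oracle Database": {"labels": ["Software"],
                        "properties": {"vendor": "Oracle Corp", "edition": "Enterprise"}},
}
edges = [
    ("App-Billing", "DEPENDS_ON", "Oracle Database"),
    ("App-CRM", "DEPENDS_ON", "Oracle DB"),
    ("Oracle Database", "HOSTED_ON", "Server-01"),
]
healed = merge_nodes(nodes, edges, winner="Oracle DB", loser="Oracle Database")
```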

8. Conclusion

The Graph Core is the heavy lifter of the platform. By abstracting the database, it allows us to implement sophisticated features like Time Travel, Master Templates, and Transactional Eventing that would be impossible to maintain if every microservice wrote raw Cypher queries. It turns Neo4j from a storage engine into a Dynamic, Versioned, and Event-Driven Object Modeling Platform.