Role: The Schema Authority, Validation Layer, and Metamodel Governor.
The Ontology Engine is the "Constitutional Court" of ModelKG. It defines the unbreachable laws of physics for the data universe. It exists to solve the fundamental paradox of Graph Databases: Flexibility vs. Integrity.
In a raw Graph Database (like Neo4j), the default model is effectively "schema-on-read": unless you declare constraints yourself, the database will happily accept any garbage data you throw at it. If you write a node with age: "thirty" (string) today and age: 30 (int) tomorrow, the database doesn't care. But your application will crash when it tries to calculate the average age.
ModelKG flips this model to Schema-on-Write. We assert that for a Knowledge Graph to be a "System of Record" rather than a "Dumpster", we must continuously enforce semantic consistency before persistence.
TRADITIONAL (Schema-on-Read)        MODELKG (Schema-on-Write)
+------------------------+          +------------------------+
| Any Data (Garbage In)  |          |        Any Data        |
+-----------+------------+          +-----------+------------+
            |                                   |
            v                                   v
+-----------+------------+          +-----------+------------+
|        DATABASE        |          |    ONTOLOGY ENGINE     |
|    {age: "thirty"}     |          |  Checks: isInt(age)?   |
|    {age: 30}           |          +-----------+------------+
+-----------+------------+                      |
            |                        (Only Valid Data Passes)
            v                                   |
+-----------+------------+                      v
|   APPLICATION CRASH    |          +-----------+------------+
|   avg(age) = Error     |          |     CLEAN GRAPH DB     |
+------------------------+          +------------------------+
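The gate in the right-hand column can be sketched in a few lines of Python. This is only an illustration of the schema-on-write idea, not ModelKG's actual code: the PERSON_SCHEMA mapping and the validate_before_write function are hypothetical names invented for this example.

```python
from typing import Any

# Hypothetical schema for illustration: property name -> required Python type.
PERSON_SCHEMA: dict[str, type] = {"name": str, "age": int}


def validate_before_write(payload: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of violations; an empty list means the write may proceed."""
    errors = []
    for field_name, expected in schema.items():
        if field_name not in payload:
            errors.append(f"missing field: {field_name}")
            continue
        value = payload[field_name]
        # bool is a subclass of int in Python, so reject it explicitly for integer fields.
        wrong_type = not isinstance(value, expected)
        bool_as_int = expected is int and isinstance(value, bool)
        if wrong_type or bool_as_int:
            errors.append(
                f"{field_name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```

With this gate in front of the database, {age: "thirty"} is rejected at write time instead of surfacing later as a failed avg(age).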
- Cognitive Offloading: No developer can hold the entire entity relationship model of a large enterprise in their head. The Ontology is living documentation. You don't ask a senior engineer "Can a Server connect to a Business Process?"; you ask the Ontology API.
- Generic Tooling: Because the schema is introspectable, we can build generic UI components (Generic Table, Generic Form, Generic Explorer) that adapt automatically. A frontend form component can query the definition of Incident and automatically render a date-picker for the occurred_at field without custom code.
- Governance & Compliance: By defining "Data Classification" at the schema level (e.g., via a Confidential trait), we ensure that PII rules are applied automatically to every new instance of that type.
The Ontology is composed of three core primitives: Concepts, Relationships, and Traits.
A Concept represents a class of things in the universe. It is analogous to a Class in OOP or a Table in SQL, but more flexible.
ModelKG supports multiple inheritance (Mixins) and hierarchical inheritance.
                 [ Concept: Asset ]
                  (Abstract: True)
               (Props: asset_id, cost)
                    ^          ^
                    |          | (Inherits)
                    |          |
           +--------+          +---------+
           |                             |
  [ Concept: Server ]           [ Concept: Laptop ]
   (Props: cpu, ram)           (Props: battery_level)
           ^
           |
 [ Concept: DB_Server ]
  (Props: storage_type)
- Hierarchical: DB_Server points to Server. Server points to Asset.
  - Reasoning: This allows for Liskov Substitution. If a query looks for MATCH (n:Asset), it successfully returns DB_Server nodes.
- Abstract Concepts: Concepts can be marked abstract: true. You cannot create a node that is just an Asset. It must be a concrete implementation like Laptop.
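One common way to make MATCH (n:Asset) return DB_Server nodes is to stamp every ancestor label onto the node when it is created. The source does not say how ModelKG implements this, so the sketch below is an assumption, and Concept, labels_for_new_node, and inherited_props are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare concepts by identity, not by field values
class Concept:
    name: str
    parents: list["Concept"] = field(default_factory=list)
    props: list[str] = field(default_factory=list)
    abstract: bool = False

    def lineage(self) -> list["Concept"]:
        """This concept followed by its ancestors (depth-first, no duplicates)."""
        result = [self]
        for parent in self.parents:
            for ancestor in parent.lineage():
                if ancestor not in result:
                    result.append(ancestor)
        return result


def labels_for_new_node(concept: Concept) -> list[str]:
    """Labels to write on a new node; abstract concepts cannot be instantiated."""
    if concept.abstract:
        raise ValueError(f"{concept.name} is abstract and cannot be instantiated")
    return [c.name for c in concept.lineage()]


def inherited_props(concept: Concept) -> list[str]:
    """Own properties plus everything inherited from ancestors."""
    return [p for c in concept.lineage() for p in c.props]


asset = Concept("Asset", props=["asset_id", "cost"], abstract=True)
server = Concept("Server", parents=[asset], props=["cpu", "ram"])
db_server = Concept("DB_Server", parents=[server], props=["storage_type"])
```

Because parents is a list, the same structure also covers mixin-style multiple inheritance.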
Properties are not just Key-Values. They are strictly typed descriptors.
- Primitive Types: string, integer, float, boolean, date, datetime.
- Complex Types: json (for unstructured payloads), point (geospatial).
- Enums: Restricted lists of values (e.g., Status: ['Draft', 'Active', 'Archived']).
  - Why? Stringly-typed status fields are a source of constant bugs (e.g., In Progress vs in-progress vs In_Progress). Enums enforce normalization.
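A minimal sketch of a typed property check, assuming the primitive type names listed above map onto Python types as shown. The PRIMITIVE_TYPES table and property_is_valid function are hypothetical, and the json and point types are left out for brevity.

```python
from datetime import date, datetime
from typing import Any, Optional

# Assumed mapping from ontology type names to Python types.
PRIMITIVE_TYPES: dict[str, type] = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": bool,
    "date": date,
    "datetime": datetime,
}


def property_is_valid(
    value: Any, type_name: str, enum_values: Optional[list[str]] = None
) -> bool:
    """Check one property value against its declared type and optional enum."""
    # Exact type match: stops bool passing as integer and datetime passing as date.
    if type(value) is not PRIMITIVE_TYPES[type_name]:
        return False
    return enum_values is None or value in enum_values


STATUS_VALUES = ["Draft", "Active", "Archived"]
```

Here property_is_valid("Active", "string", STATUS_VALUES) passes, while "in-progress" fails because it is not one of the declared enum members.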
Relationships are first-class citizens. In SQL, a relationship is often a hidden "Foreign Key" or a "Join Table". In ModelKG, it is a tangible entity.
[ Source: Person ] --( Relationship: EMPLOYED_BY )--> [ Target: Company ]
                                  |
                                  | Properties:
                                  +-- start_date: Date
                                  +-- role: String
                                  +-- salary_grade: Int
All relationships in ModelKG are directed (Source -> Target). However, the Ontology can declare a relationship as "Semantically Symmetric" (e.g., PEER_OF). Even if stored as A->B, the query engine knows to treat it as bidirectional for logic purposes.
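The symmetric flag can be illustrated with a small lookup that follows stored edges in both directions only when the relationship type allows it. This is a sketch under assumptions: RelationshipType and neighbours are hypothetical names, and edges are modelled as plain (source, relationship, target) tuples rather than real graph records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipType:
    name: str
    source: str
    target: str
    symmetric: bool = False  # "Semantically Symmetric" in the ontology


def neighbours(
    edges: list[tuple[str, str, str]], node: str, rel: RelationshipType
) -> set[str]:
    """Nodes reachable from `node` over `rel`, honouring the symmetric flag."""
    found = {t for s, r, t in edges if r == rel.name and s == node}
    if rel.symmetric:
        # Stored as A->B, but treated as bidirectional for logic purposes.
        found |= {s for s, r, t in edges if r == rel.name and t == node}
    return found


PEER_OF = RelationshipType("PEER_OF", "Person", "Person", symmetric=True)
EMPLOYED_BY = RelationshipType("EMPLOYED_BY", "Person", "Company")
EDGES = [("alice", "PEER_OF", "bob"), ("alice", "EMPLOYED_BY", "acme")]
```

Asking for bob's peers returns alice even though the edge was stored as alice->bob, whereas EMPLOYED_BY stays strictly one-way.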
Traits are the superpower of the Ontology Engine. They allow us to standardize behavior across unrelated domains.
Every impactful node should track its history. Instead of defining created_by on 50 different concepts, we define it once in the Auditable trait (Interface) and mix it in.
- Impact: The system automatically validates that user context is present when modifying any node implementing this trait.
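As a rough sketch of that check, the function below refuses to modify a node when no user context is supplied and otherwise stamps the audit fields. Only created_by comes from the text above; apply_auditable and the other field names (created_at, updated_by, updated_at) are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def apply_auditable(
    props: dict[str, Any], user_id: Optional[str], is_new: bool
) -> dict[str, Any]:
    """Return a copy of props with audit fields set, or fail without a user."""
    if not user_id:
        raise PermissionError("Auditable nodes cannot be modified without a user context")
    stamped = dict(props)  # do not mutate the caller's payload
    now = datetime.now(timezone.utc).isoformat()
    if is_new:
        stamped["created_by"] = user_id
        stamped["created_at"] = now  # assumed field name
    stamped["updated_by"] = user_id  # assumed field name
    stamped["updated_at"] = now  # assumed field name
    return stamped
```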
Many objects move through states. The Lifecycle trait enforces a status property and creates a "State Machine" constraint.
- Logic: It prevents illegal transitions. e.g., you cannot go from Draft to Archived without passing through Active.
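The state machine constraint can be sketched as a transition table. The three states come from the Status enum used earlier in this document; the ALLOWED_TRANSITIONS table and check_transition function are hypothetical names, and a real Lifecycle definition may allow more transitions than shown here.

```python
# Assumed transition table: current state -> states it may move to.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "Draft": {"Active"},
    "Active": {"Archived"},
    "Archived": set(),
}


def check_transition(current: str, new: str) -> None:
    """Raise ValueError when the status change is not an allowed transition."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
```

Calling check_transition("Draft", "Archived") raises, while Draft -> Active followed by Active -> Archived passes.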
The structure defines what data can exist. The Constraints define what data must (or must not) exist.
We enforce strict cardinality rules on relationships.
(1..1) REQUIRED ONE
[ Task ] ---------------> [ Project ]
(Every task MUST belong to a project)
(0..1) OPTIONAL ONE
[ User ] ---------------> [ Manager ]
(A user might be the CEO, having no manager)
(0..*) MANY
[ Project ] ------------> [ Document ]
(A project can have zero or unlimited documents)
Why this matters: In many systems, "Orphan Nodes" (data disconnected from the main graph) trigger silent failures in reports. ModelKG's minimum-cardinality constraint (a lower bound of 1, as in 1..1 or 1..*) effectively garbage collects or blocks the creation of orphans at the root.
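The cardinality rules above reduce to a minimum and an optional maximum on the number of edges. The sketch below is illustrative only: the Cardinality class and the constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    minimum: int
    maximum: Optional[int]  # None means unbounded (*)

    def allows(self, count: int) -> bool:
        """True when `count` edges satisfy this cardinality."""
        within_max = self.maximum is None or count <= self.maximum
        return count >= self.minimum and within_max


REQUIRED_ONE = Cardinality(1, 1)  # Task -> Project
OPTIONAL_ONE = Cardinality(0, 1)  # User -> Manager
MANY = Cardinality(0, None)  # Project -> Document
```

A Task with zero Project edges fails REQUIRED_ONE.allows(0), which is the orphan case described above.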
Graph theory allows for dangerous structures like Cycles.
- DAG Enforcement: For dependencies (e.g., Project Tasks, Software Dependencies), the Ontology can enforce a Directed Acyclic Graph (DAG) constraint.
- Mechanism: Before adding edge A->B, the engine runs a lightweight "Shortest Path" check in the opposite direction, from B to A. If such a path already exists, adding A->B would close a loop. The write is rejected.
CYCLE PREVENTION LOGIC:
Existing: [ A ] ---> [ B ] ---> [ C ]
Action: User tries to add [ C ] ---> [ A ]
Check: Does Path(A -> ... -> C) exist? -> YES
Result: BLOCK WRITE (Cycle Detected)
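The check can be sketched as a breadth-first reachability test over an in-memory adjacency map. This stands in for whatever path query the engine actually runs against the graph; would_create_cycle is a hypothetical name.

```python
from collections import deque


def would_create_cycle(edges: dict[str, set[str]], source: str, target: str) -> bool:
    """True if adding source -> target closes a loop (target already reaches source)."""
    if source == target:
        return True  # a self-loop is the smallest possible cycle
    queue = deque([target])
    seen = {target}
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt == source:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Existing: A -> B -> C
GRAPH = {"A": {"B"}, "B": {"C"}}
```

For the example above, would_create_cycle(GRAPH, "C", "A") finds the existing path A -> B -> C and reports a cycle, so the write C -> A is blocked.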
Neo4j supports generic constraints, but ModelKG creates Semantic Identity.
- Global ID: UUIDv4 used for system referencing.
- Natural Key: The human-readable identifier (e.g., hostname for a Server, email for a Person).
- Scope: The Ontology defines scope. Project.name might only need to be unique within a Department, whereas User.email must be globally unique.
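Scoped uniqueness can be pictured as a composite key of concept, scope, and natural key. The in-memory UniquenessIndex below is a hypothetical stand-in for whatever index or constraint the engine really uses.

```python
from typing import Optional


class UniquenessIndex:
    """Tracks natural keys per concept, optionally partitioned by a scope value."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, Optional[str], str]] = set()

    def register(self, concept: str, key_value: str, scope: Optional[str] = None) -> None:
        """Record a natural key; scope=None means the key is globally unique."""
        entry = (concept, scope, key_value)
        if entry in self._seen:
            where = f"within {scope}" if scope else "globally"
            raise ValueError(f"{concept} '{key_value}' already exists {where}")
        self._seen.add(entry)
```

Two projects named Apollo can coexist in different departments, but a second User with the same email is rejected regardless of scope.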
How do we perform these checks without slowing the system to a crawl?
We do not query Postgres for every write. The Ontology Engine publishes the compiled schema to Redis and local memory in the Graph Core.
- Versioning: Each schema bundle has a hash. If the Graph Core sees a new hash, it hot-reloads the schema validation rules.
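A minimal sketch of hash-based hot reloading, assuming the schema bundle is a JSON-serializable dict and the hash is SHA-256 over a canonical encoding. Both are assumptions, as are the names schema_hash and SchemaCache; the Redis transport is deliberately left out.

```python
import hashlib
import json
from typing import Optional


def schema_hash(schema: dict) -> str:
    """Stable hash of a schema bundle, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaCache:
    """Local in-memory copy of the compiled schema held by the Graph Core."""

    def __init__(self) -> None:
        self.current_hash: Optional[str] = None
        self.rules: dict = {}

    def maybe_reload(self, published: dict) -> bool:
        """Swap in the published schema if its hash differs; report whether it did."""
        new_hash = schema_hash(published)
        if new_hash == self.current_hash:
            return False
        self.rules = published
        self.current_hash = new_hash
        return True
```

Sorting keys before hashing matters: without it, two semantically identical bundles could produce different hashes and trigger needless reloads.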
- Syntactic Validation (Fast):
- Input: JSON payload.
- Check: Do fields match types? (Integers are integers).
- Location: In-Memory (Pydantic models generated from Ontology).
- Structural Validation (Medium):
- Input: Source/Target IDs.
- Check: Is Source actually a Person? Is Target actually a Project?
- Location: Neo4j Index Lookups (very fast).
- Semantic Validation (Slow/Complex):
- Input: The Graph Topology.
- Check: Cycle detection, Max Depth limits.
- Location: Graph Algorithm execution. Note: These are only run for specific Relationship Types flagged as complex_constraint: true.
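The three tiers can be sketched as an ordered pipeline that runs the cheapest check first and stops at the first failure. Everything here is illustrative: validate_write, the Check type, and the COMPLEX_REL_TYPES set are hypothetical, and the individual checks are passed in as plain callables.

```python
from typing import Callable, Optional

# A check takes the pending write and returns a list of error messages.
Check = Callable[[dict], list[str]]

# Hypothetical relationship types flagged complex_constraint: true.
COMPLEX_REL_TYPES = {"DEPENDS_ON"}


def validate_write(
    write: dict, syntactic: Check, structural: Check, semantic: Check
) -> Optional[tuple[str, list[str]]]:
    """Run tiers from cheapest to most expensive; return the first failure, if any."""
    tiers = [("syntactic", syntactic), ("structural", structural)]
    # The expensive graph checks only run for flagged relationship types.
    if write.get("rel_type") in COMPLEX_REL_TYPES:
        tiers.append(("semantic", semantic))
    for name, check in tiers:
        errors = check(write)
        if errors:
            return name, errors
    return None
```

Short-circuiting means a payload with a wrong type never triggers an index lookup or a graph traversal.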
In SQL, ALTER TABLE is a nightmare. In Graph, it's nuanced.
We rarely delete fields instantly. We use a 3-phase lifecycle for Schema Changes:
- Active: Property is required and enforced.
- Deprecated: Property is optional, but logs a warning if used. Frontend hides it.
- End-of-Life: Property is rejected.
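The three phases can be read as a small decision table applied to each property on an incoming write. The Phase enum and evaluate_property function below are hypothetical names, and the "accept" / "warn" / "reject" outcomes are a simplification of whatever the engine actually returns.

```python
from enum import Enum


class Phase(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    END_OF_LIFE = "end_of_life"


def evaluate_property(phase: Phase, present: bool) -> str:
    """Return 'accept', 'warn', or 'reject' for one property on a write."""
    if phase is Phase.ACTIVE:
        # Required and enforced: a missing value fails the write.
        return "accept" if present else "reject"
    if phase is Phase.DEPRECATED:
        # Optional: still accepted, but its use is logged as a warning.
        return "warn" if present else "accept"
    # End-of-Life: the property must no longer be sent at all.
    return "reject" if present else "accept"
```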
When a constraint is tightened (e.g., Status becomes required), existing data may no longer be valid.
- The Ontology Engine generates a Compliance Report: "500 nodes missing 'status'."
- It does not allow the Schema Upgrade until the Action Executor runs a Migration Job (e.g., "Set default status = 'Active'").
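The report-then-migrate gate might look like the sketch below, with nodes modelled as plain dicts. The function names compliance_report, run_migration, and schema_upgrade_allowed are hypothetical and do not reflect the real Action Executor interface.

```python
from typing import Any


def compliance_report(nodes: list[dict[str, Any]], required_prop: str) -> list[str]:
    """IDs of nodes that would violate the tightened constraint."""
    return [n["id"] for n in nodes if n.get(required_prop) is None]


def run_migration(nodes: list[dict[str, Any]], prop: str, default: Any) -> int:
    """Backfill a default value on non-compliant nodes; return how many were fixed."""
    fixed = 0
    for node in nodes:
        if node.get(prop) is None:
            node[prop] = default
            fixed += 1
    return fixed


def schema_upgrade_allowed(nodes: list[dict[str, Any]], required_prop: str) -> bool:
    """The upgrade is only allowed once the compliance report is empty."""
    return not compliance_report(nodes, required_prop)
```

The upgrade stays blocked until run_migration (or an equivalent job) has emptied the report.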
- Problem: "Role Explosion" - thousands of AD groups.
- Constraint: Segregation of Duties (SoD). A user cannot have multiple paths to conflicting roles.
        /--[Role:Approver]--\
[User]                        [Resource:PaymentSystem]
        \--[Role:Requester]-/
LOGIC: Ontology Trait "Segregated" on Roles checks path distinctness.
RESULT: User cannot hold both roles for the same system.
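A simplified version of the Segregated check, assuming role assignments are available as (role, resource) pairs for a user. The CONFLICTING_ROLES set and violates_sod function are hypothetical; a real implementation would evaluate paths in the graph rather than a flat set.

```python
# Assumed conflict table: unordered pairs of roles that must not be combined.
CONFLICTING_ROLES = {frozenset({"Approver", "Requester"})}


def violates_sod(held: set[tuple[str, str]], new_role: str, resource: str) -> bool:
    """True if granting new_role on resource conflicts with a role already held there."""
    return any(
        held_resource == resource
        and frozenset({held_role, new_role}) in CONFLICTING_ROLES
        for held_role, held_resource in held
    )


HELD = {("Requester", "PaymentSystem")}
```

Granting Approver on PaymentSystem to this user is a violation; granting Approver on a different system is not.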
- Problem: Context is vague.
- Ontology Solution:
  - Concepts: Context (e.g., "Deep Work").
  - Rule: Task --BEST_FOR--> Context.
  - Inference: When the user's phone state is "Driving", the system queries MATCH (t:Task)-[:BEST_FOR]->(:Context {name: 'Commuting'}).
[ Phone State: Driving ] ---> [ Context: Commuting ]
                                       ^
                                       | (Filter)
[ Task: "Call Mom" ] ------------------+
[ Task: "Code" ] (Excluded: requires 'Deep Work')
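The same filter can be mimicked in memory to show the two hops involved: phone state to Context, then Context to Tasks. The STATE_TO_CONTEXT and BEST_FOR mappings are hypothetical stand-ins for graph data, populated with the examples from the diagram.

```python
# Phone state -> Context (first hop in the diagram).
STATE_TO_CONTEXT = {"Driving": "Commuting"}

# Task -[:BEST_FOR]-> Context (second hop).
BEST_FOR = {"Call Mom": "Commuting", "Code": "Deep Work"}


def tasks_for_phone_state(state: str) -> list[str]:
    """Tasks whose BEST_FOR context matches the context implied by the phone state."""
    context = STATE_TO_CONTEXT.get(state)
    return [task for task, best in BEST_FOR.items() if best == context]
```

An unknown phone state maps to no context, so no tasks are suggested.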
Why build a custom engine instead of using W3C standards like OWL or SHACL?
- Complexity: OWL is designed for "Open World" reasoning (inferring truth). We operate in a "Closed World" (validation constraints). We don't want to guess data; we want to enforce it.
- Performance: SHACL validation is computationally expensive to run on every transaction. Our lightweight JSON-schema-based approach keeps validation cost independent of graph size (O(1)) for 90% of operations.
- Developer Experience: Asking a modern Full-Stack developer to write RDF/XML is a non-starter. JSON definitions are native to the TypeScript/Python ecosystem.
The Ontology Engine is not just a "checker". It is the Blueprint of Reality for the organization. By investing effort in defining the Ontology, we move complexity out of the Application Code (millions of if statements) and into the Metadata layer, where it is visible, manageable, and enforceable.