An Open-Source Intelligence & Data Operations Platform
Version 0.1 -- February 2026
- Executive Summary
- Understanding the Landscape: What Palantir Does
- Existing Alternatives and Gaps
- Mission Statement
- Core Feature Set
- Architecture Ideas
- What Makes Cestus Different
- Potential Use Cases
- Phased Roadmap
- Existing Open-Source Building Blocks
- Research Sources
Palantir Technologies has built a multi-billion dollar business around a deceptively simple insight: organizations drown in data not because they lack storage or compute, but because they lack a unified semantic layer that connects disparate data sources into a coherent model of reality -- and then lets humans and AI agents act on that model.
Palantir's moat is not any single algorithm. It is the integration -- the ontology layer that turns raw tables into entities and relationships, the visualization tools that let analysts explore those relationships across graphs, maps, and timelines, the action framework that lets decisions flow back into operational systems, and the deployment machinery that makes all of this work in sensitive, air-gapped environments.
No single open-source project replicates this today. But the building blocks exist. Cestus is a vision for composing those building blocks into a coherent, open-source platform that delivers Palantir-class capabilities to organizations that cannot afford Palantir, do not want vendor lock-in, or need the transparency that only open source provides.
Gotham is Palantir's original product, built for intelligence agencies and military organizations. Its core capabilities include:
- Data Integration: Connectors to structured sources (databases, ERP, CRM), semi-structured (logs, XML), and unstructured (PDFs, emails, imagery), with heavy investment in deduplication and entity resolution.
- Ontology / Entity Graph: A dynamic ontology that models entities (persons, phone numbers, addresses, organizations, assets) and the links between them. This is the "digital twin" of the intelligence domain.
- Link Analysis (Graph): A network analysis canvas where analysts create visual representations of networked data. Users can view aggregated property statistics, organize and style graphs, and annotate them as part of collaborative workflows.
- Geospatial Analysis: Map layers showing entity locations, event timelines, and movement patterns.
- Object Explorer: Top-down analysis enabling users to find entities with similar characteristics and visualize relationships across millions of records.
- Collaboration: Real-time concurrent analysis within Graph and other applications. Shared canvases, annotations, and presentation workflows.
Foundry is the commercial platform -- a "data operating system" for enterprises:
- Data Integration: Pipeline-based ingestion from any source, with data transformation and cleaning.
- Ontology Layer: The signature feature. The Foundry Ontology sits on top of datasets and models, connecting them to their real-world counterparts (factories, equipment, products, orders, transactions). It contains both semantic elements (objects, properties, links) and kinetic elements (actions, functions, dynamic security).
- Ontology Architecture: A microservices backend with the Ontology Metadata Service (OMS) defining object types, link types, and action types; the Object Data Funnel orchestrating data writes and indexing; and Object Storage V2 separating indexing from querying for horizontal scalability.
- Application Building: Low-code/no-code tools for building operational applications on top of the ontology.
- Workshop & Quiver: Drag-and-drop application builders for dashboards and operational workflows.
AIP integrates LLMs and AI agents into the ontology:
- Ontology-Grounded AI: AI agents reason over the ontology's entities, relationships, and business logic rather than raw data. The ontology provides "the nouns" (entities) and "the verbs" (actions) of the enterprise.
- AIP Agent Studio: Build, test, and deploy AI agents that can read from and write to the ontology. Agents are sandboxed with specific permissions on data and tools.
- AIP Logic: A no-code environment for building LLM-powered functions that leverage the ontology.
- Agents as Functions: Agents can be published as Functions, making them composable and reusable across the platform.
Apollo is the deployment and operations layer:
- Hub and Spoke Architecture: A central Apollo Hub manages multiple Spoke environments, each running a Spoke Control Plane that reports telemetry and executes deployment plans.
- Pull-Based Deployment: Instead of pushing code, environments pull updates via subscriptions to Release Channels.
- Air-Gapped Support: Manages software across connected and disconnected environments, critical for defense and regulated industries.
- Compliance-Aware: Built-in controls for FedRAMP, IL5, IL6 accreditation frameworks.
The real power is not any individual product but their integration:
- Data goes in (any format, any source)
- Ontology maps it to real-world entities and relationships
- Humans explore it via graphs, maps, timelines, dashboards
- AI reasons over it grounded in the ontology
- Actions flow back into operational systems
- Apollo deploys it anywhere, including air-gapped environments
- Security governs it at every layer with fine-grained access control
| Product | Strengths | Gaps vs. Palantir |
|---|---|---|
| Databricks | Unified analytics, Delta Lake, MLflow | No ontology layer, no link analysis, no investigative UI |
| Snowflake | Data warehousing, data sharing | Pure storage/compute, no semantic layer |
| Dataiku | End-to-end data science | Weaker on ontology and operational applications |
| d.AP (digetiers) | Ontology-grounded on RDF/OWL open standards | Newer, smaller ecosystem |
| DataWalk | Link analysis, investigative analysis | Proprietary, narrower scope |
| Siren | Investigative intelligence, link analysis | Proprietary, Elasticsearch-based |
| C3 AI | Enterprise AI applications | Proprietary, expensive |
What exists today:
- Data Integration/ETL: Apache Airflow, Apache NiFi, Apache Hop, Apache Kafka, Apache Beam, Airbyte, dbt
- Data Catalogs/Metadata: OpenMetadata, DataHub (LinkedIn), Amundsen (Lyft), Apache Atlas
- Graph Databases: Neo4j (Community Edition), JanusGraph, Apache TinkerPop, Apache AGE (Postgres extension)
- Entity Resolution: Zingg (ML-based), Splink (probabilistic), Dedupe (Python)
- Knowledge Graphs: WhyHow Knowledge Graph Studio, Graphiti, KBpedia
- Visualization: Apache Superset (dashboards), Grafana (monitoring), Kepler.gl (geospatial), Gephi (graph), Sigma.js (graph/web)
- AI/LLM Frameworks: LangChain, LlamaIndex, LangGraph, CrewAI, Dify
- Authorization: Casbin, Ory Keto, Permify (Zanzibar-inspired RBAC)
- Deployment: ArgoCD, Flux, Kubernetes
The critical gap: No single open-source project or composition of projects provides the ontology-as-operating-system experience -- the unified semantic layer that sits between raw data and applications/AI, with integrated entity resolution, link analysis visualization, and action frameworks. Projects exist in silos. The integration is the product, and that integration does not exist in open source.
- Dashjoin: An open-source low-code platform that establishes a linked data graph over data sources with browsing, searching, editing, AI integration, and GitOps delivery. The closest thing to an integrated Palantir alternative in open source, but significantly smaller in scope and community.
- Apache Atlas + JanusGraph + Superset: A common open-source stack for metadata governance and visualization, but lacks the ontology-driven application layer and investigative UI.
Organizations of all sizes face the same fundamental challenge that Palantir solves for the world's largest governments and enterprises: their data is fragmented across dozens of systems, in incompatible formats, with no unified way to understand what the data represents in the real world, explore relationships, or act on insights.
Today, solving this problem requires either (a) paying millions for Palantir, (b) assembling a bespoke stack from dozens of open-source tools with no integration layer, or (c) going without.
Cestus is a free, open-source data operations platform that unifies data integration, semantic modeling, entity resolution, visual analysis, AI reasoning, and operational action into a single coherent system.
Cestus makes it possible for any organization -- investigative journalists, humanitarian NGOs, academic researchers, mid-sized enterprises, local governments, open-source intelligence analysts -- to turn fragmented data into an entity-relationship model of their domain, explore it visually, reason over it with AI, and take action.
- Open Source, Always: Apache 2.0 or similar permissive license. No open-core bait-and-switch.
- Ontology-First: The semantic model is the core abstraction. Everything else -- ingestion, visualization, AI, actions -- operates through the ontology.
- Composable: Built as a set of well-defined services with clean APIs. Use the whole platform or individual components.
- AI-Native: LLM and agent integration is not an afterthought -- it is a first-class capability grounded in the ontology.
- Security by Design: Fine-grained access control (RBAC + ABAC), audit logging, and data provenance from day one.
- Deploy Anywhere: Cloud, on-premise, air-gapped, edge. Kubernetes-native with support for disconnected environments.
Goal: Connect to any data source and bring data into the platform with minimal friction.
| Capability | Description | Priority |
|---|---|---|
| Connectors | Pre-built connectors for databases (Postgres, MySQL, SQL Server, Oracle), cloud storage (S3, GCS, Azure Blob), APIs (REST, GraphQL), files (CSV, JSON, Parquet, XML), messaging (Kafka, RabbitMQ) | MVP |
| Custom Connectors | SDK for building custom connectors | v1.0 |
| Stream Ingestion | Real-time data ingestion from streaming sources | v1.0 |
| Data Transformation | Pipeline-based transformation with versioning | MVP |
| Incremental Sync | Change data capture and incremental updates | v1.0 |
| Unstructured Ingestion | PDF, email, document, and image ingestion with AI-powered extraction | v1.5 |
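The connector surface for the SDK above can stay small; a hedged sketch of what such a contract might look like (the `Connector` protocol and `CsvConnector` names are illustrative, not a committed Cestus API, and real file I/O is stubbed out with an in-memory dict):

```python
from typing import Any, Iterator, Protocol

class Connector(Protocol):
    """Minimal contract a Cestus connector might implement."""
    def discover(self) -> list[str]:
        """List available streams (tables, endpoints, file sets)."""
    def read(self, stream: str) -> Iterator[dict[str, Any]]:
        """Yield records from one stream as plain dicts."""

class CsvConnector:
    def __init__(self, rows_by_file: dict[str, list[dict[str, Any]]]):
        # Stand-in for real file parsing: stream name -> already-parsed rows.
        self.rows_by_file = rows_by_file

    def discover(self) -> list[str]:
        return sorted(self.rows_by_file)

    def read(self, stream: str) -> Iterator[dict[str, Any]]:
        yield from self.rows_by_file[stream]

conn = CsvConnector({"people.csv": [{"name": "Alice"}, {"name": "Bob"}]})
records = list(conn.read("people.csv"))  # two dict records
```

Keeping connectors behind one narrow interface is what lets the ingestion layer treat databases, APIs, and files uniformly downstream.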
Goal: Provide a semantic layer that maps raw data to real-world entities and relationships.
This is the heart of Cestus and the primary differentiator from "just another data tool."
| Capability | Description | Priority |
|---|---|---|
| Object Types | Define entity types (Person, Organization, Vehicle, Transaction, Event, etc.) with typed properties | MVP |
| Link Types | Define relationship types between object types (employs, owns, communicated_with, located_at) with properties | MVP |
| Ontology Editor | Visual and code-based tools for defining and editing the ontology schema | MVP |
| Data Mapping | Map raw dataset columns to ontology object properties, with transformation rules | MVP |
| Entity Resolution | ML-assisted deduplication and entity resolution across data sources | MVP |
| Interface Types | Polymorphic interfaces (like Palantir's) for consistent modeling across object types that share common shapes | v1.0 |
| Ontology Versioning | Schema versioning with migration support | v1.0 |
| Derived Properties | Computed properties based on linked objects, aggregations, or functions | v1.0 |
| Temporal Modeling | First-class support for time-varying properties and historical states | v1.5 |
Ontology Data Model (Conceptual):
ObjectType
- id: UUID
- name: string (e.g., "Person", "Organization")
- properties: PropertyDefinition[]
- interfaces: InterfaceType[]
- datasource_mappings: DataSourceMapping[]
LinkType
- id: UUID
- name: string (e.g., "employed_by", "called")
- source_type: ObjectType
- target_type: ObjectType
- properties: PropertyDefinition[]
- cardinality: ONE_TO_ONE | ONE_TO_MANY | MANY_TO_MANY
Object (instance)
- id: UUID
- type: ObjectType
- properties: { [key]: value }
- provenance: DataSource[] -- which sources contributed to this object
- confidence: float -- entity resolution confidence
- timestamps: { created, modified, valid_from, valid_to }
Link (instance)
- id: UUID
- type: LinkType
- source: Object
- target: Object
- properties: { [key]: value }
- provenance: DataSource[]
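The conceptual model above translates almost directly into code; a minimal Python sketch (field names mirror the conceptual model, while defaults and the string-based provenance are illustrative simplifications):

```python
from dataclasses import dataclass, field
from typing import Any
from uuid import UUID, uuid4

@dataclass
class PropertyDefinition:
    name: str
    dtype: str  # e.g. "string", "integer", "timestamp"

@dataclass
class ObjectType:
    name: str  # e.g. "Person", "Organization"
    properties: list[PropertyDefinition] = field(default_factory=list)
    id: UUID = field(default_factory=uuid4)

@dataclass
class LinkType:
    name: str  # e.g. "employed_by", "called"
    source_type: ObjectType = None
    target_type: ObjectType = None
    cardinality: str = "MANY_TO_MANY"
    id: UUID = field(default_factory=uuid4)

@dataclass
class Object:
    type: ObjectType
    properties: dict[str, Any]
    provenance: list[str] = field(default_factory=list)  # contributing source names
    confidence: float = 1.0  # entity resolution confidence
    id: UUID = field(default_factory=uuid4)

@dataclass
class Link:
    type: LinkType
    source: Object
    target: Object
    properties: dict[str, Any] = field(default_factory=dict)
    id: UUID = field(default_factory=uuid4)

# Example: a Person employed by an Organization, each traced to a source.
person = ObjectType("Person", [PropertyDefinition("name", "string")])
org = ObjectType("Organization", [PropertyDefinition("name", "string")])
employed_by = LinkType("employed_by", person, org)
alice = Object(person, {"name": "Alice"}, provenance=["hr_db"])
acme = Object(org, {"name": "Acme"}, provenance=["crm"])
link = Link(employed_by, alice, acme)
```

Note how provenance and confidence live on instances, not types: they are properties of how a particular record was assembled, not of the schema.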
Goal: Find any entity, relationship, or pattern across the entire ontology.
| Capability | Description | Priority |
|---|---|---|
| Full-Text Search | Search across all object properties with relevance ranking | MVP |
| Faceted Search | Filter by object type, property values, date ranges, data source | MVP |
| Graph Traversal Search | "Find all entities within N hops of entity X" | MVP |
| Saved Searches | Save and share search queries | v1.0 |
| Natural Language Search | AI-powered "ask a question in plain English" search | v1.0 |
| Pattern Search | Find subgraph patterns (e.g., "person connected to organization through phone number") | v1.5 |
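Graph traversal search ("within N hops") is a bounded breadth-first walk over the link graph; a minimal in-memory sketch, with a plain adjacency dict standing in for the graph store:

```python
from collections import deque

def within_n_hops(adjacency: dict[str, set[str]], start: str, n: int) -> set[str]:
    """Return every entity reachable from `start` in at most `n` link hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == n:
            continue  # hop budget exhausted along this path
        for neighbor in adjacency.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)  # the start entity is not its own neighbor
    return seen

# person -> phone -> organization chain
graph = {
    "alice": {"phone_1"},
    "phone_1": {"alice", "acme_corp"},
    "acme_corp": {"phone_1"},
}
within_n_hops(graph, "alice", 1)  # {"phone_1"}
within_n_hops(graph, "alice", 2)  # {"phone_1", "acme_corp"}
```

In production this traversal would be pushed down to the graph store (Cypher in AGE or Neo4j) rather than run in application memory, but the semantics are the same.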
Goal: Multiple visual paradigms for exploring the ontology -- because different questions require different views.
| Capability | Description | Priority |
|---|---|---|
| Graph View | Interactive link analysis canvas. Expand entities, explore connections, filter, cluster, style nodes/edges. The primary investigative interface. | MVP |
| Table View | Spreadsheet-like view of object collections with sorting, filtering, grouping | MVP |
| Map View | Geospatial visualization of entities with location properties. Layers, clustering, heatmaps. | MVP |
| Timeline View | Temporal visualization of events and entity activity | v1.0 |
| Dashboard Builder | Drag-and-drop dashboard composition from charts, tables, maps, and graphs | v1.0 |
| Object Profile | Detailed view of a single entity with all properties, linked entities, activity timeline, and source provenance | MVP |
| Histogram / Charts | Bar, line, pie, scatter, and other statistical visualizations over ontology data | v1.0 |
| Notebook Integration | Jupyter-style notebook for ad-hoc analysis with access to ontology APIs | v1.5 |
Goal: Multiple users working together on investigations, analyses, and operational workflows.
| Capability | Description | Priority |
|---|---|---|
| Workspaces | Shared project spaces for team collaboration | MVP |
| Annotations | Add notes, tags, and assessments to any entity or relationship | MVP |
| Canvas Sharing | Share and co-edit graph, map, and timeline canvases | v1.0 |
| Comments | Threaded comments on any object, link, or analysis artifact | v1.0 |
| Activity Feed | See what teammates have been exploring, annotating, or modifying | v1.0 |
| Audit Trail | Full history of who viewed, modified, or exported what data | MVP |
Goal: AI agents that can reason over the ontology, answer questions, and take actions -- grounded in real data, not hallucinations.
| Capability | Description | Priority |
|---|---|---|
| Ontology-Grounded RAG | LLM queries answered using ontology entities and relationships as context | MVP |
| Natural Language Query | "Show me all transactions over $10K between Company A and any entity flagged as high-risk" | v1.0 |
| Entity Extraction (NER) | AI-powered extraction of entities and relationships from unstructured text | v1.0 |
| AI Agent Framework | Agents with tool-calling that can search the ontology, traverse graphs, create annotations, and suggest actions | v1.0 |
| Anomaly Detection | ML models that identify unusual patterns in entity behavior or relationships | v1.5 |
| Agent Sandboxing | Fine-grained permissions for what data and actions agents can access | v1.0 |
| Model Registry | Register and version ML models, connect outputs to ontology | v1.5 |
| Bring Your Own LLM | Support for OpenAI, Anthropic, local models (Ollama, vLLM), or any OpenAI-compatible API | MVP |
Goal: Move from insight to action -- write changes back to source systems, trigger workflows, and automate operational responses.
| Capability | Description | Priority |
|---|---|---|
| Action Types | Define typed actions (approve, escalate, flag, update, notify) with input/output schemas | v1.0 |
| Action Execution | Execute actions that write back to source systems via connectors | v1.0 |
| Workflow Engine | Multi-step workflows triggered by events, schedules, or human decisions | v1.5 |
| Webhooks | Outbound webhooks on ontology events (entity created, link added, property changed) | v1.0 |
| Notifications | Alerts and notifications based on ontology events or AI agent findings | v1.0 |
Goal: Enterprise-grade security that operates at the ontology level, not just the data level.
| Capability | Description | Priority |
|---|---|---|
| Authentication | SSO (SAML, OIDC), local accounts, API keys | MVP |
| RBAC | Role-based access control (admin, analyst, viewer, etc.) | MVP |
| Object-Level Permissions | Control who can see/edit specific object types | MVP |
| Property-Level Masking | Mask sensitive properties (SSN, financial data) based on role | v1.0 |
| Row-Level Security | Filter visible objects based on user attributes (department, clearance, geography) | v1.0 |
| Marking/Classification | Apply classification markings to data and enforce handling rules | v1.5 |
| Audit Logging | Immutable audit log of all data access, modifications, and exports | MVP |
| Data Provenance | Track which source contributed each property value, with lineage | MVP |
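Property-level masking reduces to a pure function from (properties, role) to a redacted view; a sketch under assumed policy names (the sensitive-property set and role names here are illustrative, not a Cestus default):

```python
SENSITIVE = {"ssn", "account_number"}   # illustrative masking policy
PRIVILEGED_ROLES = {"admin", "compliance"}

def mask_properties(properties: dict, role: str) -> dict:
    """Return a copy of an object's properties, masking sensitive values by role."""
    if role in PRIVILEGED_ROLES:
        return dict(properties)
    return {k: ("***" if k in SENSITIVE else v) for k, v in properties.items()}

props = {"name": "Alice", "ssn": "123-45-6789"}
mask_properties(props, "viewer")  # {"name": "Alice", "ssn": "***"}
mask_properties(props, "admin")   # full, unmasked values
```

Because masking operates on ontology properties rather than database columns, the same rule applies uniformly across graph, map, table, and API views.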
+------------------------------------------------------------------+
| Cestus Platform |
+------------------------------------------------------------------+
| |
| +-------------------+ +-------------------+ +----------------+ |
| | Web UI (SPA) | | CLI / SDK | | REST/GraphQL | |
| | React + D3/ | | Python / TS | | API Gateway | |
| | Sigma.js/Deck.gl| | | | | |
| +--------+----------+ +--------+----------+ +-------+--------+ |
| | | | |
| +--------v-----------------------v----------------------v------+ |
| | API Layer (Gateway) | |
| | Authentication | Rate Limiting | Routing | |
| +---+----------+----------+----------+----------+-----------+--+ |
| | | | | | | |
| +---v---+ +---v----+ +--v----+ +---v----+ +---v-----+ +--v---+ |
| |Ontology| |Search | |Visual | | AI | |Actions | |Auth | |
| |Service | |Service | |Service| |Service | |Service | |Svc | |
| +---+---+ +---+----+ +--+----+ +---+----+ +---+-----+ +--+---+ |
| | | | | | | |
| +---v---------v----------v----------v----------v-----------v---+ |
| | Ontology Storage Layer | |
| | +-------------+ +------------+ +-----------------------+ | |
| | | Graph DB | | Search | | Object/Relational | | |
| | | (Neo4j/ | | (Elastic/ | | Store (Postgres) | | |
| | | JanusGraph) | | Typesense)| | | | |
| | +-------------+ +------------+ +-----------------------+ | |
| +--------------------------------------------------------------+ |
| |
| +--------------------------------------------------------------+ |
| | Data Integration Layer | |
| | +----------+ +----------+ +----------+ +-------------+ | |
| | | Connectors| | Transform| | Entity | | Pipeline | | |
| | | (Airbyte) | | (dbt) | | Resolver | | Orchestrator| | |
| | | | | | | (Zingg/ | | (Airflow/ | | |
| | | | | | | Splink) | | Temporal) | | |
| | +----------+ +----------+ +----------+ +-------------+ | |
| +--------------------------------------------------------------+ |
| |
| +--------------------------------------------------------------+ |
| | Infrastructure Layer | |
| | Kubernetes | Helm Charts | Monitoring (Prometheus/Grafana) | |
| | Object Storage (MinIO/S3) | Message Queue (Kafka/NATS) | |
| +--------------------------------------------------------------+ |
+------------------------------------------------------------------+
Decision 1: Ontology Storage -- Polyglot Persistence
The ontology needs to be queryable in multiple ways simultaneously:
- Graph traversals (follow relationships N hops) --> Graph database
- Full-text search (find entities by keyword) --> Search engine
- Aggregations and analytics (count, sum, group) --> Relational/columnar database
- High-volume writes (ingest pipelines) --> Write-optimized store
Rather than picking one database, use a polyglot persistence approach with an Ontology Storage Layer that writes to multiple backends and keeps them in sync via an event bus:
- PostgreSQL (with Apache AGE extension for graph queries): Primary store of record for object and link instances. Proven, reliable, rich ecosystem. AGE adds Cypher query support directly in Postgres.
- Elasticsearch or Typesense: Full-text search index, kept in sync via change data capture.
- Neo4j Community Edition or JanusGraph: Dedicated graph store for deep traversal queries. Neo4j CE for simpler deployments; JanusGraph for distributed scale.
Decision 2: Ontology Service -- The Central Nervous System
A dedicated Ontology Service acts as the single API for reading and writing ontology data. All other services (search, visualization, AI, actions) interact with the ontology through this service. This ensures:
- Consistent schema enforcement
- Centralized access control
- Event emission for all mutations (enabling downstream sync and audit)
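All three guarantees fall out naturally when every mutation funnels through one service; a toy sketch (in-memory dicts stand in for the storage layer, and a listener list stands in for the message bus):

```python
from typing import Any, Callable

class OntologyService:
    """Toy single entry point: every write is schema-checked and emits an event."""

    def __init__(self):
        self.schemas: dict[str, set[str]] = {}          # object type -> allowed properties
        self.objects: dict[str, dict[str, Any]] = {}
        self.listeners: list[Callable[[str, dict], None]] = []

    def define_object_type(self, name: str, properties: set[str]) -> None:
        self.schemas[name] = properties

    def create_object(self, obj_id: str, type_name: str, props: dict[str, Any]) -> None:
        allowed = self.schemas[type_name]               # consistent schema enforcement
        unknown = set(props) - allowed
        if unknown:
            raise ValueError(f"unknown properties for {type_name}: {unknown}")
        self.objects[obj_id] = {"type": type_name, **props}
        for listener in self.listeners:                 # event emission for every mutation
            listener("object.created", {"id": obj_id, "type": type_name})

events = []
svc = OntologyService()
svc.listeners.append(lambda topic, payload: events.append((topic, payload["id"])))
svc.define_object_type("Person", {"name", "dob"})
svc.create_object("p1", "Person", {"name": "Alice"})
# events now holds one ("object.created", "p1") record
```

A write that bypassed this service would skip both the schema check and the event, which is exactly why the design forbids direct storage access from other services.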
Decision 3: Event-Driven Architecture
All ontology mutations produce events on a message bus (Kafka or NATS):
- object.created, object.updated, object.deleted
- link.created, link.updated, link.deleted
- action.executed
This enables:
- Search index updates
- Graph database sync
- Audit logging
- Webhook delivery
- AI agent triggers
- Real-time UI updates via WebSockets
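Each of these consumers is just a replay of the event topics against its own read model; a minimal sketch, with a dict standing in for the search index (the event payload shapes are assumptions):

```python
def apply_event(index: dict[str, dict], topic: str, payload: dict) -> None:
    """Keep a downstream read model (e.g. a search index) in sync from bus events."""
    if topic in ("object.created", "object.updated"):
        index[payload["id"]] = payload["properties"]
    elif topic == "object.deleted":
        index.pop(payload["id"], None)

index: dict[str, dict] = {}
log = [
    ("object.created", {"id": "p1", "properties": {"name": "Alice"}}),
    ("object.updated", {"id": "p1", "properties": {"name": "Alice B."}}),
    ("object.deleted", {"id": "p1"}),
]
for topic, payload in log:
    apply_event(index, topic, payload)
# After replaying the full lifecycle, the read model is empty again.
```

Because consumers only read the log, new downstream capabilities (audit, webhooks, agent triggers) can be added without touching the Ontology Service.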
Decision 4: Frontend -- React with Specialized Visualization Libraries
- Core Framework: React (TypeScript)
- Graph Visualization: Sigma.js (WebGL-based, handles large graphs) or Cytoscape.js
- Geospatial: Deck.gl (WebGL, large-scale) or Leaflet (simpler)
- Charts/Dashboards: Apache ECharts or Recharts
- Timeline: vis-timeline or custom D3-based
- Layout: A workspace model (like VS Code) where users can arrange panels
Decision 5: AI Integration -- Plugin Architecture
Rather than hardcoding LLM providers:
- Define an LLM Provider Interface that abstracts model calls
- Ship adapters for OpenAI, Anthropic, Ollama (local), vLLM
- Use LangChain or LlamaIndex internally for RAG pipeline
- Agents use a Tool abstraction that maps to ontology operations (search, traverse, annotate, execute action)
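The provider interface can be a single-method protocol with adapters behind it; a sketch in which the `EchoProvider` stub stands in for a real adapter so the example runs offline, and the method name is illustrative rather than a fixed API:

```python
from typing import Protocol

class LLMProvider(Protocol):
    """Abstract model call; adapters (OpenAI, Anthropic, Ollama, vLLM) implement this."""
    def complete(self, prompt: str) -> str: ...

class EchoProvider:
    """Offline stand-in adapter: echoes a prefix of the prompt instead of calling a model."""
    def complete(self, prompt: str) -> str:
        return f"[stub] {prompt[:40]}"

def answer_with_ontology(provider: LLMProvider, question: str, context: list[str]) -> str:
    """Ground the model call in ontology-derived context (the RAG step)."""
    prompt = "Context:\n" + "\n".join(context) + f"\nQuestion: {question}"
    return provider.complete(prompt)

reply = answer_with_ontology(
    EchoProvider(),
    "Who employs Alice?",
    ["(Alice) -[employed_by]-> (Acme)"],
)
```

Swapping providers then means swapping one adapter object, leaving RAG pipelines and agents untouched.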
| Layer | Technology | License | Rationale |
|---|---|---|---|
| Frontend | React + TypeScript | MIT | Industry standard, massive ecosystem |
| Graph Viz | Sigma.js | MIT | WebGL performance for large graphs |
| Geo Viz | Deck.gl | MIT | High-performance geospatial |
| Charts | Apache ECharts | Apache 2.0 | Rich chart types, good performance |
| API Gateway | Kong or Traefik | Apache 2.0 | API management, auth, rate limiting |
| Backend Services | Python (FastAPI) or Go | MIT / BSD | FastAPI for rapid development; Go for performance-critical services |
| Primary DB | PostgreSQL + Apache AGE | PostgreSQL / Apache 2.0 | Relational + graph in one, proven at scale |
| Search | Typesense or Elasticsearch | GPL-3 / SSPL | Typesense is simpler and truly open; Elasticsearch has larger ecosystem |
| Graph DB (optional) | Neo4j CE or JanusGraph | GPL-3 / Apache 2.0 | Deep traversal queries; optional if AGE suffices |
| Message Bus | NATS or Apache Kafka | Apache 2.0 | NATS for simplicity; Kafka for scale |
| Object Storage | MinIO | AGPL-3.0 | S3-compatible, for documents and files |
| Pipeline Orchestration | Apache Airflow or Temporal | Apache 2.0 / MIT | Airflow for batch; Temporal for event-driven workflows |
| Data Connectors | Airbyte | MIT (Elv2 for some) | 300+ pre-built connectors |
| Entity Resolution | Zingg or Splink | AGPL-3.0 / MIT | ML-based dedup and entity resolution |
| AI/RAG | LangChain + LlamaIndex | MIT | RAG pipeline and agent framework |
| Auth | Keycloak + Casbin | Apache 2.0 | SSO + fine-grained policy engine |
| Deployment | Kubernetes + Helm | Apache 2.0 | Standard cloud-native deployment |
| Monitoring | Prometheus + Grafana | Apache 2.0 | Observability |
| Dimension | Palantir | Cestus |
|---|---|---|
| Cost | Millions per year | Free (self-hosted) |
| Transparency | Proprietary black box | Full source code visibility |
| Vendor Lock-in | Extreme -- data model tied to platform | Open formats, standard APIs, portable ontology |
| Customization | Services engagement required | Fork it, extend it, contribute back |
| Community | Palantir employees only | Open contributor community |
| AI Models | Palantir-selected models | Bring your own -- local, cloud, or any provider |
| Deployment | Palantir-managed | Self-managed with Helm charts, or managed by community providers |
| Dimension | Typical OSS Stack | Cestus |
|---|---|---|
| Integration | Assemble 10+ tools yourself, build glue code | Integrated platform with shared ontology |
| Ontology | Each tool has its own data model | Single semantic ontology layer across all features |
| Entity Resolution | Run separately, reconcile manually | Built-in, continuous ER feeding the ontology |
| Visualization | Superset for charts, Gephi for graphs, Kepler for maps -- disconnected | Unified workspace with graph, map, timeline, charts sharing one ontology |
| AI Grounding | RAG over raw data | RAG over the ontology -- entities and relationships, not raw tables |
| Access Control | Bolt-on per tool | Ontology-level security that governs all views |
Cestus's differentiator is the ontology as the universal API. Every feature -- ingestion, search, visualization, AI, actions, security -- speaks the language of entities and relationships, not tables and columns. This is what makes Palantir powerful, and it is what no open-source project currently provides as a unified, integrated experience.
Journalists investigating financial crime, political corruption, or corporate misconduct need to connect entities (people, companies, addresses, bank accounts) from leaked documents, public records, and proprietary databases. Cestus would provide the graph analysis, entity resolution, and document ingestion to do this -- capabilities currently available only through expensive tools or manual effort.
Example: A newsroom integrates Panama Papers data, corporate registries, and political donation records. Cestus resolves entities across sources and reveals hidden ownership networks.
Organizations like the UNHCR, Red Cross, or Doctors Without Borders manage operations across fragmented data systems -- beneficiary registries, supply chain databases, field reports, geospatial data. Cestus could unify this into a coherent operational picture.
Example: An NGO integrates refugee registration data, supply depot inventories, and field incident reports to optimize resource allocation and identify underserved areas.
OSINT analysts -- whether in journalism, civil society, or academic research -- need to collect, structure, and analyze publicly available information. Cestus would provide the entity resolution, link analysis, and geospatial tools that are currently locked in expensive proprietary platforms.
Example: Researchers tracking the spread of disinformation map social media accounts, websites, and funding sources to reveal coordinated influence networks.
Researchers studying complex systems (epidemiology, climate, social networks, supply chains) need to integrate diverse datasets and explore relationships. Cestus's ontology and visualization tools would serve as a research platform.
Example: Epidemiologists integrate hospital records, genomic data, and mobility data to model disease transmission networks.
Mid-sized companies that cannot afford Palantir but need to connect their CRM, ERP, supply chain, and financial systems into a coherent view. Cestus could serve as the "data operating system" for companies with 100-10,000 employees.
Example: A manufacturing company connects their ERP, IoT sensor data, supplier database, and quality control system to get end-to-end visibility into production issues.
City and county governments managing public safety, infrastructure, permits, and social services across disconnected systems. Cestus could provide the unified view that large federal agencies get from Palantir.
Example: A city government integrates building permits, code violations, fire inspections, and 311 complaints to identify properties that pose safety risks.
Financial institutions, insurance companies, and regulatory bodies that need to detect complex fraud patterns across entity networks.
Example: A credit union integrates transaction data, account records, and external watchlists. Cestus's entity resolution links related accounts, and graph analysis reveals suspicious transaction patterns.
Goal: Core infrastructure and ontology service that proves the concept.
- Project scaffolding: monorepo, CI/CD, contribution guidelines, governance
- Ontology Service: Core API for defining object types, link types, and properties
- PostgreSQL + Apache AGE storage backend for objects, links, and graph queries
- Basic data ingestion: CSV and JSON file upload, manual data entry
- Basic entity resolution: Rule-based deduplication (exact match, fuzzy match on key fields)
- REST API: Full CRUD on ontology schema and instances
- Authentication: Basic auth and API keys (Keycloak integration in Phase 1)
- Minimal Web UI: Object type browser, entity list view, single entity detail page
Deliverable: A working ontology service with API, basic ingestion, and a minimal UI that lets you define a schema, import data, resolve entities, and browse entities and their relationships.
Success Criteria:
- Schema CRUD: Define ontology schema (>=3 object types, >=2 link types) via REST API. Minimum: <2s p99. Target: <500ms p99.
- Ingest benchmark: Import entities from CSV. Minimum: 1K entities in <5 min. Target: 10K entities in <5 min on a 4-core/16GB machine.
- Entity resolution: Rule-based dedup on reference dataset. Minimum: >=80% exact-match. Target: >=90% exact + >=70% fuzzy.
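Phase 0's rule-based entity resolution (exact match plus fuzzy match on key fields) can be prototyped with the standard library alone; a sketch in which the key fields and the 0.85 threshold are illustrative choices, not Cestus defaults:

```python
from difflib import SequenceMatcher

def is_duplicate(a: dict, b: dict, keys=("name", "dob"), threshold=0.85) -> bool:
    """Rule-based ER: exact match on any key field short-circuits; otherwise fuzzy-match names."""
    for key in keys:
        if a.get(key) and a.get(key) == b.get(key):
            return True
    ratio = SequenceMatcher(None, a.get("name", "").lower(), b.get("name", "").lower()).ratio()
    return ratio >= threshold

r1 = {"name": "Jonathan Smith", "dob": "1980-01-02"}
r2 = {"name": "Jonathon Smith", "dob": "1980-01-02"}
r3 = {"name": "Maria Garcia", "dob": "1975-06-11"}
is_duplicate(r1, r2)  # True: exact dob match short-circuits
is_duplicate(r1, r3)  # False: no key match, names far apart
```

Later phases replace this with Zingg or Splink, whose probabilistic models produce the per-object confidence scores the ontology stores.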
Goal: The investigative experience -- graph, map, search, and collaboration.
- Graph Visualization: Interactive link analysis canvas (Sigma.js-based)
- Expand/collapse nodes, filter by type, layout algorithms, styling
- Select entity in list view --> "Explore in Graph"
- Map View: Geospatial visualization for entities with location properties (Deck.gl)
- Full-Text Search: Elasticsearch/Typesense integration with faceted search
- Object Profile Page: Comprehensive entity view with properties, links, timeline, provenance
- Keycloak SSO Integration: SAML/OIDC authentication
- RBAC: Role-based access control at the object type level
- Workspace & Annotations: Shared workspaces, entity annotations, tagging
- Audit Logging: Immutable log of all data access and modifications
- Connector SDK: Framework for building data source connectors
- First connectors: PostgreSQL, MySQL, REST API, S3/file system
Deliverable: An analyst can ingest data from multiple sources, search across entities, explore relationships in a graph canvas, view entities on a map, and collaborate with teammates in shared workspaces.
Success Criteria:
- Graph performance: Expand node to 3 hops, filter 2+ types, >=500 entities. Minimum: >=15fps. Target: >=30fps.
- Search recall: Top-5 results for test suite. Minimum: >=60% of 50 queries. Target: >=80%.
- Provenance drill-down: Entity profile values resolve to source evidence. Minimum: >=50% of displayed values have >=1 drill-down link. Target: 100%.
- Connector SDK: Working connectors pass ingestion test. Minimum: 2 connectors. Target: 3+ (PostgreSQL, CSV, REST API).
- Audit completeness: Minimum: >=90% coverage. Target: 100% of data operations logged.
Goal: AI-powered analysis and richer visualization.
- LLM Integration: Configurable LLM provider (OpenAI, Anthropic, Ollama)
- Ontology-Grounded RAG: "Ask a question" interface that retrieves relevant entities/links as context
- Natural Language Query: Convert natural language to ontology queries
- Entity Extraction (NER): Extract entities and relationships from unstructured text documents
- ML Entity Resolution: Zingg or Splink integration for probabilistic entity resolution
- Timeline View: Temporal visualization of entity activity and events
- Dashboard Builder: Drag-and-drop composition of charts, tables, maps, and graphs
- Row-Level Security: Filter visible objects based on user attributes
- Pipeline Orchestration: Airflow/Temporal integration for scheduled ingestion pipelines
- Airbyte Integration: Access to 300+ data source connectors
- Webhook System: Outbound webhooks on ontology events
Deliverable: An analyst can ask questions in natural language, have AI agents traverse the ontology to find answers, ingest and extract entities from documents, build dashboards, and set up automated data pipelines.
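To make "ontology-grounded" concrete, here is a deliberately tiny retrieval sketch: entities are ranked by crude term overlap and the top hits rendered into an LLM prompt. Everything here (`Entity`, `build_context`, the scoring) is hypothetical; a real build would use vector similarity via LlamaIndex or LangChain rather than substring matching.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    id: str
    type: str
    props: Dict[str, str]


def score(question: str, entity: Entity) -> int:
    """Crude relevance: count question terms found in the entity's
    properties. Stands in for embedding similarity."""
    terms = set(question.lower().split())
    text = " ".join(entity.props.values()).lower()
    return sum(1 for t in terms if t in text)


def build_context(question: str, entities: List[Entity], k: int = 3) -> str:
    """Pick the top-k relevant entities and render them as the grounded
    context block an LLM prompt would receive."""
    ranked = sorted(entities, key=lambda e: score(question, e), reverse=True)
    lines = [
        f"[{e.type}:{e.id}] " + ", ".join(f"{p}={v}" for p, v in e.props.items())
        for e in ranked[:k]
        if score(question, e) > 0
    ]
    return "Answer using ONLY these ontology entities:\n" + "\n".join(lines)
```

Constraining the prompt to retrieved entities, and nothing else, is also what makes the hallucination ceilings in the success criteria measurable.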
Success Criteria:
- RAG accuracy: Ontology-grounded answers (human eval). Minimum: >=50% correct, <=10% hallucination. Target: >=70% correct, <=5% hallucination.
- NL query translation: Valid ontology queries. Minimum: >=40% of 40-query suite. Target: >=60%.
- Entity extraction: NER F1 on reference corpus. Minimum: >=60% F1. Target: >=75% F1.
- Dashboard builder: Non-technical user creates 3-widget dashboard. Minimum: <20 min. Target: <10 min.
- Pipeline reliability: 100 consecutive runs. Minimum: <=5% failure. Target: <=2% failure.
Goal: Transform from an analytical platform into an operational one.
- Action Framework: Define typed actions that write back to source systems
- AI Agent Studio: Build and deploy agents that can read/write the ontology with tool-calling
- Agent Sandboxing: Fine-grained permissions for AI agents
- Workflow Engine: Multi-step, event-triggered workflows
- Notification System: Alerts based on ontology events or anomalies
- Pattern Search: Find subgraph patterns across the ontology
- Anomaly Detection: ML-based detection of unusual patterns
- Property-Level Masking: Column-level security with data masking
- Classification Markings: Data classification and handling rules
- Notebook Integration: Jupyter notebook with ontology SDK
Deliverable: The platform is operational -- insights lead to actions, workflows automate responses, AI agents work alongside human analysts, and the security model supports sensitive use cases.
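The essence of an action framework is that writes back to source systems are typed, validated, and audited. A minimal sketch, with every name (`ActionType`, `ActionLog`, the `write_back` callable) hypothetical:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ActionType:
    """Hypothetical typed-action definition: a declared parameter schema
    plus a write-back function targeting the source system."""
    name: str
    params: Dict[str, type]
    write_back: Callable[[Dict[str, Any]], None]

    def validate(self, args: Dict[str, Any]) -> None:
        for p, t in self.params.items():
            if p not in args or not isinstance(args[p], t):
                raise ValueError(f"{self.name}: bad or missing param {p!r}")


@dataclass
class ActionLog:
    """Append-only audit trail; every execution is recorded."""
    entries: List[Dict[str, Any]] = field(default_factory=list)

    def execute(self, action: ActionType, args: Dict[str, Any]) -> None:
        action.validate(args)      # reject malformed calls before any write
        action.write_back(args)    # push the change to the source system
        self.entries.append({"action": action.name, "args": args})
```

The same validate-then-write-then-log path would serve human analysts and sandboxed AI agents alike, which is why the audit entry is unconditional.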
Success Criteria:
- Action round-trip: Flag entity, write-back reflected in ontology. Minimum: <60s. Target: <30s.
- Agent sandboxing: Restricted agent cannot access out-of-scope entities. Minimum: 0 escapes in 10-op test. Target: 0 escapes in 20-op test.
- Workflow reliability: 5-step event-triggered workflow. Minimum: >=90% over 50 runs. Target: >=95%.
- Pattern search: Subgraph pattern on >=100K entities. Minimum: <10s. Target: <5s.
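The pattern-search criterion can be pinned down with a toy matcher: assign pattern nodes to graph nodes of the same type, and keep assignments where every pattern edge maps onto a real edge. This brute force only sketches the semantics; at the 100K-entity scale above, the search would be pushed into the graph store (Cypher/Gremlin) or a VF2-style matcher. All names here are illustrative.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Set


def find_pattern(
    nodes: Dict[str, str],         # entity id -> entity type
    edges: Set[FrozenSet[str]],    # undirected links between entity ids
    p_nodes: Dict[str, str],       # pattern node -> required type
    p_edges: Set[FrozenSet[str]],  # pattern edges
) -> List[Dict[str, str]]:
    """Return every assignment of pattern nodes to distinct graph nodes
    that respects node types and maps each pattern edge onto a real edge."""
    p = list(p_nodes)
    matches = []
    for combo in permutations(nodes, len(p)):
        m = dict(zip(p, combo))
        if any(nodes[m[x]] != p_nodes[x] for x in p):
            continue  # a pattern node landed on the wrong entity type
        if all(frozenset({m[a], m[b]}) in edges
               for a, b in (tuple(e) for e in p_edges)):
            matches.append(m)
    return matches
```

For a pattern like "two persons sharing one phone number", the matcher returns one assignment per ordering of the two persons.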
Goal: Production hardening, deployment flexibility, and community ecosystem.
- Horizontal Scaling: Sharded ontology storage for very large datasets
- Air-Gapped Deployment: Full functionality without internet access
- Edge Deployment: Lightweight deployment for resource-constrained environments
- Plugin Marketplace: Community-contributed connectors, visualizations, AI agents
- Multi-Tenancy: Shared infrastructure with isolated ontologies
- Federated Ontologies: Connect multiple Cestus instances while respecting access boundaries
- Compliance Frameworks: FedRAMP, SOC 2, GDPR compliance tooling
- Mobile UI: Responsive interface for field use
- Real-Time Collaboration: Google Docs-style concurrent editing of canvases and annotations
Success Criteria:
- Horizontal scale: 10M+ entities, 3-hop traversal. Minimum: <5s p95. Target: <2s p95.
- Air-gapped: Full function without network. Minimum: 24h. Target: 72h+.
- Multi-tenancy isolation: Cross-tenant query provably blocked. Minimum: verified by test suite.
The following operational metrics are tracked continuously across all phases:
- Session contract conformance rate: % of sessions with valid v2 metadata
- Provenance coverage: % of revelation cards with >=1 drill-down target
- Cross-runtime compatibility score: % of test matrix passing
- Mean time to evidence: Seconds from overview card click to evidence display
One of Cestus's strategic advantages is that it does not need to build everything from scratch. The following projects can serve as foundations:
Data Integration & Orchestration
| Project | What It Provides | How Cestus Uses It | License |
|---|---|---|---|
| Airbyte | 300+ pre-built data connectors | Data ingestion from any source | ELv2 (core) / MIT (connectors) |
| Apache Airflow | Workflow orchestration, DAGs, scheduling | Pipeline orchestration for batch ingestion | Apache 2.0 |
| Temporal | Durable execution, event-driven workflows | Action execution and operational workflows | MIT |
| dbt | SQL-based data transformation | Transform raw data before ontology mapping | Apache 2.0 |
| Apache Kafka / NATS | Event streaming, message bus | Internal event bus for ontology mutations | Apache 2.0 |
| Apache NiFi | Data flow management, visual pipeline builder | Alternative/complement to Airflow for stream processing | Apache 2.0 |
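Treating every ontology mutation as an event on the internal bus (Kafka or NATS, per the table above) is what lets webhooks, workflows, and audit logging all subscribe uniformly later on. A hypothetical payload shape, JSON-serializable so any consumer can read it; the field names are assumptions, not a fixed schema:

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class OntologyEvent:
    """Hypothetical mutation event published to the internal bus
    (Kafka/NATS) whenever an ontology object changes."""
    object_type: str                 # e.g. "person"
    object_id: str
    op: str                          # "create" | "update" | "delete"
    changes: Dict[str, Any]          # properties touched by this mutation
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def from_json(payload: str) -> OntologyEvent:
    """Reconstruct an event on the consumer side of the bus."""
    return OntologyEvent(**json.loads(payload))
```

Keeping the payload plain JSON (rather than a language-specific format) is what keeps Python services, the desktop client, and future plugins interoperable on the same topic.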
Storage, Search & Graph
| Project | What It Provides | How Cestus Uses It | License |
|---|---|---|---|
| PostgreSQL | Relational database | Primary store for ontology objects and metadata | PostgreSQL |
| Apache AGE | Graph query extension for PostgreSQL | Graph traversal queries within Postgres | Apache 2.0 |
| Neo4j Community | Native graph database | Deep traversal queries (optional) | GPL-3.0 |
| JanusGraph | Distributed graph database | Large-scale graph workloads (alternative to Neo4j) | Apache 2.0 |
| Elasticsearch | Search engine | Full-text search and faceted filtering | SSPL |
| Typesense | Search engine (simpler, truly open) | Full-text search (alternative to ES) | GPL-3.0 |
| MinIO | S3-compatible object storage | Document and file storage | AGPL-3.0 |
Entity Resolution & Knowledge Graphs
| Project | What It Provides | How Cestus Uses It | License |
|---|---|---|---|
| Zingg | ML-based entity resolution at scale | Deduplication and entity matching across sources | AGPL-3.0 |
| Splink | Probabilistic record linkage | Scalable entity resolution (Python, multiple backends) | MIT |
| Dedupe | Python entity resolution library | Lightweight ER for smaller datasets | MIT |
| WhyHow KG Studio | Knowledge graph construction with entity resolution | Reference architecture for ontology management | MIT |
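Zingg, Splink, and Dedupe all build on Fellegi-Sunter-style probabilistic linkage. The kernel of that model is a log-odds score per field; the sketch below hard-codes m/u probabilities purely for illustration, whereas the real tools estimate them from the data itself (Splink, for instance, via expectation-maximization).

```python
import math
from typing import Dict, Tuple

# Made-up m/u probabilities per field: m = P(fields agree | same entity),
# u = P(fields agree | different entities). These numbers are assumptions
# for illustration only; real ER tools learn them from the data.
PARAMS: Dict[str, Tuple[float, float]] = {
    "name":  (0.90, 0.05),
    "email": (0.95, 0.01),
    "city":  (0.80, 0.20),
}


def match_weight(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Fellegi-Sunter log2 match weight: add log2(m/u) for each agreeing
    field and log2((1-m)/(1-u)) for each disagreeing one."""
    w = 0.0
    for f, (m, u) in PARAMS.items():
        agree = a.get(f, "").strip().lower() == b.get(f, "").strip().lower()
        w += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return w
```

Record pairs scoring above a chosen threshold are then clustered into a single resolved entity; the threshold trades precision against recall.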
Visualization
| Project | What It Provides | How Cestus Uses It | License |
|---|---|---|---|
| Sigma.js | WebGL graph rendering for the web | Link analysis / graph exploration canvas | MIT |
| Cytoscape.js | Graph theory library for visualization | Alternative graph renderer with rich layout algorithms | MIT |
| Deck.gl | WebGL-powered large-scale geospatial visualization | Map view for entities with location data | MIT |
| Leaflet | Lightweight interactive maps | Simpler geospatial view (alternative to Deck.gl) | BSD-2 |
| Apache ECharts | Rich charting library | Dashboard charts and statistical visualizations | Apache 2.0 |
| vis-timeline | Interactive timeline visualization | Timeline view for temporal entity data | MIT/Apache 2.0 |
| Gephi | Desktop graph analysis tool (reference) | Architectural inspiration for graph analysis features | GPL |
| Apache Superset | Dashboard and visualization platform | Reference architecture; possible embed for dashboarding | Apache 2.0 |
AI & LLM Tooling
| Project | What It Provides | How Cestus Uses It | License |
|---|---|---|---|
| LangChain | LLM application framework | RAG pipeline, agent framework, tool integration | MIT |
| LlamaIndex | Data indexing and retrieval for LLMs | Ontology-aware indexing for AI queries | MIT |
| LangGraph | Graph-based agent workflows | Multi-step agent reasoning over ontology | MIT |
| Dify | LLM app development platform | Reference architecture for AI integration | Apache 2.0 |
| Ollama | Local LLM serving | Run models locally for air-gapped deployments | MIT |
| vLLM | High-performance LLM serving | Production LLM inference | Apache 2.0 |
Identity & Authorization
| Project | What It Provides | How Cestus Uses It | License |
|---|---|---|---|
| Keycloak | Identity and access management, SSO | Authentication (SAML, OIDC, LDAP) | Apache 2.0 |
| Casbin | Authorization library (RBAC, ABAC) | Fine-grained policy enforcement | Apache 2.0 |
| Permify | Google Zanzibar-inspired authorization | Relationship-based access control (alternative to Casbin) | Apache 2.0 |
| Open Policy Agent (OPA) | Policy engine | Policy-as-code for complex authorization rules | Apache 2.0 |
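Permify's Zanzibar lineage is easiest to see in miniature: access is a set of relationship tuples, and a check walks them, following group indirection. A toy sketch, with the tuple layout and `#member` userset convention modeled on Zanzibar and the `check` function itself hypothetical:

```python
from typing import Set, Tuple

# Relationship tuples: (object, relation, subject). A subject may be a
# userset like "group:analysts#member", giving Zanzibar-style indirection.
TUPLES: Set[Tuple[str, str, str]] = {
    ("case:42", "viewer", "group:analysts#member"),
    ("group:analysts", "member", "user:ada"),
}


def check(obj: str, relation: str, user: str,
          tuples: Set[Tuple[str, str, str]] = TUPLES) -> bool:
    """Is `user` related to `obj` via `relation`, directly or through
    group membership? Real engines recurse deeper, cache, and shard."""
    for (o, r, s) in tuples:
        if o != obj or r != relation:
            continue
        if s == user:
            return True
        if s.endswith("#member"):               # follow the userset
            group = s.split("#", 1)[0]
            if check(group, "member", user, tuples):
                return True
    return False
```

Because permissions are just data, granting a whole team access to a case is one tuple insert rather than a policy rewrite, which suits ontology-shaped access questions well.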
Deployment & Observability
| Project | What It Provides | How Cestus Uses It | License |
|---|---|---|---|
| Kubernetes | Container orchestration | Deployment platform | Apache 2.0 |
| Helm | Kubernetes package manager | Deployment packaging | Apache 2.0 |
| ArgoCD | GitOps continuous delivery | Automated deployment from Git | Apache 2.0 |
| Prometheus + Grafana | Monitoring and observability | Platform health monitoring | Apache 2.0 |
Related Metadata Platforms (reference)
| Project | What It Provides | Relevance |
|---|---|---|
| OpenMetadata | Unified metadata platform | Reference for metadata management and lineage |
| DataHub | Event-driven metadata management | Reference for real-time metadata sync |
| Apache Atlas | Metadata governance for Hadoop | Reference for classification and security integration |
- Palantir Technologies - Wikipedia
- What Is Palantir? - Built In
- Palantir Ontology Overview
- Palantir Ontology Architecture
- Palantir Ontology Core Concepts
- Understanding Palantir's Ontology: Semantic, Kinetic, and Dynamic Layers
- Palantir AIP Overview
- AIP Agent Studio Overview
- Palantir Apollo Platform
- Palantir Gotham Platform
- Inside Palantir: Gotham - Golding Research
- Demystifying Palantir: Features and Open Source Alternatives - Dashjoin
- 8 Best Alternatives to Palantir Foundry in 2026 - d.AP Blog
- Top 5 Alternatives to Palantir Foundry - Orchestra
- Dashjoin Platform - GitHub
- WhyHow Knowledge Graph Studio - GitHub
- Graphiti - Real-Time Knowledge Graphs - GitHub
- Zingg - Entity Resolution - GitHub
- 10 Best Open Source Graph Databases in 2026
- JanusGraph vs Neo4j Comparison
- Open Source Data Governance Frameworks Analysis
- Open Source Data Catalog: 2025 Guide
- Top Open Source ETL Frameworks in 2026
- 12 Best Open-Source Data Orchestration Tools in 2026
- 15 Best Open-Source RAG Frameworks in 2026
- Top 5 Open-Source Agentic AI Frameworks in 2026
- Best Open Source Data Visualization Tools for 2025
- Kepler.gl - Geospatial Data Visualization
- Open Visualization Foundation
- Top 10 Open Source RBAC Tools in 2026
- Apache Hop - Orchestration Platform
- Apache Beam
- Siren: Alternative to Palantir
- DataWalk: Palantir Alternative
This document is a living artifact. It represents the initial vision for Cestus and should be revised as the project evolves, the community grows, and real-world usage reveals what matters most.