Last updated: April 2026
OpenFoundry aims to deliver 25 core capabilities that match Palantir Foundry — all open-source, self-hosted, and community-driven. This roadmap outlines our phased approach to get there.
| Icon | Meaning |
|---|---|
| ✅ | Done — feature shipped and usable |
| 🚧 | In Progress — actively being built |
| 📐 | Designed — architecture defined, implementation pending |
| 🔲 | Planned — scoped but not yet started |
| 💡 | Exploring — researching approaches |
| Tag | Meaning |
|---|---|
| 🔴 Critical | Core platform value — blocks adoption |
| 🟡 High | Key differentiator — needed for production use |
| 🟠 Medium | Important — completes the platform story |
| 🟢 Low | Nice to have — enhances ecosystem |
| # | Foundry Component | OpenFoundry Service | Status | Target Phase |
|---|---|---|---|---|
| 1 | Ontology | ontology-service | ✅ Done | Phase 1 |
| 2 | Transforms / Pipeline Builder | pipeline-service | ✅ Done | Phase 1 |
| 3 | Data Connections | data-connector | ✅ Done | Phase 1 |
| 4 | Contour (Visual Analytics) | query-service | ✅ Done | Phase 1 |
| 5 | Dataset Management & Versioning | dataset-service | ✅ Done | Phase 1 |
| 6 | Data Lineage | pipeline-service/lineage | ✅ Done | Phase 1 |
| 7 | Notebooks / Code Workbooks | notebook-service | ✅ Done | Phase 1 |
| 8 | Quiver (Dashboards) | Frontend components | ✅ Done | Phase 2 |
| 9 | Object Explorer | ontology-service | ✅ Done | Phase 1 |
| 10 | Auth / RBAC / SSO | auth-service | 🚧 In Progress | Phase 2 |
| 11 | Workflows / Actions | workflow-service | ✅ Done | Phase 2 |
| 12 | Notifications | notification-alerting-service | ✅ Done | Phase 2 |
| 13 | Data Catalog | dataset-service/catalog | ✅ Done | Phase 2 |
| 14 | Data Quality | dataset-quality-service | ✅ Done | Phase 2 |
| 15 | Slate/Workshop (App Builder) | app-builder-service | ✅ Done | Phase 3 |
| 16 | ML / Model Management | ml-service | ✅ Done | Phase 3 |
| 17 | AIP (GenAI / LLM / Copilot) | ai-service | ✅ Done | Phase 3 |
| 18 | Reports | report-service | ✅ Done | Phase 4 |
| 19 | Fusion (Entity Resolution) | fusion-service | ✅ Done | Phase 4 |
| 20 | Code Repositories (Git) | code-repo-service | ✅ Done | Phase 4 |
| 21 | Marketplace | marketplace-service | ✅ Done | Phase 4 |
| 22 | Streaming (Real-time) | streaming-service | ✅ Done | Phase 4 |
| 23 | Geospatial / Maps | geospatial-service | ✅ Done | Phase 4 |
| 24 | Audit & Compliance | audit-service | ✅ Done | Phase 4 |
| 25 | Nexus (Cross-org Sharing) | nexus-service | ✅ Done | Phase 5 |
Current repo audit: 24 of the 25 components are shipped. Enterprise auth (#10) is the only partial component: OIDC is implemented, while the SAML sign-in flow is still pending.
Goal: A working platform where you can connect data, explore it, build pipelines, and define an ontology.
Priority: 🔴 Critical — nothing works without this.
- Rust workspace setup — Cargo workspace, shared crates compile, `just` recipes
- Protobuf generation — `buf` pipeline generating Rust (tonic) and TypeScript clients
- Shared libraries — `core-models`, `auth-middleware`, `event-bus`, `storage-abstraction`
- Gateway service — Axum HTTP server, gRPC-Web proxy, request ID propagation, CORS
- Auth service (basic) — JWT issue/validate, local user registration, session management
- Docker Compose dev stack — PostgreSQL, Redis, NATS, MinIO running with one command
- SvelteKit shell — App layout, sidebar, top bar, routing, auth flow, design system (base UI components)
- CI pipeline — GitHub Actions: lint (clippy + eslint), test, build, proto-check
- Dataset service — CRUD, Parquet read/write, schema management, basic versioning
- Data connectors (first wave) — PostgreSQL, MySQL, CSV, Parquet, JSON, S3, REST API
- Query service — DataFusion integration, SQL execution, result pagination, saved queries
- Frontend: Dataset Explorer — Data preview table, schema viewer, upload flow
- Frontend: SQL Workbench — Monaco SQL editor, query execution, results table
- Ontology service — Object types, properties, link types, CRUD, type validation
- Pipeline service (basic) — DAG definition, SQL transforms, sequential execution
- Data lineage — Dataset-level lineage tracking, lineage graph queries
- Frontend: Ontology Explorer — Type editor, object explorer, graph view (Cytoscape.js)
- Frontend: Pipeline Builder — DAG canvas (Svelvet), node palette, transform editor
- Frontend: Lineage View — Interactive lineage graph
- Notebook service — Notebook CRUD, cell model, session management
- Python kernel — PyO3-based Python execution, variable state, output capture
- SQL kernel — Route SQL cells to query-service
- Frontend: Notebook Editor — Cell editor (Monaco), cell outputs, kernel selector/status
Phase 1 exit criteria:
A user can connect a Postgres database, explore tables, write SQL queries, build a simple pipeline with SQL transforms, define ontology object types backed by datasets, and run Python notebooks.
Goal: Production-grade auth, dashboards, workflows, data quality, and catalog. The platform becomes usable for real teams.
Priority: 🔴 Critical + 🟠 Medium features that complete the core loop.
- RBAC — Roles, permissions, row-level security
- ABAC — Attribute-based policies
- SSO — OAuth2/OIDC provider integration, SAML (OIDC implemented; SAML sign-in flow pending)
- MFA — TOTP-based multi-factor authentication
- API keys — Programmatic access management
- Frontend: User/Role management — Settings pages for users, roles, groups
- Dashboard grid layout — Responsive drag-and-drop grid
- Chart widget — ECharts integration: bar, line, area, pie, scatter, etc.
- Table widget — Paginated, sortable, filterable data tables
- KPI widget — Single metric cards with sparklines
- Filter bar — Global filters propagated to all widgets
- Date range filter — Relative and absolute date selection
- Dashboard CRUD — Create, edit, duplicate, share dashboards
- Data catalog — Search by name/tag/owner, dataset tagging, ownership assignment
- Auto-profiling — Column statistics, distributions, null rates, uniqueness
- Quality rules — Null checks, range validation, regex, custom SQL rules
- Quality scoring — Per-dataset quality score, trend tracking
- Quality alerts — Notifications on quality degradation
- Frontend: Catalog search — Full-text search in dataset explorer
- Frontend: Quality dashboard — Quality scores, profiling report, rule management
- Workflow service — Workflow definitions, step execution, conditional branching
- Triggers — Cron, event-driven, manual, webhook triggers
- Human-in-the-loop — Approval steps, approval queue
- Notification service — Email (SMTP/SES), Slack, MS Teams webhooks
- In-app notifications — WebSocket-based real-time notifications
- User preferences — Per-user channel and frequency preferences
- Frontend: Workflow builder — Visual workflow canvas, step config, trigger config
- Frontend: Notification bell — In-app notification center
- Python transforms — PyO3-based Python transform execution
- WASM sandbox — Sandboxed WASM transforms for user-submitted code
- Column-level lineage — Track lineage at the column level through transforms
- Pipeline scheduling — Cron-based pipeline scheduling
- Retry & failure handling — Configurable retry policies, partial re-execution
- Dataset branching — Git-like branches for datasets, branch selector in UI
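The quality rules above can be pictured with a tiny sketch. Everything here is illustrative only — the rule variants, the `f64`-only columns, and the scoring function are assumptions for this example, not the real `dataset-quality-service` API:

```rust
// Hypothetical quality rules: each rule inspects a column's values and
// reports a pass rate (1.0 = perfect). Names and shapes are illustrative.
enum QualityRule {
    NotNull,
    InRange { min: f64, max: f64 },
}

fn evaluate(rule: &QualityRule, column: &[Option<f64>]) -> f64 {
    if column.is_empty() {
        return 1.0; // an empty column trivially passes
    }
    let passed = column
        .iter()
        .copied()
        .filter(|v| match (rule, v) {
            (QualityRule::NotNull, Some(_)) => true,
            (QualityRule::InRange { min, max }, Some(x)) => x >= min && x <= max,
            _ => false, // nulls fail every rule except where handled above
        })
        .count();
    passed as f64 / column.len() as f64
}

fn main() {
    let col = vec![Some(10.0), None, Some(55.0), Some(120.0)];
    // 3 of 4 values are non-null.
    println!("not-null rate: {}", evaluate(&QualityRule::NotNull, &col));
    // Only 10.0 and 55.0 fall inside [0, 100].
    let range = QualityRule::InRange { min: 0.0, max: 100.0 };
    println!("in-range rate: {}", evaluate(&range, &col));
}
```

A per-dataset quality score could then aggregate the pass rates of all configured rules, with alerts firing when the aggregate trends downward.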
Phase 2 exit criteria:
Teams can collaborate with proper auth/RBAC, build dashboards over their data, set up data quality monitoring, automate workflows with approvals, and receive notifications.
Goal: ML, AI, and app building capabilities. This is where OpenFoundry becomes a true decision-making platform.
Priority: 🔴 Critical — these are the features that make Foundry Foundry.
- App builder service — App definitions, page layouts, widget catalog
- Widget system — Table, form, chart, map, text, image, button, container
- Data binding — Bind widgets to ontology objects, datasets, or queries
- Event handlers — onClick → execute action, navigate, filter, etc.
- App theming — Colors, fonts, branding customization
- Publish & deploy — Version and publish apps, embedding support (iframe)
- App templates — Starter templates for common use cases
- Frontend: WYSIWYG editor — Drag-and-drop canvas, property inspector, live preview
- Frontend: App runtime — Render published apps for end users
- Experiment tracking — Log runs with params, metrics, and artifacts
- Model registry — Register models, manage versions (staging → production)
- Feature store — Feature definitions, online serving (Redis), offline batch computation
- Training orchestration — Submit training jobs, hyperparameter tuning
- Model serving — Real-time inference endpoints, batch predictions
- A/B testing — Traffic splitting between model versions
- Drift monitoring — Data and concept drift detection, auto-retraining triggers
- Frontend: ML Studio — Experiment list, run comparison, model registry, deployment panel
- LLM gateway — Multi-provider routing (OpenAI, Anthropic, Ollama/local), load balancing, fallback
- Prompt management — Versioned prompt templates, variable interpolation
- RAG pipeline — Document chunking, embedding generation, semantic retrieval + reranking
- Knowledge bases — Index datasets and ontology into vector store (pgvector; Qdrant is being retired over an OSS-licensing restriction, with Apache-2.0-licensed Vespa as the planned replacement)
- AI agents — Plan → Act → Observe loop, tool calling, task decomposition
- Platform copilot — Natural language → SQL, pipeline suggestions, ontology help
- Guardrails — Output validation, PII detection, toxicity filtering
- Semantic caching — Cache LLM responses by semantic similarity
- Frontend: Copilot panel — Floating drawer, conversational UI
- Frontend: Agent builder — Visual agent configuration, tool registry
- Frontend: Knowledge manager — Upload docs, manage knowledge bases
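The LLM gateway's multi-provider routing with fallback can be sketched minimally. This is a hypothetical shape, not the real `ai-service` internals — the actual client would be async and speak the OpenAI/Anthropic/Ollama APIs, so a plain function pointer stands in here:

```rust
// Illustrative-only sketch of fallback routing: try providers in priority
// order and return the first success, remembering the last failure.
struct Provider {
    name: &'static str,
    // Stand-in for a real (async) provider client.
    complete: fn(&str) -> Result<String, String>,
}

fn route(providers: &[Provider], prompt: &str) -> Result<(String, String), String> {
    let mut last_err = String::from("no providers configured");
    for p in providers {
        match (p.complete)(prompt) {
            Ok(text) => return Ok((p.name.to_string(), text)), // first success wins
            Err(e) => last_err = format!("{}: {}", p.name, e), // fall through to next
        }
    }
    Err(last_err)
}

fn main() {
    let providers = [
        Provider { name: "openai", complete: |_p| Err("rate limited".into()) },
        Provider { name: "ollama", complete: |p| Ok(format!("echo: {p}")) },
    ];
    let (name, text) = route(&providers, "hello").unwrap();
    println!("served by {name}: {text}");
}
```

Load balancing would extend this by rotating the starting index or weighting providers, and semantic caching would sit in front of `route` entirely.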
Phase 3 exit criteria:
Users can build operational apps without code, train and deploy ML models, use AI agents and a platform copilot to accelerate their work, and build RAG pipelines over their data.
Goal: Every remaining Foundry capability. Entity resolution, streaming, geospatial, code repos, marketplace, reports, and audit.
Priority: 🟡 High — completes the platform for enterprise adoption.
- Match rules — Deterministic rules (exact, fuzzy, phonetic)
- ML-based matching — Gradient boosted classifier for probabilistic matching
- Blocking strategies — LSH, sorted neighborhood, key-based blocking
- String comparators — Jaro-Winkler, Levenshtein, Soundex, metaphone
- Graph resolution — Transitive closure for entity clusters
- Golden record — Survivorship rules, merge strategies
- Human-in-the-loop — Review queue for uncertain matches
- Frontend: Match rule builder, cluster viewer, manual review
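Of the string comparators listed above, Levenshtein distance is the simplest to sketch. This is the classic dynamic-programming formulation for illustration, not the shipped `fusion-service` code:

```rust
// Levenshtein edit distance with a rolling single-row DP table:
// prev[j] holds the distance between a[..i] and b[..j] for the previous row.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let sub = prev[j] + if ca == cb { 0 } else { 1 }; // substitute (or match)
            let del = prev[j + 1] + 1; // delete from a
            let ins = curr[j] + 1;     // insert into a
            curr.push(sub.min(del).min(ins));
        }
        prev = curr;
    }
    prev[b.len()]
}

fn main() {
    println!("{}", levenshtein("kitten", "sitting"));      // 3 edits
    println!("{}", levenshtein("ACME Corp", "ACME Corp.")); // 1 edit
}
```

A deterministic fuzzy match rule might then accept candidate pairs whose distance falls below a per-field threshold, with blocking strategies keeping the number of pairwise comparisons tractable.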
- Stream definitions — Named streams with schemas
- Processing topology — DAG-based stream processing
- Windowing — Tumbling, sliding, and session windows
- Stream joins — Stream-stream and stream-table joins
- Complex event processing — Pattern matching on event sequences
- State backend — RocksDB-based state store
- Connectors — Kafka source, NATS source, HTTP webhook source, WebSocket sink, dataset sink
- Backpressure — Flow control to prevent overload
- Frontend: Topology editor, stream monitor, live data tail
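Tumbling windows, the simplest of the three windowing modes above, assign each event to exactly one fixed-size, non-overlapping bucket. A minimal sketch, assuming epoch-millisecond event timestamps:

```rust
// Map an event timestamp to its tumbling window, returned as the
// half-open interval [start, end).
fn tumbling_window(ts_ms: u64, size_ms: u64) -> (u64, u64) {
    let start = ts_ms - ts_ms % size_ms; // floor to the window boundary
    (start, start + size_ms)
}

fn main() {
    // With 60-second windows, t = 37.5 s falls into [0 s, 60 s).
    let (start, end) = tumbling_window(37_500, 60_000);
    println!("[{start}, {end})");
    // t = 125 s falls into [120 s, 180 s).
    println!("{:?}", tumbling_window(125_000, 60_000));
}
```

Sliding windows generalise this — an event then belongs to `size / slide` overlapping windows — while session windows are bounded by inactivity gaps rather than fixed boundaries.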
- Report service — Report definitions, scheduled generation, distribution
- Generators — PDF (typst), Excel (rust_xlsxwriter), CSV, HTML, PPTX
- Distribution — Email, S3, Slack, webhook delivery
- Geospatial service — Spatial queries (within, intersects, nearest, buffer)
- Vector tiles — MVT tile server, H3 hex aggregation
- Geocoding — Address ↔ coordinates
- Spatial clustering — DBSCAN, K-means
- Routing — Shortest path, isochrones
- Frontend: Report designer, preview, schedule manager
- Frontend: MapLibre GL map, layer panel, heatmap, clustering, routing
- Code repo service — Git object storage (gitoxide), branches, commits
- Merge requests — Code review workflow, inline comments, approvals
- CI integration — Trigger pipeline builds on push
- Code search — Tantivy-indexed full-text code search
- Marketplace service — Package registry, versioning, dependency resolution
- Package types — Connectors, transforms, widgets, app templates, ML models, AI agents
- Discovery — Search, categories, ratings & reviews
- One-click install — Install packages into workspace
- Frontend: File browser, diff viewer, MR workflow
- Frontend: Marketplace browser, publish wizard
- Audit service — Immutable append-only audit log
- Event collection — Auto-capture from all services via NATS
- GDPR support — Right to erasure, data portability
- Compliance reports — SOC2, ISO 27001, HIPAA export formats
- Anomaly detection — Alert on unusual access patterns
- Data classification — PII, confidential, public labels
- Retention policies — Configurable TTL for audit events
- Frontend: Audit log viewer, compliance dashboard, policy manager
Phase 4 exit criteria:
The platform has full feature parity with Palantir Foundry for 24 of the 25 components; the remaining one, Nexus, lands in Phase 5. It is suitable for enterprise production use.
Goal: Cross-organization data sharing, plugin SDK, and community ecosystem.
Priority: 🟠 Medium — the network-effect layer.
- Peer management — Register and authenticate partner organizations
- Data sharing contracts — Define what's shared, with whom, under what terms
- Federated queries — Query shared data without copying it
- Selective replication — Replicate subsets of data to consumer orgs
- E2E encryption — Encrypted data in transit and at rest for shared datasets
- Cross-org audit trail — Audit bridge between organizations
- Schema compatibility — Validate schema compatibility across orgs
- Frontend: Peer list, share wizard, contract manager, shared data browser
- Plugin SDK — Rust + WASM SDK for building custom connectors, transforms, widgets
- CLI tool — `of` CLI for project management, deployment, and scripting
- REST API docs — Full OpenAPI spec auto-generated from proto
- Developer portal — Interactive API explorer, tutorials, cookbooks
- Terraform provider — Manage OpenFoundry resources as IaC
- GitHub/GitLab integration — External Git sync, CI/CD triggers
- Frontend: Developers portal with API explorer, SDK toolkit, Terraform panel, and repository integration manager
- Distributed query execution — Multi-node DataFusion queries
- Distributed pipeline execution — Parallel transform execution across workers
- Auto-scaling — HPA/KEDA-based scaling per service
- Multi-tenancy — Logical tenant isolation, resource quotas
- Global CDN — Tile server and static asset caching at the edge
- Benchmark suite — Reproducible benchmarks for all critical paths
Phase 5 exit criteria:
Organizations can share data securely across boundaries, third-party developers can extend the platform, and the system scales to enterprise workloads.
⚠️ These are estimates, not commitments. Open source moves at the speed of contributors.
2026 Q2-Q3 Phase 1 — Foundation
████████████████████████████░░░░░░░░░░░░░░░░░░░
2026 Q3-Q4 Phase 2 — Core Platform
░░░░░░░░░░░░████████████████████████░░░░░░░░░░░
2027 Q1-Q2 Phase 3 — Intelligence (ML, AI, App Builder)
░░░░░░░░░░░░░░░░░░░░░░░░████████████████░░░░░░
2027 Q2-Q3 Phase 4 — Platform Completeness
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░████████████░░
2027 Q3+ Phase 5 — Ecosystem
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░████████
- User value first — Does this unlock a workflow that wasn't possible before?
- Foundation before features — Auth, data layer, and ontology must be solid before ML/AI.
- Horizontal before vertical — Basic versions of many features > perfect version of one.
- Community signal — GitHub issues with 👍 reactions influence priority.
- Contributor interest — If someone wants to build it, we help them ship it.
Every contribution accelerates the roadmap. Here's where help is most needed:
| Phase | Area | What's Needed |
|---|---|---|
| Phase 1 | Data connectors | Implement the `DataConnector` trait for new sources |
| Phase 1 | Frontend | SvelteKit pages, Tailwind components |
| Phase 2 | Dashboard widgets | New chart types, custom widgets |
| Phase 2 | Quality rules | Custom data quality rule implementations |
| Phase 3 | LLM providers | Adapters for Gemini, Mistral, Cohere, etc. |
| Phase 4 | Geospatial | PostGIS integration, spatial algorithms |
| Phase 4 | Report generators | PDF, Excel, PPTX template engines |
| Phase 2 | Enterprise SSO | Wire SAML sign-in flow, provider validation, and end-to-end login testing |
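For a sense of what the data-connector contribution surface might look like, here is a hypothetical sketch of the `DataConnector` trait mentioned above. The real trait is likely async and Arrow-based, so treat every name and signature below as an assumption:

```rust
// Simplified stand-in for the connector's schema type.
struct Schema {
    columns: Vec<(String, String)>, // (column name, type name) pairs
}

// Hypothetical contribution surface: what a new source has to provide.
trait DataConnector {
    fn name(&self) -> &'static str;
    fn test_connection(&self) -> Result<(), String>;
    fn infer_schema(&self) -> Result<Schema, String>;
    fn read_batch(&self, limit: usize) -> Result<Vec<Vec<String>>, String>;
}

/// Toy in-memory connector showing what an implementation looks like.
struct StaticCsvConnector {
    rows: Vec<Vec<String>>,
}

impl DataConnector for StaticCsvConnector {
    fn name(&self) -> &'static str { "static-csv" }
    fn test_connection(&self) -> Result<(), String> { Ok(()) }
    fn infer_schema(&self) -> Result<Schema, String> {
        Ok(Schema {
            columns: vec![("city".into(), "string".into()), ("pop".into(), "int".into())],
        })
    }
    fn read_batch(&self, limit: usize) -> Result<Vec<Vec<String>>, String> {
        Ok(self.rows.iter().take(limit).cloned().collect())
    }
}

fn main() {
    let c = StaticCsvConnector {
        rows: vec![vec!["Berlin".into(), "3700000".into()]],
    };
    assert!(c.test_connection().is_ok());
    println!("{} row(s) from {}", c.read_batch(10).unwrap().len(), c.name());
}
```

A new source would implement the same trait against a real backend (JDBC-style connection, HTTP client, object store listing) and register itself with the connector catalog.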
Want to contribute? Check the issues labeled `help wanted` or comment on this roadmap's tracking issue.
Seventeen verifiable milestones that consolidate OpenFoundry's data plane around the five target planes documented in `docs/architecture/runtime-topology.md` (storage, ingestion, compute, control, relational state). Each item is a concrete, already-merged change in the monorepo, anchored to an ADR in `docs/architecture/adr/` (primarily ADRs 0008–0012).
- 1. ADR-0008 — Single Iceberg REST Catalog (Lakekeeper). Decision to standardise the lakehouse on Lakekeeper as the only Iceberg REST catalog; tightens `infra/storage-abstraction/README.md`. See `docs/architecture/adr/ADR-0008-iceberg-rest-catalog-lakekeeper.md`.
- 2. ADR-0009 — Internal query fabric: DataFusion + Flight SQL. Service-to-service SQL travels exclusively over Flight SQL P2P; Trino is repositioned as edge BI only. See `docs/architecture/adr/ADR-0009-internal-query-fabric-datafusion-flightsql.md`.
- 3. ADR-0010 — CloudNativePG as the single Postgres operator. All service-owned Postgres instances move to CNPG; HA with synchronous replicas and barman-cloud PITR. See `docs/architecture/adr/ADR-0010-cnpg-postgres-operator.md`.
- 4. ADR-0011 — Control vs data bus contract enforcement. NATS JetStream for control, Kafka for data; `tools/bus-lint/check_bus.py` enforces the contract in CI. See `docs/architecture/adr/ADR-0011-control-vs-data-bus-contract.md`.
- 5. ADR-0012 — Data-plane SLOs, SLIs, and error budgets. Latency budgets per layer (Flight SQL, Iceberg scans, Kafka acks, ClickHouse, Vespa, NATS) with Prometheus SLIs and a freeze policy. See `docs/architecture/adr/ADR-0012-data-plane-slos.md`.
- 6. CloudNativePG operator + cluster templates. Operator install and nil-safe cluster template under `infra/k8s/cnpg/` for service-owned Postgres provisioning, aligned with ADR-0010.
- 7. Lakekeeper Iceberg REST Catalog deployment. Kubernetes manifests under `infra/k8s/lakekeeper/` materialising the ADR-0008 decision; `libs/storage-abstraction/README` tightened accordingly.
- 8. Rook Ceph: rbd-fast pool + RGW EC 4+2 object store. Storage-plane upgrade in `infra/k8s/rook/` providing fast block storage and erasure-coded S3 object storage for the lakehouse.
- 9. Strimzi Kafka rack/zone awareness + `RackAwareReplicaSelector`. Multi-AZ resilience for the Kafka data plane in `infra/k8s/strimzi/`.
- 10. ClickHouse cluster scale-out (shards=2, replicas=3). Time-series storage tier upgraded under `infra/k8s/clickhouse/` with the `openfoundry` cluster topology.
- 11. Flink scheduled Iceberg maintenance jobs. Rewrite, expire-snapshots, and orphan-file cleanup with HA + RGW checkpoints and documented 7-day / 90-day retention under `infra/k8s/flink/`.
- 12. Bus-lint: control vs data bus contract. Static check in `tools/bus-lint/check_bus.py` wired into CI to block cross-bus regressions, implementing ADR-0011.
- 13. Bus-usage audit (current/target allowlist). Audit document `docs/architecture/bus-audit.md` splits real `event-bus-data` usage into a current and a target allowlist.
- 14. Trino edge BI removed — superseded by item 17. Trino was originally repositioned as edge-BI-only; under ADR-0014 the Trino deployment has been removed entirely in favour of a real Apache Arrow Flight SQL server inside `sql-bi-gateway-service`. See item 17 below.
- 15. ADR-0007 consolidation — Vespa Lite for DX. Production and DX search both run on Vespa (Vespa Lite single-node for DX); Meilisearch is already demoted — it is no longer part of the default DX stack in `infra/docker-compose.yml` / `infra/docker-compose.dev.yml`, and is gated behind the optional `--profile demo` as a first-run demo only. See `docs/architecture/adr/ADR-0007-search-engine-choice.md`.
- 16. Chaos suite for data-plane no-SPOF properties. Smoke/chaos scenarios under `smoke/` exercising broker failover, Postgres failover, and Iceberg catalog availability to assert the no-single-point-of-failure properties of the hardened data plane.
- 17. ADR-0014 — Retire Trino, single Flight SQL edge gateway. `sql-bi-gateway-service` is rewritten as a real Apache Arrow Flight SQL server (port `50133`) backed by DataFusion, with per-statement routing to `sql-warehousing-service` (Iceberg), ClickHouse, Vespa, and Postgres. Auth, tenant quotas, audit, and saved queries are applied uniformly on the Flight SQL surface. The previous Trino deployment under `infra/k8s/trino/` and its ClickHouse catalog (`infra/k8s/clickhouse/trino-catalog.yaml`) are deleted. Tableau / Superset connect with the Apache Arrow Flight SQL JDBC driver. See `docs/architecture/adr/ADR-0014-retire-trino-flight-sql-only.md`.
This roadmap is a living document. It evolves with community feedback and contributions.