Epic: multi-table catalog with public APIs #559

@ethe

Summary

Expose public APIs for multi-table management built on the existing internal CatalogCodec. Enable users to create, list, and manage multiple tables within a single Tonbo instance, each with its own schema.

Why

Agent data naturally separates into distinct schemas:

Data Type   Schema Example
Trajectory  (run_id, step_id, timestamp, action, observation, reasoning)
State       (run_id, checkpoint_name, state_blob, created_at)
Artifact    (artifact_id, run_id, content_type, data, hash)
Telemetry   (timestamp, span_id, trace_id, level, message, attributes)

Current state:

  • Internal CatalogCodec exists in manifest
  • Per-table TableId and manifest entries work
  • register_table() with schema validation exists
  • No public APIs exposed

Without multi-table support, users must either:

  1. Encode all data types into one wide schema (awkward, inefficient)
  2. Run separate Tonbo instances per data type (resource duplication)
  3. Use different storage systems for different data types

What

Public API Surface

  • Create table: register a new table with schema and primary key definition
  • List tables: enumerate tables in the catalog
  • Get table metadata: retrieve schema, stats, version info
  • Drop table: remove a table and schedule its data for GC
  • Open table handle: obtain a table-specific read/write interface
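
The lifecycle operations listed above could be sketched roughly as follows. This is a minimal in-memory illustration, not the proposed Tonbo API: `Catalog`, `TableSchema`, `TableMeta`, `TableHandle`, `CatalogError`, and every method name here are hypothetical, and the schema is reduced to column names rather than Tonbo's real schema types. Dropping a table only records its id for later GC, mirroring the "schedule its data for GC" wording.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified schema: column names plus primary-key columns.
#[derive(Debug, Clone, PartialEq)]
pub struct TableSchema {
    pub columns: Vec<String>,
    pub primary_key: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TableId(pub u64);

#[derive(Debug, Clone, PartialEq)]
pub struct TableMeta {
    pub id: TableId,
    pub schema: TableSchema,
}

/// Hypothetical table-scoped handle; a real one would expose reads and writes.
#[derive(Debug, PartialEq)]
pub struct TableHandle {
    pub id: TableId,
}

#[derive(Debug, PartialEq)]
pub enum CatalogError {
    AlreadyExists(String),
    NotFound(String),
    InvalidPrimaryKey(String),
}

#[derive(Default)]
pub struct Catalog {
    next_id: u64,
    tables: BTreeMap<String, TableMeta>,
    pending_gc: Vec<TableId>,
}

impl Catalog {
    /// Create table: reject duplicate names and primary keys not in the schema.
    pub fn create_table(&mut self, name: &str, schema: TableSchema) -> Result<TableId, CatalogError> {
        if self.tables.contains_key(name) {
            return Err(CatalogError::AlreadyExists(name.to_string()));
        }
        if let Some(missing) = schema.primary_key.iter().find(|k| !schema.columns.contains(*k)) {
            return Err(CatalogError::InvalidPrimaryKey(missing.clone()));
        }
        let id = TableId(self.next_id);
        self.next_id += 1;
        self.tables.insert(name.to_string(), TableMeta { id, schema });
        Ok(id)
    }

    /// List tables: names in sorted order (BTreeMap iteration order).
    pub fn list_tables(&self) -> Vec<&str> {
        self.tables.keys().map(String::as_str).collect()
    }

    /// Get table metadata.
    pub fn table_metadata(&self, name: &str) -> Result<&TableMeta, CatalogError> {
        self.tables.get(name).ok_or_else(|| CatalogError::NotFound(name.to_string()))
    }

    /// Open table handle: resolve a name to a table-scoped handle.
    pub fn open_table(&self, name: &str) -> Result<TableHandle, CatalogError> {
        self.table_metadata(name).map(|meta| TableHandle { id: meta.id })
    }

    /// Drop table: remove the catalog entry and queue the id for GC.
    pub fn drop_table(&mut self, name: &str) -> Result<TableId, CatalogError> {
        let meta = self.tables.remove(name).ok_or_else(|| CatalogError::NotFound(name.to_string()))?;
        self.pending_gc.push(meta.id);
        Ok(meta.id)
    }

    /// Ids of dropped tables whose data still awaits GC.
    pub fn pending_gc(&self) -> &[TableId] {
        &self.pending_gc
    }
}
```

In this sketch table names are plain strings and ids are never reused, which is only one of the options under the Namespace open question.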

Resource Sharing

  • Shared executor, I/O handles, and caches across tables
  • Per-table WAL and SST namespacing
  • Per-table manifest entries within unified catalog
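
One way to picture per-table WAL and SST namespacing is a key prefix derived from the table id. The layout below (`<root>/tables/<id>/wal/...` and `<root>/tables/<id>/sst/L<level>/...`), the `TableNamespace` type, and the file extensions are all assumptions made for illustration; the issue does not specify a layout.

```rust
/// Hypothetical per-table key namespace under a shared storage root.
pub struct TableNamespace {
    prefix: String,
}

impl TableNamespace {
    /// Build the prefix `<root>/tables/<table_id>`, tolerating a trailing slash on root.
    pub fn new(root: &str, table_id: u64) -> Self {
        Self {
            prefix: format!("{}/tables/{}", root.trim_end_matches('/'), table_id),
        }
    }

    /// Key of a WAL segment for this table, zero-padded so keys sort by sequence.
    pub fn wal_segment(&self, seq: u64) -> String {
        format!("{}/wal/{:08}.wal", self.prefix, seq)
    }

    /// Key of an SST file for this table at the given level.
    pub fn sst_file(&self, level: u8, file_id: u64) -> String {
        format!("{}/sst/L{}/{:08}.sst", self.prefix, level, file_id)
    }

    /// True if the key lives under this table's prefix. The trailing slash
    /// keeps table 7 from matching keys that belong to table 70.
    pub fn owns(&self, key: &str) -> bool {
        key.starts_with(&format!("{}/", self.prefix))
    }
}
```

With a layout like this, dropping a table reduces to deleting everything under one prefix, while the executor, I/O handles, and caches stay shared.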

Open Questions

Area          Question                          Options
Transactions  Cross-table transaction support?  Single-table only / Multi-table atomic / Saga pattern
Consistency   Catalog DDL atomicity?            Async eventual / Synchronous with manifest CAS
Namespace     Table identification?             String name / UUID / Hierarchical path
Schema        Cross-table references?           None / Soft references / Foreign keys
Isolation     Failure blast radius?             Shared-fate / Per-table isolation
WAL           Per-table WAL or shared?          Separate files / Tagged entries in shared WAL
Compaction    Scheduling scope?                 Global queue / Per-table budgets
GC            Cross-table dependencies?         Independent / Coordinated watermarks
Quotas        Resource limits?                  None / Per-table storage/throughput caps
Versioning    Snapshot scope?                   Per-table / Cross-table consistent snapshot

Key Trade-off

Single-table transactions          Multi-table transactions
        ↓                                   ↓
  Simpler manifest                  Complex coordination
  Per-table CAS                     Cross-table CAS or 2PC
  Independent GC                    Coordinated watermarks
  Easier to implement               Agent workflows may need this
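
The difference between per-table CAS and cross-table CAS can be sketched with a version counter per table. This is an in-memory illustration under stated assumptions: `Manifest`, `CasError`, `commit_single`, and `commit_multi` are hypothetical names, not TonboManifest's API, and `&mut self` stands in for whatever atomic primitive the real storage backend would provide.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum CasError {
    Conflict { table: u64, expected: u64, actual: u64 },
}

/// Hypothetical manifest that tracks one version counter per table id.
#[derive(Default)]
pub struct Manifest {
    versions: HashMap<u64, u64>,
}

impl Manifest {
    /// Current version of a table; tables never committed are at version 0.
    pub fn version(&self, table: u64) -> u64 {
        *self.versions.get(&table).unwrap_or(&0)
    }

    /// Per-table CAS: only this table's version has to match.
    pub fn commit_single(&mut self, table: u64, expected: u64) -> Result<u64, CasError> {
        self.commit_multi(&[(table, expected)]).map(|v| v[0])
    }

    /// Cross-table CAS: every (table, expected_version) pair must match,
    /// otherwise nothing is applied. Assumes each table appears at most once.
    pub fn commit_multi(&mut self, edits: &[(u64, u64)]) -> Result<Vec<u64>, CasError> {
        // Validation phase: fail on the first stale version without mutating.
        for &(table, expected) in edits {
            let actual = self.version(table);
            if actual != expected {
                return Err(CasError::Conflict { table, expected, actual });
            }
        }
        // Apply phase: bump every table's version together.
        Ok(edits
            .iter()
            .map(|&(table, expected)| {
                self.versions.insert(table, expected + 1);
                expected + 1
            })
            .collect())
    }
}
```

The single-table path never looks at other tables, so unrelated writers cannot conflict with it. The multi-table path widens the conflict surface to every table in the edit set, which is the coordination cost shown in the right-hand column above.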

Success Criteria

  • Design decisions documented for open questions above
  • Public Catalog API for table lifecycle (create, list, get, drop)
  • DB::open_table(name) returns table-scoped handle
  • Multiple tables share single TonboManifest instance
  • Per-table SST namespacing
  • Integration tests with multiple concurrent tables

Non-Goals

  • Cross-table joins (push to query engine layer)
  • Fine-grained access control (future work)

Metadata

Labels: L - Large, enhancement
