Open
Labels: XL - Extra Large (system architecture overhaul, adding support for new platforms, large-scale dependency updates)
Description
Summary
Provide first-class Python bindings for Tonbo, leveraging fusio's executor abstraction to deliver native asyncio integration. Python users get the full Tonbo experience—Arrow-native, S3-ready, MVCC time travel—with idiomatic async/await syntax and zero-copy PyArrow interop.
Motivation
Why Python?
- Data community: Python dominates data science, ML, and analytics
- Agent frameworks: LangChain, AutoGPT, CrewAI are Python-first
- Adoption multiplier: Python bindings could unlock a roughly 10x larger potential user base
- Manifesto alignment: "Agent execution substrate" needs Python support
Why Native Async?
Most Rust→Python bindings use blocking wrappers:
```python
# Typical blocking approach (bad)
result = db.scan()  # blocks the Python event loop
```
With fusio's executor abstraction, we can implement a Python asyncio executor:
```python
# Native async (good)
result = await db.scan()  # yields to the Python event loop
```
This enables:
- Non-blocking I/O in async Python applications
- Integration with aiohttp, FastAPI, and other asyncio-based frameworks
- Proper backpressure and cancellation
- No thread pool overhead for I/O-bound operations
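The difference between the two snippets can be measured with plain asyncio, no Tonbo required. In this sketch `blocking_scan` and `async_scan` are hypothetical stand-ins for a 200 ms scan (one sleeps the thread, the other awaits), and a heartbeat task records how often the event loop got to run while each scan was in progress.

```python
import asyncio
import time


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Records a timestamp roughly every 10 ms, but only when the loop is free to run it.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def blocking_scan() -> str:
    # Hypothetical binding that blocks the calling thread (the "bad" style).
    time.sleep(0.2)
    return "rows"


async def async_scan() -> str:
    # Hypothetical binding that awaits its I/O and yields to the loop meanwhile.
    await asyncio.sleep(0.2)
    return "rows"


async def count_ticks(scan) -> int:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    await scan()
    stop.set()
    await hb
    return len(ticks)


if __name__ == "__main__":
    print("blocking:", asyncio.run(count_ticks(blocking_scan)))
    print("async:   ", asyncio.run(count_ticks(async_scan)))
```

With the blocking stand-in the heartbeat ticks exactly once, because `time.sleep` holds the loop's only thread for the whole call; with the awaiting stand-in it keeps ticking about every 10 ms while the scan is pending.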
Goals
- PyArrow-native: Ingest/query via pyarrow.RecordBatch with zero-copy where possible
- Native asyncio: Real async/await, not thread-pool-wrapped blocking
- Full API coverage: Transactions, snapshots, time travel, scans
- S3/Object storage: Same backends as Rust API
- pip installable: pip install tonbo with pre-built wheels
- Type hints: Full typing for IDE support
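To make these goals concrete, here is a dependency-free sketch of a typed, async-first surface, following the chained `db.scan().filter(...).collect()` form from the Design section. Everything in it is hypothetical: `open_db`, `Database`, and `ScanBuilder` are illustrative names (the proposal spells the entry point `tonbo.open`), and rows are plain dicts rather than `pyarrow.RecordBatch` objects so the sketch runs without PyArrow or Tonbo installed.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical stand-in: a row is a plain dict instead of a pyarrow.RecordBatch.
Row = dict[str, Any]


@dataclass
class ScanBuilder:
    """Hypothetical builder mirroring db.scan().filter(...).collect()."""

    rows: list[Row]
    predicates: list[Callable[[Row], bool]] = field(default_factory=list)

    def filter(self, predicate: Callable[[Row], bool]) -> ScanBuilder:
        # Building the query is synchronous and returns a new builder.
        return ScanBuilder(self.rows, [*self.predicates, predicate])

    async def collect(self) -> list[Row]:
        # Only the terminal call awaits; a real binding would do I/O here.
        await asyncio.sleep(0)
        return [r for r in self.rows if all(p(r) for p in self.predicates)]


class Database:
    """Hypothetical in-memory fake of the proposed async DB handle."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.closed = False
        self._rows: list[Row] = []

    async def __aenter__(self) -> Database:
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # A real handle would flush and release resources here.
        self.closed = True

    async def ingest(self, batch: list[Row]) -> None:
        await asyncio.sleep(0)
        self._rows.extend(batch)

    def scan(self) -> ScanBuilder:
        # Copy the current rows so later ingests do not affect this scan.
        return ScanBuilder(list(self._rows))


def open_db(uri: str) -> Database:
    """Hypothetical analogue of tonbo.open(...)."""
    return Database(uri)


async def main() -> list[Row]:
    async with open_db("memory://demo") as db:
        await db.ingest([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
        return await db.scan().filter(lambda r: r["id"] > 1).collect()


if __name__ == "__main__":
    print(asyncio.run(main()))  # [{'id': 2, 'v': 'b'}]
```

Keeping `filter(...)` synchronous and making only `collect()` awaitable means a query is assembled without touching the event loop, and the single `await` marks where I/O would happen. The Why Native Async snippet awaits `db.scan()` directly instead, so the two forms in the proposal differ on that point.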
Non-Goals
- Pandas-first API (users can convert PyArrow ↔ Pandas themselves)
- SQLAlchemy/ORM integration (future consideration)
- Synchronous API (async-only for MVP; sync wrapper can come later)
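The non-goal above leaves room for a synchronous wrapper later. One common shape for such a wrapper is sketched here under stated assumptions: `blocking` and `count_rows` are hypothetical names, not part of any Tonbo API, and the wrapper simply drives each call with `asyncio.run`.

```python
import asyncio
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def blocking(fn: Callable[..., Coroutine[Any, Any, T]]) -> Callable[..., T]:
    """Wrap an async function so synchronous code can call it (hypothetical helper)."""

    def wrapper(*args: Any, **kwargs: Any) -> T:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running in this thread, so start one for this call.
            return asyncio.run(fn(*args, **kwargs))
        # Blocking here would stall the caller's loop, which the async API exists to avoid.
        raise RuntimeError("called from a running event loop; use 'await' instead")

    return wrapper


async def count_rows(n: int) -> int:
    # Hypothetical async method standing in for a Tonbo binding call.
    await asyncio.sleep(0)
    return n


count_rows_sync = blocking(count_rows)

if __name__ == "__main__":
    print(count_rows_sync(3))  # 3
```

Because `asyncio.run` creates and closes a fresh event loop on every call, this is the simplest possible wrapper rather than an efficient one; a production version might keep a single loop running on a background thread instead.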
Design
```
┌────────────────────────────────────────────────────────────┐
│ Python User Code                                           │
│   async with tonbo.open("s3://bucket/db") as db:           │
│       await db.ingest(batch)                               │
│       results = await db.scan().filter(...).collect()      │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ tonbo-python (PyO3)                                        │
│ ┌────────────┐ ┌───────────────┐ ┌─────────────────────┐   │
│ │ PyDB       │ │ PyTransaction │ │ PyScanBuilder       │   │
│ │ PySnapshot │ │ PyCheckpoint  │ │ PyRecordBatchStream │   │
│ └────────────┘ └───────────────┘ └─────────────────────┘   │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ AsyncioExecutor (fusio)                                    │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ impl Executor for AsyncioExecutor                    │  │
│  │   - spawn() → Python asyncio.create_task()           │  │
│  │   - Futures bridge Rust ↔ Python                     │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ Tonbo Core (Rust)                                          │
│   DB<FS, AsyncioExecutor> - unchanged                      │
└────────────────────────────────────────────────────────────┘
```
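The "Futures bridge Rust ↔ Python" box can be modeled in pure Python to show the mechanics of one direction of the bridge: work completes on a thread the event loop does not own, and its result is handed back to an asyncio future. This sketch uses no fusio, PyO3, or pyo3-asyncio APIs; `spawn_foreign_work`, `bridged_call`, and `_resolve` are hypothetical names, and a background thread stands in for a Rust future. It assumes the bridge resolves Python futures via `loop.call_soon_threadsafe`, the standard asyncio way to interact with a loop from another thread.

```python
import asyncio
import threading


def _resolve(fut: asyncio.Future, result: int) -> None:
    # Runs on the event-loop thread; skip if the awaiting task was cancelled.
    if not fut.done():
        fut.set_result(result)


def spawn_foreign_work(loop: asyncio.AbstractEventLoop, fut: asyncio.Future, value: int) -> None:
    """Hypothetical stand-in for a Rust future running outside the Python loop."""

    def work() -> None:
        result = value * 2  # placeholder for the Rust-side computation
        # asyncio futures are not thread-safe, so hand the result to the loop's thread.
        loop.call_soon_threadsafe(_resolve, fut, result)

    threading.Thread(target=work, daemon=True).start()


async def bridged_call(value: int) -> int:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    spawn_foreign_work(loop, fut, value)
    # Suspends this task and frees the loop until the foreign side resolves fut.
    return await fut


async def main() -> list[int]:
    # Several bridged calls can be in flight at once on one event loop.
    return list(await asyncio.gather(*(bridged_call(i) for i in range(4))))


if __name__ == "__main__":
    print(asyncio.run(main()))  # [0, 2, 4, 6]
```

The `fut.done()` check ties into the cancellation goal: if the awaiting task is cancelled, its future is already done, and calling `set_result` on it would raise `InvalidStateError`.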
References
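- https://pyo3.rs/ - PyO3, Rust bindings for Python
- https://github.com/awestlake87/pyo3-asyncio - Rust async ↔ Python asyncio interop
- https://github.com/PyO3/maturin - Build/publish wheels
- https://github.com/apache/arrow-rs/tree/master/arrow-pyarrow-integration-testing - Arrow ↔ PyArrow interop testing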
- https://duckdb.org/docs/api/python/overview - API inspiration
- https://github.com/lancedb/lancedb - Similar project