# HarborX: Blob Direct Write & High-Performance SQL Query

**HarborX** is a high-performance data engine for Web3, supporting **direct writes of incremental blockchain Blob data** and efficient SQL querying.
Its goal is to build an **end-to-end data pipeline**, from raw on-chain data to structured formats, and ultimately to integrate **zero-knowledge proofs** for trusted computation and verification.

## Key Features

* **Blob Direct Write**
  * Writes blockchain Blob data directly into structured storage formats, with no intermediate conversion step.
  * Supports both batch and streaming writes to reduce latency and storage overhead.

* **SQL Query Interface**
  * Query the latest blocks and historical data with standard SQL.
  * Queries can run in the frontend, on the backend, or in a distributed execution environment, keeping deployment flexible.

* **High-Performance Write & Query**
  * Columnar storage with on-demand loading significantly reduces I/O and memory usage.
  * Scales out with DataFusion or other distributed query engines for large-scale parallel computation.

* **Verifiable Data Pipeline** *(experimental)*
  * Integrates zero-knowledge proofs so that both data writes and query results are verifiable.

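To make the first two features concrete, here is a minimal, hypothetical sketch (not the HarborX implementation) of the blob-direct-write idea: fixed-size records are unpacked straight out of a raw blob and are immediately queryable with standard SQL, using Python's stdlib `sqlite3` as a stand-in for the real storage engine. The record layout and table schema are illustrative assumptions.

```python
import sqlite3
import struct

# Hypothetical record layout: 20-byte address, 32-byte key, 8-byte big-endian value.
RECORD = struct.Struct(">20s32sQ")

def write_blob(conn, blob: bytes):
    """Parse fixed-size records straight out of a blob and insert them --
    no intermediate file format between the blob bytes and the table."""
    rows = [RECORD.unpack_from(blob, off) for off in range(0, len(blob), RECORD.size)]
    conn.executemany("INSERT INTO state VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (addr BLOB, key BLOB, value INTEGER)")

# Build a demo blob holding three records.
blob = b"".join(RECORD.pack(bytes([i]) * 20, bytes([i]) * 32, i * 100) for i in range(3))
write_blob(conn, blob)

# Standard SQL over the freshly ingested blob data.
total = conn.execute("SELECT SUM(value) FROM state").fetchone()[0]
print(total)  # 0 + 100 + 200 = 300
```

In HarborX itself the target is columnar storage rather than SQLite; the point of the sketch is only that no intermediate conversion sits between the raw blob and the SQL surface.
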
## Data Report

We provide `report_unix.sh` / `bench_report.py` to generate performance reports, including:

* Data write speed (rows/sec, MB/s)
* Query latency and throughput
* Storage footprint (by format)

Run:

```bash
./scripts/report_unix.sh
# or
python scripts/bench_report.py
```

Reports are generated in `report.md` with detailed performance metrics.
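
The headline numbers in the report reduce to simple ratios over a timed run. A minimal sketch of that arithmetic, independent of what `bench_report.py` actually does (the helper name and the example figures are illustrative):

```python
import time

def write_metrics(n_rows: int, n_bytes: int, write_fn) -> dict:
    """Time a write and derive the throughput figures a report would tabulate."""
    start = time.perf_counter()
    write_fn()
    elapsed = time.perf_counter() - start
    return {
        "rows_per_sec": n_rows / elapsed,
        "mb_per_sec": n_bytes / (1024 * 1024) / elapsed,
        "elapsed_sec": elapsed,
    }

# Example: a dummy "write" standing in for ingesting 500k rows / 64 MB of data.
m = write_metrics(500_000, 64 * 1024 * 1024, lambda: time.sleep(0.05))
print(f"{m['rows_per_sec']:.0f} rows/sec, {m['mb_per_sec']:.1f} MB/s")
```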

## Try Online

Try SQL queries instantly:

🌐 [**http://play.harborx.tech/**](http://play.harborx.tech/)

> The online demo uses a compact Parquet dataset and runs SQL queries entirely in the browser for quick testing.

## Common Commands

1. **Generate Demo Dataset**

```bash
python scripts/make_demo_data.py \
  --rows 8000 \
  --parts 4 \
  --update-ratio 0.5 \
  --out-dir web/demo
```
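
The `--update-ratio` flag controls what fraction of generated rows re-use an already-seen key, i.e. arrive as updates rather than fresh inserts. A hypothetical sketch of that idea (not `make_demo_data.py`'s actual logic; `make_keys` is an illustrative helper):

```python
import random

def make_keys(rows: int, update_ratio: float, seed: int = 0) -> list[int]:
    """Generate a key stream where ~update_ratio of the rows revisit an earlier key."""
    rng = random.Random(seed)
    keys, next_key = [], 0
    for _ in range(rows):
        if keys and rng.random() < update_ratio:
            keys.append(rng.choice(keys))  # update: reuse an existing key
        else:
            keys.append(next_key)          # insert: brand-new key
            next_key += 1
    return keys

keys = make_keys(8000, 0.5)
distinct = len(set(keys))
# With ratio 0.5, roughly half the rows are updates, leaving ~4000 distinct keys.
print(distinct)
```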

2. **Start Local Preview (Frontend-Only)**

```bash
python -m http.server 8000
# Visit http://127.0.0.1:8000/web/index.html
```

3. **Run SQL Server (Backend Version)**

```bash
python scripts/sql_server.py --port 8000
# Visit http://127.0.0.1:8000/ui
```

4. **Generate Performance Report**

```bash
./scripts/report_unix.sh
# or
python scripts/bench_report.py
```