Commit 4ddbde2

docs: add detailed README with

1 parent bcf8879

3 files changed, 95 additions & 19 deletions

README.md

Lines changed: 85 additions & 10 deletions
@@ -1,10 +1,85 @@
-# zkBlob-Lake PoC v16
-- Parallel Blob -> Arrow
-- Arrow uses fixed_size_binary for addr/key/value/hash
-- Query engine switch: --engine {auto,ext,bridge}
-- SQLite loader supports --mode {append,upsert}
-- Update-ratio sweep to CSV
-
-See scripts/ for usage. Quickstart (Windows/CMD):
-python scripts\cli.py setup
-python scripts\cli.py bench-upd-sweep --ratios 0.1,0.25,0.5 --rows 500000 --parts 4 --workers 4 --chunk 200000 --engine bridge
+# HarborX — Blob Direct Write & High-Performance SQL Query
+
+**HarborX** is a high-performance data engine for Web3, supporting **direct writes of incremental blockchain Blob data** and efficient SQL querying.
+Its goal is to build an **end-to-end data pipeline** — from raw on-chain data to structured formats — and ultimately to integrate **zero-knowledge proofs** for trusted computation and verification.
+
+## Key Features
+
+* **Blob Direct Write**
+
+  * Writes blockchain Blob data directly into structured storage formats, with no intermediate conversion step.
+  * Supports both batch and streaming writes to reduce latency and storage overhead.
+
+* **SQL Query Interface**
+
+  * Query the latest blocks and historical data using standard SQL.
+  * Queries can run in frontend, backend, or distributed execution environments, allowing flexible deployment.
+
+* **High-Performance Write & Query**
+
+  * Columnar storage with on-demand loading significantly reduces I/O and memory usage.
+  * Scales out with DataFusion or other distributed query engines for large-scale parallel computation.
+
+* **Verifiable Data Pipeline** *(experimental)*
+
+  * Integrates zero-knowledge proofs to ensure that both data writes and query results are verifiable.
+
+## Data Report
+
+We provide `report_unix.sh` / `bench_report.py` to generate performance reports, including:
+
+* Data write speed (rows/sec, MB/s)
+* Query latency and throughput
+* Storage footprint (by format)
+
+Run:
+
+```bash
+./scripts/report_unix.sh
+# or
+python scripts/bench_report.py
+```
+
+Reports are generated in `report.md` with detailed performance metrics.
+
+## Try Online
+
+Try SQL queries instantly:
+
+🌐 [**http://play.harborx.tech/**](http://play.harborx.tech/)
+
+> The online demo uses a compact Parquet dataset and runs SQL queries entirely in the browser for quick testing.
+
+## Common Commands
+
+1. **Generate Demo Dataset**
+
+```bash
+python scripts/make_demo_data.py \
+  --rows 8000 \
+  --parts 4 \
+  --update-ratio 0.5 \
+  --out-dir web/demo
+```
+
+2. **Start Local Preview (Frontend-Only)**
+
+```bash
+python -m http.server 8000
+# Visit http://127.0.0.1:8000/web/index.html
+```
+
+3. **Run SQL Server (Backend Version)**
+
+```bash
+python scripts/sql_server.py --port 8000
+# Visit http://127.0.0.1:8000/ui
+```
+
+4. **Generate Performance Report**
+
+```bash
+./scripts/report_unix.sh
+# or
+python scripts/bench_report.py
+```

bench_upd_sweep.csv

Lines changed: 0 additions & 4 deletions
This file was deleted.

requirements.txt

Lines changed: 10 additions & 5 deletions
@@ -1,6 +1,11 @@
-pyarrow>=14.0.1
-datafusion>=38.0.0
-duckdb>=1.0.0
+# Core dependencies
+pyarrow>=14.0.1,<15.0.0
+datafusion>=38.0.0,<39.0.0
+
+# Optional: for backend DuckDB usage
+duckdb>=1.0.0,<2.0.0
+
+# Utilities
 python-dotenv>=1.0.1
-fastapi
-uvicorn[standard]
+fastapi>=0.103.0
+uvicorn[standard]>=0.23.0
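Given the version bounds above, a clean environment can be set up with the standard venv + pip workflow (nothing project-specific assumed):

```shell
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```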
