Commit 6215994

chore: update README and remove unused code
1 parent 2837e4c commit 6215994

3 files changed

Lines changed: 12 additions & 112 deletions

README.md

Lines changed: 11 additions & 39 deletions
@@ -1,14 +1,14 @@
 # HarborX — Blob Direct Write & High-Performance SQL

-HarborX is a high-performance data engine for Web3. It ingests incremental **blob** data from L2/rollup ecosystems, writes directly into **columnar formats** (Arrow/Parquet), and exposes a **standard SQL** interface that runs either fully in the browser (DuckDB-WASM) or on your backend. The long-term goal is a verifiable pipeline with **ZK proofs** (ZKSQL) for trusted computation and cross-chain verification.
+HarborX is a high-performance data engine for Web3. It ingests incremental **blob** data from L2/rollup ecosystems, writes directly into **columnar formats** (Arrow/Parquet), and exposes a **standard SQL** interface. The long-term goal is a verifiable pipeline with **ZK proofs** (ZKSQL) for trusted computation and cross-chain verification.

 ## Key Features

 - **Blob → Columnar (Direct Write)**

 Pull real blob payloads and write straight to Arrow/Parquet—no heavyweight node sync or custom ETL needed. Supports small, incremental updates for low latency.

-- **SQL Anywhere (Frontend-Only or Backend)**
+- **SQL Anywhere**

 Query latest and historical data with SQL. The PoC ships a **pure-frontend** demo (DuckDB-WASM) that reads Arrow/Parquet over HTTP, no server code required.

@@ -34,7 +34,7 @@ pip install -e .[cli]

 ```

-## Option A — Use Existing Static Data (most stable for PoC)
+## Option A — Use Existing Static Data

 Commit your prepared dataset under `apps/web/data/` (including `manifest.json`) and run:

@@ -44,7 +44,7 @@ harborx serve --dir apps/web --port 8080

 ```

-## Option B — Fetch a Tiny Real Dataset (Blobscan)
+## Option B — Fetch a Tiny Real Dataset

 If you want to refresh the PoC data from the public API:

@@ -84,9 +84,9 @@ LIMIT 50;

 ---

-# Unified CLI
+# CLI

-All PoC commands are available via the single `harborx` entrypoint:
+All commands are available via the single `harborx` entrypoint:

 - **Fetch blobs + write Arrow/Parquet + manifest**

@@ -96,37 +96,26 @@ All PoC commands are available via the single `harborx` entrypoint:

 ```

-- **Serve static web demo (correct MIME types)**
+- **Serve static web demo**

 ```bash
 harborx serve --dir apps/web --port 8080

 ```
-
-- **Tidy repo (plan vs apply)**
-
-```bash
-harborx tidy
-harborx tidy --apply
-# (optional) aggressive consolidation into bench/
-harborx tidy --apply --aggressive
-
-```
-

 ---

-# Frontend-Only Demo
+# Frontend Demo

 - Location: `apps/web/`
 - Data folder: `apps/web/data/`
 - Manifest: `apps/web/data/manifest.json` (lists Arrow/Parquet files)

-The app resolves file paths **relative to the manifest**. If you ever see 404s like `/data/data/...`, it means both manifest & code added the `data/` prefix. Fix either the manifest (no `data/` prefix) **or** the app’s normalization (strip `data/` if present)—don’t do both.
+The app resolves file paths **relative to the manifest**.

 ---

-# Repository Layout (Recommended)
+# Repository Layout

 ```base
 harborx/ # unified CLI + data channel (blobscan, tools)
@@ -137,25 +126,8 @@ legacy/ # archived/older code and experiments

 ```

-Generated artifacts (e.g., large datasets) are typically ignored by Git—except in PoC static mode, where you intentionally commit a tiny `apps/web/data/` for Pages.
-
----
-
-# Roadmap
-
-- **ZKSQL integration**: verifiable write & query proofs (SNARK-friendly schemas).
-- **Connectors**: more sources (e.g., rollup-specific indexers) and push-based ingest.
-- **Scaling knobs**: distributed query backends, tiered storage, caching layers.
-- **Dev-friendly packaging**: Docker + Compose for “click-to-run” deployments.
-
 ---

 # License

-MIT (PoC). See `LICENSE`.
-
----
-
-**Questions / Feedback?**
-
-File an issue or ping us with repro steps and logs (CLI args, browser console error, small manifest). Happy to help you get HarborX running smoothly.
+Apache-2.0. See `LICENSE`.
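
The README in this diff states that the app resolves file paths relative to the manifest, and the note removed by this commit describes how a doubled `data/` prefix leads to `/data/data/...` 404s. The rule can be sketched in Python as below. This is only an illustration: the real demo runs in the browser, the names `resolve_entry` and `data_dir` are hypothetical and not part of HarborX, and the manifest URL is assumed to look like `http://localhost:8080/data/manifest.json`.

```python
from urllib.parse import urljoin


def resolve_entry(manifest_url: str, entry: str, data_dir: str = "data") -> str:
    """Resolve one manifest entry against the URL of the manifest itself.

    Hypothetical helper: it implements the "strip data/ in the app" option,
    so it should not be combined with a manifest that was also rewritten.
    """
    prefix = data_dir.strip("/") + "/"
    entry = entry.lstrip("/")
    # Drop a leading "data/" so "data/x.arrow" does not become
    # ".../data/data/x.arrow" when the manifest already lives in data/.
    if entry.startswith(prefix):
        entry = entry[len(prefix):]
    # urljoin replaces the last path segment (manifest.json) with the entry.
    return urljoin(manifest_url, entry)


if __name__ == "__main__":
    base = "http://localhost:8080/data/manifest.json"
    print(resolve_entry(base, "blobs.arrow"))       # http://localhost:8080/data/blobs.arrow
    print(resolve_entry(base, "data/blobs.arrow"))  # http://localhost:8080/data/blobs.arrow
```

Both calls resolve to the same URL, which is the behaviour the removed note asks for when the normalization lives in the app rather than in the manifest.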
File renamed without changes.

harborx/tools.py

Lines changed: 1 addition & 73 deletions
@@ -19,80 +19,8 @@ def build_manifest(root:str="apps/web", data:str="data", include_parquet:bool=Fa
         json.dump({"arrow":sorted(arrow), **({"parquet":sorted(parquet)} if include_parquet else {})}, fp, indent=2)
     print(f"[manifest] wrote {len(arrow)} arrow file(s){' and '+str(len(parquet))+' parquet file(s)' if include_parquet else ''} at {data_dir}")

-def tidy_repo(apply:bool=False, aggressive:bool=False):
-    plan = []
-
-    def mv(src, dst):
-        if os.path.exists(src):
-            plan.append(("move", src, dst))
-
-    def rm(path):
-        if os.path.exists(path):
-            plan.append(("remove", path))
-
-    # apps/web & docs
-    mv("poc-e2e-blob-sql/packages/web-demo", "apps/web")
-    mv("poc-e2e-blob-sql/packages/docs", "docs")
-
-    # bench/legacy
-    if aggressive:
-        mv("bench", "bench/sqlite-synthetic")
-        mv("poc-sqlite-benchmark", "bench/sqlite-bench")
-    else:
-        mv("bench", "legacy/bench")
-        mv("poc-sqlite-benchmark", "legacy/sqlite-benchmark")
-
-    # misc legacy buckets
-    mv("web", "legacy/web")
-    mv("data", "legacy/data")
-    mv("lake", "legacy/lake")
-    mv("poc-e2e-blob-sql/packages/ingestor", "legacy/ingestor")
-
-    # nested mistake cleanup
-    nested = "poc-e2e-blob-sql/packages/web-demo/poc-e2e-blob-sql"
-    if os.path.exists(nested):
-        plan.append(("remove", nested))
-
-    print("=== Proposed repo layout ===")
-    print("harborx/ # unified CLI + data channel")
-    print("apps/web/ # pure-frontend demo (DuckDB-WASM)")
-    print("docs/ # docs")
-    print("bench/ # benchmarks (optional)")
-    print("legacy/ # archived code")
-    print(".gitignore # ignore data/")
-    print("pyproject.toml # entrypoint: harborx")
-    print()
-
-    for op, a, b in plan:
-        if op == "move":
-            print(f" - MOVE {a} -> {b}")
-        else:
-            print(f" - RM {a}")
-
-    if not apply:
-        print("\n(dry-run) Nothing changed. To apply:\n harborx tidy --apply [--aggressive]")
-        return
-
-    for op, a, b in plan:
-        if op == "move":
-            os.makedirs(os.path.dirname(b), exist_ok=True)
-            if os.path.exists(b):
-                base = os.path.basename(a.rstrip('/\\'))
-                dst = os.path.join(b, f"_migrated_{base}")
-                print(f" ! {b} exists, moving into {dst}")
-                shutil.move(a, dst)
-            else:
-                shutil.move(a, b)
-        elif op == "remove":
-            if os.path.isdir(a):
-                shutil.rmtree(a)
-            else:
-                os.remove(a)
-    print("\n[tidy] repo re-organization complete.")
-
 if __name__ == "__main__":
     ap = argparse.ArgumentParser()
     ap.add_argument("--apply", action="store_true")
     ap.add_argument("--aggressive", action="store_true")
-    args = ap.parse_args()
-    tidy_repo(apply=args.apply, aggressive=args.aggressive)
+    args = ap.parse_args()
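
The `build_manifest` context lines in this hunk show the manifest being written as a JSON object with a sorted `arrow` list and, only when `include_parquet` is set, a `parquet` list. A minimal sketch of reading that shape back follows. `load_manifest` is a hypothetical helper, not part of `harborx/tools.py`, and the path passed to it is assumed to be the `apps/web/data/manifest.json` location named in the README.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Return (arrow_files, parquet_files) from a manifest JSON file.

    Hypothetical helper; assumes the {"arrow": [...], "parquet": [...]}
    shape that build_manifest writes in the diff above.
    """
    with Path(path).open(encoding="utf-8") as fp:
        manifest = json.load(fp)
    # "parquet" is written only when include_parquet is set, so default to [].
    return manifest.get("arrow", []), manifest.get("parquet", [])
```

A caller would typically invoke it as `arrow, parquet = load_manifest("apps/web/data/manifest.json")` and treat an empty `parquet` list as "Arrow only".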
