Commit f512ad2

chore: docs
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
1 parent ecc7492 commit f512ad2

3 files changed

Lines changed: 11 additions & 5 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ Writing to GreptimeDB from Node.js? The bulk path is the fastest option by a wid
140140
| `@opentelemetry/exporter-logs-otlp-proto` | 622k r/s | 621k r/s | 0.77× |
141141
| `@influxdata/influxdb-client` | 496k r/s | 500k r/s | 0.62× |
142142

143-
Same pre-generated data, same server, same Node.js runtime; each client is driven with its own default configuration. Arrow Flight ships the batch already-columnar so the server skips text/proto parsing and per-attribute column mapping.
143+
Same schema, same data generator, same server, same Node.js runtime; each client is driven with its own default configuration. Arrow Flight ships the batch already-columnar so the server skips text/proto parsing and per-attribute column mapping.
144144

145145
On the 22-column log schema the bulk path reaches **~137k rows/s** (2M rows, batch=5000). Unary and streaming numbers, the exact SDK-usage decisions behind each bench, and reproduction commands: [docs/benchmarking.md](./docs/benchmarking.md).
146146

bench/index.ts

Lines changed: 7 additions & 1 deletion
@@ -11,11 +11,17 @@ const forward = process.argv.slice(3);
1111

1212
if (arg === undefined) {
1313
console.error(
14-
'Usage: pnpm bench <name> [--rows=N --batch-size=N --parallelism=N --endpoint=host:port]',
14+
'Usage: pnpm bench <name> [--rows=N --batch-size=N --parallelism=N --num-hosts=N ...]',
1515
);
1616
console.error(
1717
'Available: regular-api, stream-api, bulk-api, cpu-bulk-api, cpu-influxdb, cpu-otel',
1818
);
19+
console.error(
20+
'Network flags vary by bench: gRPC benches take --endpoint=host:port; cpu-influxdb /',
21+
);
22+
console.error(
23+
'cpu-otel take --http-endpoint=URL plus --database / --user / --password. See docs/benchmarking.md.',
24+
);
1925
process.exit(2);
2026
}
2127

docs/benchmarking.md

Lines changed: 3 additions & 3 deletions
@@ -62,7 +62,7 @@ Today's gap is Arrow JS single-thread encoding (`rowsToArrowTable` = 99% of clie
6262

6363
## Apples-to-apples: vs InfluxDB JS SDK & OpenTelemetry JS SDK
6464

65-
Three benches share the CPU schema above and write the same pre-generated data through three JS clients, letting us isolate protocol/client overhead from schema effects. Ports are the GreptimeDB defaults: gRPC Bulk on `4001`, InfluxDB v2 and OTLP over HTTP on `4000`.
65+
Three benches share the CPU schema above and pre-generate datasets with the same shape and cardinality (series layout + ms-stepped timestamps; Float64 values are re-rolled per run via `Math.random()`) through three JS clients, letting us isolate protocol/client overhead from schema effects. Ports are the GreptimeDB defaults: gRPC Bulk on `4001`, InfluxDB v2 and OTLP over HTTP on `4000`.
6666

6767
- `cpu-bulk-api` — our own `@greptime/ingester`, Arrow Flight bulk path. Writes a proper time-series table: 4-tag composite PK + 5 Float64 fields + ms timestamp.
6868
- `cpu-influxdb` — `@influxdata/influxdb-client` v1.35, line protocol to `/v1/influxdb/api/v2/write`. GreptimeDB serves the InfluxDB v2 API natively; token is `"<user>:<password>"`. Writes the same tag/field shape; server parses LP and maps to the columnar path.
@@ -82,7 +82,7 @@ Takeaways:
8282

8383
- Arrow Flight bulk wins by a comfortable margin: ~1.3× over OTLP and ~1.6× over InfluxDB LP. The advantage is on the server side: rows arrive as a ready-made Arrow columnar batch, no parsing or per-attribute promotion required.
8484
- OTLP with `greptime_identity` pays for OTLP proto decode + per-attribute column mapping on the server, plus HTTP/1.1 framing. Still beats InfluxDB LP, which pays for text parsing on top of the same column mapping.
85-
- Row count is verified after each run via `SELECT COUNT(*)` against the per-protocol table.
85+
- Row counts were spot-checked out-of-band with `SELECT COUNT(*)` on each per-protocol table; the bench scripts themselves do not run the verification query.
8686
- Even with `greptime_identity`, the OTel and bulk tables aren't strictly identical — the OTel table still carries log-model columns (`ScopeName`, `TraceId`, etc.) and has no `TAG`-marked primary key, so per-series semantics differ. The numbers here measure ingestion throughput only, not query-path parity.
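The "~1.3×" and "~1.6×" margins quoted in the takeaways can be cross-checked against the ratio column of the README table in this commit. The sketch below assumes that column expresses each client's throughput relative to the bulk path, which the diff does not state explicitly; `speedupFromRatio` is a hypothetical helper, not part of the bench code.

```typescript
// Hypothetical helper: converts "client throughput / bulk throughput"
// into "bulk throughput / client throughput".
// Assumption: the README ratio column is relative to the bulk path.
function speedupFromRatio(ratio: number): number {
  if (!(ratio > 0)) {
    throw new RangeError("ratio must be a positive number");
  }
  return 1 / ratio;
}

// Ratios as they appear in the README table rows.
const otlpRatio = 0.77;
const influxRatio = 0.62;

console.log(`bulk vs OTLP: ${speedupFromRatio(otlpRatio).toFixed(2)}x`);
console.log(`bulk vs InfluxDB LP: ${speedupFromRatio(influxRatio).toFixed(2)}x`);
```

Under that assumption the inverted ratios round to the margins given in the first takeaway.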
8787

8888
### SDK usage notes
@@ -120,7 +120,7 @@ pnpm bench bulk-api --rows=2000000 --batch-size=5000 --endpoint=localhost:4001
120120

121121
Available benchmark names: `regular-api`, `stream-api`, `bulk-api`, `cpu-bulk-api`, `cpu-influxdb`, `cpu-otel`. Shared flags:
122122

123-
- `--rows=N` — total rows to push
123+
- `--rows=N` — target row count; rounded down to a multiple of `--batch-size` (benches send whole batches only)
124124
- `--batch-size=N` — per-`write()` batch
125125
- `--parallelism=N` — concurrent in-flight RPCs (bulk / cpu-\* benches; default 8)
126126
- `--num-hosts=N` — `cpu-*` benches only; cardinality = `N × 5 × 10 × 20` series (default 100 → 100k series; use 1000 for the blog's 1M-series config)
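The arithmetic behind `--rows` and `--num-hosts` can be sketched as follows. This is only an illustration of the rules stated in the flag descriptions above; `effectiveRows` and `seriesCardinality` are hypothetical names, and the actual bench scripts may compute these values differently.

```typescript
// Hypothetical helpers; names are illustrative and not taken from the bench scripts.

/** Rows actually sent when only whole batches are written. */
function effectiveRows(rows: number, batchSize: number): number {
  if (!Number.isInteger(rows) || rows < 0) {
    throw new RangeError("rows must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  // Round down to the nearest multiple of the batch size.
  return Math.floor(rows / batchSize) * batchSize;
}

/** Series cardinality for the cpu-* benches: N × 5 × 10 × 20. */
function seriesCardinality(numHosts: number): number {
  return numHosts * 5 * 10 * 20;
}

console.log(effectiveRows(2_000_000, 5000)); // 2000000 (already a multiple)
console.log(effectiveRows(1_000_001, 5000)); // 1000000 (partial batch dropped)
console.log(seriesCardinality(100)); // 100000
console.log(seriesCardinality(1000)); // 1000000
```

A practical consequence is that a `--rows` value that is not a multiple of `--batch-size` yields slightly fewer rows than requested, which matters when comparing against a `SELECT COUNT(*)` spot check.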
