-Same pre-generated data, same server, same Node.js runtime; each client is driven with its own default configuration. Arrow Flight ships the batch already-columnar so the server skips text/proto parsing and per-attribute column mapping.
+Same schema, same data generator, same server, same Node.js runtime; each client is driven with its own default configuration. Arrow Flight ships the batch already-columnar so the server skips text/proto parsing and per-attribute column mapping.
On the 22-column log schema the bulk path reaches **~137k rows/s** (2M rows, batch=5000). Unary and streaming numbers, the exact SDK-usage decisions behind each bench, and reproduction commands: [docs/benchmarking.md](./docs/benchmarking.md).
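As a quick sanity check on the headline figure, the quoted run parameters can be turned into a batch count and an approximate wall-clock time. This is a minimal sketch; `summarizeRun` is a hypothetical helper, not part of the bench scripts, and it assumes the throughput is sustained over the whole run.

```typescript
// Hypothetical helper: derives batch count and approximate duration
// from the figures quoted above (2M rows, batch=5000, ~137k rows/s).
function summarizeRun(totalRows: number, batchSize: number, rowsPerSec: number) {
  return {
    batches: Math.ceil(totalRows / batchSize), // number of bulk writes issued
    seconds: totalRows / rowsPerSec,           // assumes steady throughput
  };
}

const run = summarizeRun(2000000, 5000, 137000);
console.log(`${run.batches} batches, about ${run.seconds.toFixed(1)} s`);
// 400 batches, about 14.6 s
```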
docs/benchmarking.md: 3 additions & 3 deletions
@@ -62,7 +62,7 @@ Today's gap is Arrow JS single-thread encoding (`rowsToArrowTable` = 99% of clie
## Apples-to-apples: vs InfluxDB JS SDK & OpenTelemetry JS SDK
-Three benches share the CPU schema above and write the same pre-generated data through three JS clients, letting us isolate protocol/client overhead from schema effects. Ports are the GreptimeDB defaults: gRPC Bulk on `4001`, InfluxDB v2 and OTLP over HTTP on `4000`.
+Three benches share the CPU schema above and pre-generate datasets with the same shape and cardinality (series layout + ms-stepped timestamps; Float64 values are re-rolled per run via `Math.random()`) through three JS clients, letting us isolate protocol/client overhead from schema effects. Ports are the GreptimeDB defaults: gRPC Bulk on `4001`, InfluxDB v2 and OTLP over HTTP on `4000`.
- `cpu-bulk-api`: our own `@greptime/ingester`, Arrow Flight bulk path. Writes a proper time-series table: 4-tag composite PK + 5 Float64 fields + ms timestamp.
- `cpu-influxdb`: `@influxdata/influxdb-client` v1.35, line protocol to `/v1/influxdb/api/v2/write`. GreptimeDB serves the InfluxDB v2 API natively; token is `"<user>:<password>"`. Writes the same tag/field shape; server parses LP and maps to the columnar path.
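To make "same shape and cardinality, re-rolled values" concrete, here is a minimal sketch of a generator with that property. It is not the bench's actual generator: the row type, the tag values, and the `generateRows` name are all placeholders. It only assumes what the text states, namely 4 tags, 5 Float64 fields, and timestamps stepped by 1 ms.

```typescript
// Hypothetical row shape: tag values and layout are placeholders,
// not the real CPU schema used by the benches.
interface CpuRow {
  tags: [string, string, string, string];           // 4-tag composite key
  fields: [number, number, number, number, number]; // 5 Float64 values
  ts: number;                                       // epoch milliseconds
}

function generateRows(
  seriesCount: number,
  pointsPerSeries: number,
  startMs: number,
  rand: () => number = Math.random, // values differ between runs by default
): CpuRow[] {
  const rows: CpuRow[] = [];
  for (let p = 0; p < pointsPerSeries; p++) {
    for (let s = 0; s < seriesCount; s++) {
      rows.push({
        // Series layout is deterministic: it depends only on the series index.
        tags: [`host-${s}`, `region-${s % 4}`, `dc-${s % 8}`, `svc-${s % 16}`],
        // Field values are re-rolled on every call.
        fields: [rand() * 100, rand() * 100, rand() * 100, rand() * 100, rand() * 100],
        // Timestamps advance by 1 ms per point.
        ts: startMs + p,
      });
    }
  }
  return rows;
}
```

Two calls with the same arguments yield identical tags and timestamps but different field values, so the datasets compared across clients match in shape and cardinality rather than byte for byte.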
@@ -82,7 +82,7 @@ Takeaways:
- Arrow Flight bulk wins by a comfortable margin: ~1.3× over OTLP and ~1.6× over InfluxDB LP. The advantage is on the server side: rows arrive as a ready-made Arrow columnar batch, no parsing or per-attribute promotion required.
- OTLP with `greptime_identity` pays for OTLP proto decode + per-attribute column mapping on the server, plus HTTP/1.1 framing. Still beats InfluxDB LP, which pays for text parsing on top of the same column mapping.
-- Row count is verified after each run via `SELECT COUNT(*)` against the per-protocol table.
+- Row counts were spot-checked out-of-band with `SELECT COUNT(*)` on each per-protocol table; the bench scripts themselves do not run the verification query.
- Even with `greptime_identity`, the OTel and bulk tables aren't strictly identical — the OTel table still carries log-model columns (`ScopeName`, `TraceId`, etc.) and has no `TAG`-marked primary key, so per-series semantics differ. The numbers here measure ingestion throughput only, not query-path parity.
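Since the row-count check is done by hand rather than by the bench scripts, a small helper could automate it. The sketch below is one possible shape, not code from this PR: `countQuery`, `checkRowCount`, and `verifyTables` are hypothetical names, and the `run` callback stands in for whatever SQL client is used to reach the server.

```typescript
// Hypothetical helpers; the bench scripts in this PR do not include them.

// Builds the verification query, rejecting anything that is not a plain identifier.
function countQuery(table: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(table)) {
    throw new Error(`refusing unsafe table name: ${table}`);
  }
  return `SELECT COUNT(*) FROM ${table}`;
}

// Returns null when the counts match, otherwise a readable mismatch message.
function checkRowCount(table: string, expected: number, actual: number): string | null {
  return actual === expected
    ? null
    : `${table}: expected ${expected} rows, found ${actual}`;
}

// `run` is an assumption: any function that executes SQL and resolves to one number.
type RunScalar = (sql: string) => Promise<number>;

async function verifyTables(
  run: RunScalar,
  expected: Record<string, number>,
): Promise<string[]> {
  const problems: string[] = [];
  for (const table of Object.keys(expected)) {
    const actual = await run(countQuery(table));
    const problem = checkRowCount(table, expected[table], actual);
    if (problem !== null) problems.push(problem);
  }
  return problems;
}
```

Calling `verifyTables` once per run with the expected count for each per-protocol table returns an empty array when every table holds the rows that were written.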