When the HEP writer is under very high packet rates, CPU time is dominated by decoding, DuckLake appends, and (to a lesser extent) Prometheus counter updates on the ingest path. This PR adjusts several defaults (see sections below for the new values); operators can further tune behaviour via JSON / YAML / environment variables.
When ingest.udp.multicore / ingest.tcp.multicore is true (also the Viper default), the UDP/TCP listeners use a multi-loop / multi-reactor style setup appropriate for many-core hosts. Set it to false on small VMs or when debugging single-threaded behaviour.
Environment overrides: HOMER_INGEST_UDP_MULTICORE, HOMER_INGEST_TCP_MULTICORE.
Rows are buffered in memory until the batch reaches storage.ducklake.batch_size rows (or the flush interval fires). Larger batches amortise catalog/append work and can lower CPU per packet, at the cost of higher latency before data is visible and more RAM per table writer.
Default in Viper: 10000. Tune down for low-latency visibility; tune up (within memory limits) for sustained multi‑Gbps ingest after measuring with go tool pprof.
Environment override: HOMER_STORAGE_DUCKLAKE_BATCH_SIZE.
For sustained high PPS on an all-in-one host with a DuckDB memory limit of 2 GB or more,
25000–50000 is a reasonable starting range (default example config uses
25000). Fewer flushes reduce DuckLake append overhead; visibility latency
grows until the batch fills or flush_interval_sec fires.
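
The size-or-interval flush rule described above can be sketched in Go as follows. This is a minimal illustration, not Homer's writer: the batcher type, its fields, and the in-memory flushes slice are hypothetical stand-ins for a per-table writer and its DuckLake append call.

```go
package main

import (
	"fmt"
	"time"
)

// batcher is a hypothetical stand-in for a per-table writer.
// It is NOT Homer's actual type; it only illustrates the flush rule.
type batcher struct {
	batchSize int
	buf       []string
	flushes   [][]string // recorded batches, standing in for appends
}

// add buffers one row and flushes as soon as the batch is full.
func (b *batcher) add(row string) {
	b.buf = append(b.buf, row)
	if len(b.buf) >= b.batchSize {
		b.flush()
	}
}

// flush hands the buffered rows over as one batch; empty buffers are skipped.
func (b *batcher) flush() {
	if len(b.buf) == 0 {
		return
	}
	b.flushes = append(b.flushes, b.buf)
	b.buf = nil
}

// run consumes rows until the channel closes, flushing when the batch
// fills or when the interval ticker fires, whichever comes first.
func (b *batcher) run(rows <-chan string, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case row, ok := <-rows:
			if !ok {
				b.flush() // drain the partial batch on shutdown
				return
			}
			b.add(row)
		case <-ticker.C:
			b.flush()
		}
	}
}

func main() {
	rows := make(chan string)
	b := &batcher{batchSize: 3}
	done := make(chan struct{})
	// A one-hour interval keeps the ticker out of this short demo,
	// so only the size rule and the final drain trigger flushes.
	go func() { b.run(rows, time.Hour); close(done) }()
	for i := 0; i < 7; i++ {
		rows <- fmt.Sprintf("row-%d", i)
	}
	close(rows)
	<-done
	fmt.Println(len(b.flushes)) // 3 batches: 3 rows, 3 rows, 1 row
}
```

With a larger batchSize the same seven rows would produce fewer, bigger batches, which is the CPU-versus-visibility trade-off the setting controls.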
Apply on a homer-core systemd install:
sudo ./scripts/apply-stand-ducklake-batch.sh 25000
# or: sudo ./scripts/apply-stand-ducklake-batch.sh 50000

Rebuild and reinstall the package after pulling ingest perf commits so the
binary matches the repo (homer --version should show the current commit).
Writer workers batch updates to Prometheus counters (homer_hep_packets_received_total, homer_hep_packets_processed_total, homer_bytes_received_total, …) so the hot path does not hit atomics on every packet.
- 0 or omitted: use the built-in default (128 packets per flush).
- Positive value: flush after that many packets (per worker, per protocol label batch). Values above 1048576 are capped at 1048576.
Raising this (for example to 256–1024) can reduce the CPU spent on Prometheus counter updates at extreme PPS; metrics become coarser between scrapes.
Environment override: HOMER_INGEST_WORKER_METRICS_FLUSH_PACKETS.
{
"ingest": {
"worker_metrics_flush_packets": 512,
"udp": { "enable": true, "multicore": true },
"tcp": { "enable": true, "multicore": true }
},
"storage": {
"ducklake": {
"batch_size": 4000
}
}
}

After changes, validate with go tool pprof (homer --pprof=127.0.0.1:6060) and watch queue depth / drop counters.
From the repository root (with ./homer built, e.g. make):
# Frees default homer-check ports, starts homer with --pprof, runs UDP load, writes profile + top text under /tmp/homer-profile-ingest/
./scripts/profile_ingest_load.sh --kill-ports
# Or via Makefile (same as above)
make profile-ingest

Useful overrides (environment):
| Variable | Default | Meaning |
|---|---|---|
| HOMER | $REPO_ROOT/homer | Path to homer binary |
| CONFIG | $REPO_ROOT/homer-check.json | Modular JSON config |
| PPROF_ADDR | 127.0.0.1:6066 | --pprof listen address |
| UDP_ADDR | 127.0.0.1:19060 | HEP UDP target for the load tool |
| PPS | 12000 | Target datagrams/sec |
| PROFILE_SEC | 22 | profile?seconds= duration |
| LOAD_SEC | 24 | How long the UDP generator runs |
| OUT_DIR | /tmp/homer-profile-ingest | Profile .pb.gz, pprof-top.txt, homer.log |
| SKIP_HOMER | 0 | Set to 1 if homer is already running (same --pprof URL must respond) |
The load generator is go run ./cmd/hepudpload from src/ (minimal HEP3 + SIP INVITE). Manual run:
cd src && go run ./cmd/hepudpload -addr=127.0.0.1:19060 -pps=12000 -duration=30s

Artifacts after the script: cpu.pb.gz, pprof-top.txt, homer.log under OUT_DIR. Interactive view: go tool pprof -http=:0 "$OUT_DIR/cpu.pb.gz".
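
For capturing a profile by hand, the URL can be assembled from PPROF_ADDR and PROFILE_SEC as sketched below. The helper cpuProfileURL is hypothetical, and the /debug/pprof/profile path is an assumption: it is where Go's net/http/pprof package registers the CPU profile handler, but this document does not confirm that homer --pprof serves it at that path.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// cpuProfileURL builds a CPU profile URL for a pprof listener.
// Assumption: the listener exposes Go's standard net/http/pprof path.
func cpuProfileURL(addr string, seconds int) string {
	u := url.URL{
		Scheme:   "http",
		Host:     addr,
		Path:     "/debug/pprof/profile",
		RawQuery: url.Values{"seconds": {strconv.Itoa(seconds)}}.Encode(),
	}
	return u.String()
}

func main() {
	// Values taken from the PPROF_ADDR and PROFILE_SEC defaults in the table.
	fmt.Println(cpuProfileURL("127.0.0.1:6066", 22))
}
```

If the path assumption holds, the printed URL can be passed directly to go tool pprof, which fetches the profile over HTTP for the requested duration.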
Homer pulls DuckDB’s CGO stack through github.com/duckdb/duckdb-go/v2, which depends on the prebuilt static libs in github.com/duckdb/duckdb-go-bindings (see that repo for versioning, e.g. DuckDB v1.5.2 → module tag v0.10502.0).
By default src/go.mod contains a replace directive pointing to a fork that eliminates per-string CGO malloc/free in the Appender hot path (visible as VectorAssignStringElementLen + duckdb_free per-column in profiles). This fork has been benchmarked against upstream and showed a measurable reduction in CGO overhead at high PPS.
The current replace in src/go.mod:
replace github.com/duckdb/duckdb-go-bindings => github.com/adubovikov/duckdb-go-bindings v0.10502.0-homer.gcopt.3

To revert to upstream bindings, remove this replace line and run go mod tidy. For experiments (e.g. additional optimisations), you can update the fork reference:
replace github.com/duckdb/duckdb-go-bindings => github.com/adubovikov/duckdb-go-bindings v0.10502.0-homer.gcopt.3

Then run go mod tidy, rebuild, and compare with ./scripts/profile_ingest_load.sh using the same PROFILE_SEC, PPS, and OUT_DIR naming. Use a warm-up (send traffic for several seconds before profile?seconds=) and ≥20–30 s profiles so runtime.cgocall / Appender rows dominate over one-off init noise.
Example A/B on one machine (same PPS/PROFILE_SEC): upstream vs fork showed ~1% difference in total CPU sample time over 10 s (within run-to-run variance); treat small deltas as inconclusive until you repeat on your hardware and workload mix.