[KLC-2387] validator CPU preflight + benchmark crypto category#43
[KLC-2387] validator CPU preflight + benchmark crypto category#43fbsobreira wants to merge 6 commits into
Conversation
Adds a startup-time SHA-256 throughput gate for validator nodes plus a crypto benchmark category in klever-benchmark. The preflight refuses to start a validator measuring below 800 MB/s on a 200 ms self-bench (the floor below which leader-mode TX processing exceeds the 425 ms protocol hardware-tolerance window). Operators control enforcement via preferences.enforceCpuPreflight (defaults to true, including for existing configs that omit the key); KLEVER_SKIP_CPU_CHECK=1 is an emergency bypass. The standalone klever-benchmark gains a Crypto category (SHA-256 1K/16K, Blake2b, Keccak-256, Ed25519 verify) with CPU feature attestation and a hard grade-F veto below 500 MB/s SHA-256. JSON output exposes vetoed and veto_reason for fleet tooling. Field validation across the validator fleet confirms the SHA-NI gap: non-SHA-NI guests measure 244-283 MB/s while EPYC hosts measure 1474-1674 MB/s, matching the ~5x ratio that motivated the gate.
WalkthroughThis PR adds a crypto benchmarking suite to the benchmark CLI (SHA‑256/Blake2b/Keccak MB/s and Ed25519 verify ops/sec, CPU feature flags), integrates Crypto into scoring with a SHA‑256 hard veto, wires a ChangesBenchmark Crypto Suite Addition
Validator CPU Preflight Gate
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Suggested labels
🚥 Pre-merge checks | ✅ 5 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR introduces a CPU performance gate for validator startup and expands the klever-benchmark tool with a new Crypto/Hashing category, aiming to prevent validators from running on hosts that are too slow at SHA-256 to meet consensus timing constraints.
Changes:
- Add validator startup CPU preflight based on a short SHA-256 throughput self-benchmark, with enforcement controlled by
preferences.enforceCpuPreflightand an env-var bypass. - Add a Crypto/Hashing benchmark category (SHA-256 small/large, Blake2b, Keccak-256, Ed25519 verify) plus CPU feature attestation, and incorporate results into scoring + JSON/text reports.
- Add/adjust operator-facing documentation for the preflight and benchmark CLI output/grade semantics.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| config/prefsConfig.go | Adds EnforceCPUPreflight *bool and ShouldEnforceCPUPreflight() to default enforcement to true when YAML key is absent. |
| config/node/config.yaml | Documents and enables enforceCpuPreflight in the sample node config. |
| cmd/node/startup.go | Runs the CPU preflight early in startup and conditionally blocks/warns based on preferences. |
| cmd/node/PREFLIGHT.md | New documentation explaining validator CPU preflight rationale, behavior, and overrides. |
| cmd/node/preflight.go | New preflight implementation: CPU feature detection + SHA-256 throughput bench + gating logic. |
| cmd/node/preflight_test.go | Adds tests for preflight branching and a smoke test for the bench function. |
| cmd/benchmark/score.go | Adds Crypto category scoring, rebalances weights, and introduces a SHA-256 throughput-based veto cap. |
| cmd/benchmark/score_test.go | New tests for weight totals, crypto weight, and throughput veto behavior. |
| cmd/benchmark/runner.go | Adds SkipCrypto option and runs the crypto benchmark when enabled. |
| cmd/benchmark/report.go | Adds crypto verdicting, text + JSON report output for crypto metrics/features, and score veto fields. |
| cmd/benchmark/main.go | Adds -skip-crypto CLI flag wiring into the runner config. |
| cmd/benchmark/crypto.go | New crypto benchmark implementation (hash throughput + Ed25519 verify + CPU features). |
| cmd/benchmark/CLI.md | Updates CLI documentation to include crypto category, weights, and the SHA-256 hard-veto explanation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/benchmark/crypto.go`:
- Around line 151-163: The hot loop is allocating each iteration via h.Sum(nil);
preallocate a reusable slice with capacity h.Size() (or max digest length)
before the loops and call h.Sum(prealloc[:0]) inside the inner loop so Sum
reuses that buffer instead of allocating; allocate the buffer once (e.g., out :=
make([]byte, 0, newHash().Size())) before deadline/start and replace h.Sum(nil)
with h.Sum(out[:0]) while keeping h.Reset(), h.Write(buf) and the bytes
accounting unchanged.
In `@cmd/benchmark/score.go`:
- Around line 48-55: Add a clarifying comment next to the minLeaderSHA256MBps
constant that references the other SHA-256 threshold (minSHA256ThroughputMBps in
preflight.go) and explains their distinct roles: minLeaderSHA256MBps (500.0) is
a benchmark scoring veto / characterization floor, while minSHA256ThroughputMBps
(800) is a stricter preflight validator startup requirement; include a short
rationale for the different values (softer gate vs. hard startup gate) and a
pointer to preflight.go for maintainers to find the stricter threshold.
In `@cmd/node/PREFLIGHT.md`:
- Around line 3-5: The sentence in PREFLIGHT.md that reads "The validator binary
runs a CPU preflight check at startup, immediately after loading the BLS signing
key." is incorrect; update that wording to reflect the actual startup order by
changing "after loading the BLS signing key" to "before loading the BLS signing
key" (or similar phrasing) so the doc matches the implementation and preserves
the intended security rationale about preflight running prior to key material
being loaded.
- Around line 106-107: Update the phrase "same SHA-NI veto" to explicitly state
the benchmark uses a throughput-based veto (measured SHA-256 MB/s) rather than a
CPU feature-bit check; reference the operator-facing doc "cmd/benchmark/CLI.md"
and change the sentence to clarify that the benchmark gates on measured SHA-256
throughput and will veto even on CPUs that report SHA-NI if measured MB/s falls
below the threshold, so operators won't mistake feature-bit presence for
adequate performance.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 47edf1b0-3e5b-46b5-b81f-5dccf8537b98
📒 Files selected for processing (13)
cmd/benchmark/CLI.mdcmd/benchmark/crypto.gocmd/benchmark/main.gocmd/benchmark/report.gocmd/benchmark/runner.gocmd/benchmark/score.gocmd/benchmark/score_test.gocmd/node/PREFLIGHT.mdcmd/node/preflight.gocmd/node/preflight_test.gocmd/node/startup.goconfig/node/config.yamlconfig/prefsConfig.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (1)
**/*_test.go
⚙️ CodeRabbit configuration file
**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)
Files:
cmd/benchmark/score_test.gocmd/node/preflight_test.go
🧠 Learnings (1)
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.
Applied to files:
cmd/node/startup.gocmd/benchmark/runner.gocmd/benchmark/main.goconfig/prefsConfig.gocmd/benchmark/score_test.gocmd/node/preflight.gocmd/benchmark/crypto.gocmd/benchmark/score.gocmd/node/preflight_test.gocmd/benchmark/report.go
🔇 Additional comments (16)
config/prefsConfig.go (1)
10-31: Default-on preflight semantics are correctly encoded.Using
*boolplusShouldEnforceCPUPreflight()cleanly distinguishes omitted config from explicitfalseand keeps the default safe.config/node/config.yaml (1)
63-73: Config documentation is clear and operationally actionable.The comments explain threshold, default behavior, and emergency override in a way operators can apply directly.
cmd/node/startup.go (1)
252-261: Preflight placement and enforcement branching look correct.Running the gate before key loading and downgrading only when
enforceCpuPreflight=falseis the right startup control flow.cmd/benchmark/runner.go (1)
106-114: Crypto benchmark integration is consistent with the existing runner pattern.Skip gating, stderr progress output, wrapped errors, and result assignment follow the established flow cleanly.
cmd/benchmark/main.go (1)
34-35:--skip-cryptoflag wiring is complete and correct.The new flag is declared, parsed, and propagated into
Configas expected.Also applies to: 100-101
cmd/benchmark/CLI.md (1)
20-21: CLI and scoring docs for crypto category are clear and complete.The new flag, category weight, veto behavior, and JSON fields are documented in a practical operator-focused way.
Also applies to: 64-104
cmd/benchmark/score_test.go (1)
1-181: LGTM!The test file demonstrates good practices:
- Thread-safe test fixtures with no shared mutable state between tests
- No hardcoded sleeps for synchronization
- Table-driven tests used appropriately (
TestComputeScore_RebalancedWeights,TestScoreGrade_Boundaries)- Good coverage of edge cases: nil results, empty results, veto boundaries, and category weights
- The
excellentResults()helper cleanly isolates test fixturescmd/benchmark/score.go (2)
159-204: LGTM — scoring logic and veto implementation are correct.The
ComputeScorefunction:
- Correctly aggregates all seven categories including the new Crypto category
- Applies the SHA-256 veto after computing the score, ensuring
Totalstill reflects actual performance whileGradeis capped to "F"- The veto fires on measured throughput rather than CPU flags, matching the documented design rationale
277-288: LGTM —cryptoCatScorefollows established patterns.The implementation mirrors other category score functions with proper nil handling and uses
mean()to weight all five crypto metrics equally.cmd/node/preflight.go (2)
134-164: LGTM — benchmark implementation is sound.
benchSHA256:
- Amortizes
time.Now()syscall overhead by checking deadline only once per 256 iterations- Falls back to deterministic buffer content if
crypto/rand.Readfails, ensuring the benchmark remains representative- Correctly handles edge cases (non-positive duration returns 0)
- The inner-loop batch approach means the final measurement may slightly exceed the deadline, but this is standard practice and produces consistent results
76-124: LGTM — preflight logic is well-structured.The implementation:
- Uses dependency injection for CPU info and bench function, enabling full test coverage without mocking globals
- Runs the benchmark twice and takes the maximum, mitigating false negatives from transient throttling (thermal, noisy neighbor)
- Logs feature flags as informational even when they don't hard-fail, giving operators diagnostic context
- Produces actionable error messages with migration guidance and override instructions
cmd/node/preflight_test.go (2)
14-66: LGTM — thread-safe test logger implementation.
recordingLogger:
- Uses
sync.Mutexcorrectly for all read/write operationshasInfoandhasWarnhelpers hold the lock for the entire iteration, preventing races if a future test invokes logging concurrently
74-244: LGTM — comprehensive test coverage.The tests cover:
- Happy paths for both
amd64andarm64architectures- Feature-warning paths (missing AVX-512 IFMA, missing SHA-NI)
- Throughput enforcement (slow bench → error, fast bench → pass with warn)
- Environment bypass (
KLEVER_SKIP_CPU_CHECK=1)- Unsupported architecture handling (no-op)
- Edge cases for
benchSHA256(non-positive duration)- Production wiring smoke test
The use of
t.Setenvensures proper test isolation, andfastBenchenables deterministic testing without timing dependencies.cmd/benchmark/report.go (3)
268-287: LGTM —cryptoVerdictfollows established patterns.The function correctly checks all five crypto metrics against their respective fail/pass thresholds in the expected order (fail conditions first, then warn conditions).
731-745: LGTM — JSON schema additions are consistent.The
jsonScorestruct correctly includes:
Vetoed boolwithoutomitempty— always present, providing explicit status in JSON outputVetoedReason stringwithomitempty— only present when there's a reason to reportThis design ensures consumers always know the veto status without checking for field existence.
907-938: LGTM — crypto results and score correctly wired into JSON output.The crypto result population includes all measurement fields and CPU feature flags, and the score section correctly includes the crypto category alongside the veto fields.
- Preallocate digest buffer in benchSHA256 / benchHashMBps to remove per-iteration allocations from the hash hot loop (verified: 2039 allocs/op -> 0 allocs/op). Improves measurement stability on hosts where GC pressure could mask SHA-NI throughput. - Update PREFLIGHT.md: preflight runs immediately *before* loading the BLS key (not after), matching the implementation. - Fix doc drift: BenchmarkScore.Vetoed and CryptoResult.HasSHA_NI now correctly describe throughput-based gating rather than SHA-NI feature flag. HasSHA_NI doc clarifies the cross-platform shorthand (SHA-NI on amd64, ARMv8 SHA2 on arm64). - Document the relationship between cryptoSHA256LargeFailMBps (600, per-metric verdict) and minLeaderSHA256MBps (500, hard grade veto). - Surface preferences.enforceCpuPreflight=false escape hatch in the preflight error message so operators discover the warn-only mode without needing to find the docs first. - go mod tidy: promote klauspost/cpuid/v2 from indirect to direct dep since cmd/node/preflight.go imports it directly.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/benchmark/crypto.go`:
- Around line 65-76: Rename the misleading CryptoResult field HasSHA_NI to an
architecture-neutral name (e.g. HasSHA256Accel or HasSHAAccel) in the
CryptoResult type and update every usage site to the new identifier
(constructors, serializers, JSON tags, CLI output, tests, and any consumer code)
so the CLI/JSON no longer exposes the x86-specific term; ensure the struct
field's JSON tag (if present) and any documentation/comments are updated to the
new name to keep external API and docs consistent.
In `@cmd/benchmark/score.go`:
- Around line 48-54: The comment for minLeaderSHA256MBps is stale and claims it
“matches the existing fail floor for SHA-256 16 KiB blocks” while the actual
threshold constant cryptoSHA256LargeFailMBps is 600 MB/s in report.go; update
the comment in the minLeaderSHA256MBps block to reflect the true relationship
(either change the 500 MB/s wording to note the 600 MB/s fail floor or remove
the “matches” sentence) and ensure the rationale cites cryptoSHA256LargeFailMBps
and the calibrated numbers consistently (refer to minLeaderSHA256MBps and
cryptoSHA256LargeFailMBps when editing).
In `@cmd/node/preflight.go`:
- Around line 141-147: The preflight benchmark currently calls
crypto/rand.Read(buf) which can block; remove that call and always initialize
the buffer deterministically (e.g., fill buf with a repeating or index-based
pattern such as buf[i] = byte(i)) so the SHA-256 throughput measurement uses
stable, non-blocking input; update the code around the buf variable and the
existing fallback loop to perform the deterministic fill unconditionally and
remove the crypto/rand dependency in this setup.
In `@cmd/node/PREFLIGHT.md`:
- Around line 97-102: Change the absolutist phrasing that "CCX (dedicated AMD
EPYC) and CPX (shared AMD EPYC) instances always satisfy the preflight" to a
softer, observational statement (e.g., "have been observed to satisfy the
preflight") and likewise clarify that the CX series may include Skylake-class
instances that do not; keep the existing concrete guidance to run the
klever-benchmark command (klever-benchmark --skip-disk --skip-network --skip-kv
--skip-memory --skip-goroutine --skip-bignum) as the definitive check before
deploying a validator, and update the surrounding text for consistency so
operators rely on the benchmark rather than fixed SKUs (references: CCX, CPX,
CX, Skylake-class, and klever-benchmark).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 54329bbc-0ea6-479d-ac38-03fbaea65380
📒 Files selected for processing (6)
cmd/benchmark/crypto.gocmd/benchmark/report.gocmd/benchmark/score.gocmd/node/PREFLIGHT.mdcmd/node/preflight.gogo.mod
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test
- GitHub Check: Analyze (go)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.
Applied to files:
cmd/node/preflight.gocmd/benchmark/crypto.gocmd/benchmark/score.gocmd/benchmark/report.go
🔇 Additional comments (3)
cmd/node/preflight.go (1)
81-126: Preflight gating flow is clean and operationally actionable.The bypass/unsupported-arch/measure/fail branches are well partitioned, and the failure message includes concrete operator escape hatches.
cmd/node/PREFLIGHT.md (1)
3-84: Doc-to-implementation alignment looks solid.Startup ordering, enforcement modes, override behavior, and threshold semantics are clearly documented and consistent with the code path.
cmd/benchmark/crypto.go (1)
157-168: Good call reusing the digest buffer in the hot loop.This removes per-iteration allocations from the benchmark path, so the throughput numbers are much less exposed to GC noise.
Sonar flagged the printf template " %-32s %7.1f MB/s %s (pass≥%.0f, fail<%.0f MB/s)\n" as duplicated 5 times across printNetworkSection and printCryptoSection. Centralised as metricThroughputMBpsRowFmt so future column-width or label-format changes stay consistent across sections.
Sonar flagged the new safe-default accessor as uncovered. The function is load-bearing for the rollout safety guarantee (absent YAML key must default to enforce=true so upgrading operators are not silently downgraded to warn-only), so the absent-key branch is the most important one to lock down. Test covers all three states: nil pointer, explicit *true, explicit *false.
…docs - preflight.go: replace crypto/rand seed with deterministic init. rand.Read can block on entropy starvation during early boot, which could delay validator startup before preflight even runs. SHA-256 amd64/arm64 fast paths are data-independent so the buffer contents do not affect measured throughput. - score.go: stale comment claimed minLeaderSHA256MBps "matches the existing fail floor"; actual fail floor is 600 MB/s vs 500 veto. Document the intentional gap and explain why behavior stays consistent (category-fail path in gradeToVerdict). - PREFLIGHT.md: soften absolute "always satisfy" claim about Hetzner CCX/CPX SKUs to "typically" + emphasise running the benchmark to confirm. Cloud SKU hardware can change over time.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/benchmark/report.go`:
- Around line 372-375: The CPU feature reporting currently hardcodes the label
and field for SHA-NI (using results.CryptoResult and c.HasSHA_NI), which is
incorrect on arm64; change the report and serialization to be architecture-aware
by checking SystemInfo.GOARCH (or equivalent) and selecting the appropriate
label/field name (e.g., "SHA-NI" for amd64 vs "ARMv8 SHA2" for arm64) before
calling yesNo(c.HasSHA_NI) (or the corresponding bit/flag on CryptoResult);
update all occurrences (the printf block around results.CryptoResult, and the
similar blocks at the other listed ranges) so the printed label and exported
JSON key match the selected arch-neutral or arch-specific name.
In `@cmd/benchmark/score.go`:
- Around line 194-208: The veto message currently always suggests "missing
SHA-NI on amd64"; update the SHA-256 throughput veto in ComputeScore (the block
that checks r.CryptoResult and c.SHA256LargeMBps < minLeaderSHA256MBps and sets
s.VetoedReason) to branch on the host architecture available on the
BenchmarkResults (the r value) and produce an architecture-appropriate
remediation string: for amd64 keep the existing "missing SHA-NI" wording, for
arm64 mention missing ARM SHA extensions or use a generic "missing SHA-256
acceleration / hardware support" message, while preserving the formatted
throughput numbers (c.SHA256LargeMBps and minLeaderSHA256MBps) and still setting
s.Vetoed = true and s.Grade = "F".
In `@cmd/node/preflight.go`:
- Around line 96-100: The preflight logs and error messages use the
arch-specific label "sha_ni" and remediation text tied to AMD64 even when
info.arch may be arm64; update logging and error text to be architecture-aware
or arch-neutral: replace the hardcoded "sha_ni" label with an arch-neutral key
(e.g., "sha256_accel") or compute the label from info.arch, and when emitting
messages or errors that reference missing SHA support, derive the remediation
string from info.arch (e.g., mention "SHA-NI on amd64" vs "ARMv8 SHA2 on arm64")
so both log.Info calls (the one using info.hasSHA and the other similar blocks
later) and any failure/error messages reference the correct feature for
info.arch.
In `@cmd/node/PREFLIGHT.md`:
- Around line 3-5: The current text overstates the guarantee by claiming a host
that fails the preflight "never unlocks the BLS key" without qualifying
enforcement; update the wording in PREFLIGHT.md to scope the guarantee to the
enforced default path (e.g., mention that the key is not loaded when the CPU
preflight is enabled/enforced), and reference the configuration options
preferences.enforceCpuPreflight and environment variable KLEVER_SKIP_CPU_CHECK
to note that disabling these allows startup to continue; ensure the sentence
around the preflight description and the key-loading guarantee explicitly states
"when CPU preflight is enforced (default)" or similar to match implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f09a930b-a26b-43dc-9a68-e77e25213e8f
📒 Files selected for processing (5)
cmd/benchmark/report.gocmd/benchmark/score.gocmd/node/PREFLIGHT.mdcmd/node/preflight.goconfig/prefsConfig_test.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (1)
**/*_test.go
⚙️ CodeRabbit configuration file
**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)
Files:
config/prefsConfig_test.go
🧠 Learnings (1)
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.
Applied to files:
config/prefsConfig_test.gocmd/benchmark/score.gocmd/node/preflight.gocmd/benchmark/report.go
🔇 Additional comments (3)
config/prefsConfig_test.go (3)
1-4: LGTM!Package declaration and imports are appropriate for a unit test file.
5-15: LGTM!The nil pointer test correctly validates the critical safe-default behavior. When operators upgrade and their existing YAML lacks the new
EnforceCPUPreflightkey, the system will enforce the CPU preflight by default, preventing silent security/performance downgrades.
17-31: LGTM!Both explicit true/false cases are correctly tested. The local variable pattern (
v := true; &v) is a standard Go idiom for creating bool pointers. Test coverage is complete for this boolean pointer accessor.
CodeRabbit's third pass surfaced a real correctness issue: text report, JSON output, log keys, and remediation messages all hardcoded "SHA-NI" — which is the Intel/AMD x86 brand. On arm64 the equivalent ISA is ARMv8 SHA2, so an operator running on a slow arm64 validator was being misdirected to migrate from SHA-NI-deficient x86 hardware when the actual root cause is missing ARMv8 SHA2. Rename CryptoResult.HasSHA_NI -> HasSHAAccel; JSON tag sha_ni -> sha_accel; preflight log key sha_ni -> sha_accel. Add shaAccelName(arch) helper that returns "SHA-NI" / "ARMv8 SHA2" / generic per architecture and use it in: text CPU section, post-crypto warning, JSON surface, score VetoedReason, preflight measurement log, preflight Warn line, preflight failure error. Qualify the PREFLIGHT.md key-loading guarantee to scope it to the enforced default path, since enforceCpuPreflight=false and KLEVER_SKIP_CPU_CHECK=1 deliberately allow startup past failure. Verified end-to-end: - amd64 Skylake host: text shows "SHA-NI=no", JSON has "sha_accel": false, veto_reason cites "missing SHA-NI on amd64 (Skylake-X / Cascade Lake / Haswell)". - arm64 (M4 Max) local: text shows "ARMv8 SHA2=yes" instead of the previous misleading "SHA-NI=yes".
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/benchmark/crypto.go`:
- Around line 223-237: The ed25519 verification loop repeatedly calls
time.Now().Before(deadline) for every op causing excessive syscall overhead on
fast hosts; modify the loop in the benchmark function that performs
ed25519.Verify (the block using variables deadline, start, count and calling
ed25519.Verify(pub, msg, sig)) to use an inner batch loop: choose a small batch
size (e.g., 64 or 256), run that many ed25519.Verify calls incrementing count,
then check time.Now().After(deadline) to break; ensure you stop mid-batch if
deadline passed and compute elapsed and ops/sec the same way so semantics and
final count remain correct.
In `@cmd/benchmark/score_test.go`:
- Around line 98-137: Add a boundary unit test that sets
r.CryptoResult.SHA256LargeMBps = minLeaderSHA256MBps and calls ComputeScore to
assert the throughput veto does not trigger; specifically create a
TestComputeScore_ThroughputVeto_ExactBoundary test (analogous to the existing
tests) that verifies s.Vetoed is false (and optionally that s.Grade != "F") to
confirm the comparison is strict (<) rather than <= at the minLeaderSHA256MBps
boundary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cb10c2e6-9aec-42c5-aeb1-2f60c07c8a07
📒 Files selected for processing (6)
cmd/benchmark/crypto.gocmd/benchmark/report.gocmd/benchmark/score.gocmd/benchmark/score_test.gocmd/node/PREFLIGHT.mdcmd/node/preflight.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (1)
**/*_test.go
⚙️ CodeRabbit configuration file
**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)
Files:
cmd/benchmark/score_test.go
🧠 Learnings (1)
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.
Applied to files:
cmd/benchmark/score_test.gocmd/benchmark/score.gocmd/benchmark/crypto.gocmd/node/preflight.gocmd/benchmark/report.go
🔇 Additional comments (11)
cmd/benchmark/crypto.go (2)
1-113: LGTM — crypto benchmark entry point and result structure are well-designed.The
RunCryptoBenchmarkfunction properly sequences sub-benchmarks, handles errors, and returns a completeCryptoResult. The architecture-awareshaAccelNameandhasSHAAccelerationhelpers correctly distinguish SHA-NI (amd64) from ARMv8 SHA2 (arm64).
180-206: LGTM — hash benchmark loop is well-optimized.The preallocation of
digest(line 188) eliminates per-iteration allocations, and theinnerLoopbatching (256 iterations before deadline check) minimizestime.Now()syscall overhead. Therand.Readfallback for buffer seeding is appropriate here since this is an operator-run tool, not the boot-path preflight.cmd/node/preflight.go (2)
64-128: LGTM — preflight logic is robust and well-documented.The test seam via
validatorCPUPreflightWithInfoenables thorough testing without mocking globals. Running the benchmark twice and taking the maximum handles transient throttling. The arch-aware error messages (shaAccelName,shaAccelArchSuffix) correctly guide operators on both amd64 and arm64.
167-198: LGTM — benchSHA256 is correctly implemented for the startup path.Deterministic buffer initialization (line 176-178) avoids
crypto/randentropy starvation during early boot. TheinnerLoopbatching and deadline-based termination match the benchmark tool's approach. The preallocation ofdigest(line 180) eliminates GC pressure during measurement.cmd/benchmark/score_test.go (1)
162-181: LGTM — grade boundary tests are thorough.Table-driven approach cleanly covers all grade thresholds (S/A/B/C/D/F). Test isolation is good — each case is independent.
cmd/benchmark/score.go (2)
167-214: LGTM — scoring with crypto category and hard veto is correctly implemented.The veto logic properly:
- Checks
CryptoResult != nilbefore accessing fields- Uses strict
<comparison against the 500 MB/s floor- Sets both
VetoedandVetoedReasonwith arch-aware messaging viashaAccelName(r.SystemInfo.GOARCH)- Forces
Grade = "F"regardless of point totalThe category weight sum is correct (200+200+200+150+100+100+50 = 1000).
287-298: LGTM — crypto category score normalizer.The
cryptoCatScorefunction correctly averages five normalized metrics (SHA-256 small/large, Blake2b, Keccak-256, Ed25519 verify). Consistent with other category normalizers in this file.cmd/node/PREFLIGHT.md (1)
1-118: LGTM — comprehensive and accurate operator documentation.The document properly addresses:
- Key-loading guarantee scoped to enforced mode (lines 4-10)
- Measured-throughput veto vs feature-bit check distinction (lines 27-31, 115-117)
- Threshold relationship: startup 800 MB/s vs benchmark 500 MB/s (lines 85-90)
- Cloud SKU guidance qualified as "based on current fleet observations" (lines 103-110)
- Emergency bypass semantics with fail-closed typo protection (lines 72-73)
cmd/benchmark/report.go (3)
372-376: LGTM — CPU feature header is now architecture-aware.The text report correctly uses
shaAccelName(si.GOARCH)to display "SHA-NI" on amd64 and "ARMv8 SHA2" on arm64, addressing the previous review concern about hardcoded x86 terminology.
574-609: LGTM — crypto section with arch-aware remediation.The
printCryptoSectionfunction:
- Renders all five metrics with consistent formatting via
metricThroughputMBpsRowFmt- Shows SHA acceleration warning only on supported architectures (
amd64orarm64)- Uses
shaAccelName(runtime.GOARCH)for the correct feature name per platform
817-936: LGTM — JSON output correctly exposes crypto results and veto fields.The
jsonCryptoandjsonCPUFeaturesstructs use appropriate field names (sha_accelinstead ofsha_ni). The score object includesvetoedandveto_reasonfields for programmatic consumption. All crypto metrics are exposed with clear naming conventions (sha256_1k_mbps,sha256_16k_mbps, etc.).
| deadline := time.Now().Add(cryptoBenchDuration) | ||
| start := time.Now() | ||
| var count int64 | ||
| for time.Now().Before(deadline) { | ||
| if !ed25519.Verify(pub, msg, sig) { | ||
| return 0, fmt.Errorf("ed25519 self-verify failed unexpectedly") | ||
| } | ||
| count++ | ||
| } | ||
| elapsed := time.Since(start).Seconds() | ||
| if elapsed <= 0 { | ||
| return 0, nil | ||
| } | ||
| return float64(count) / elapsed, nil | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Ed25519 verify loop lacks batching — potential time.Now() overhead on fast hosts.
Unlike benchHashMBps, this loop calls time.Now().Before(deadline) on every single verification. On a host achieving 25K ops/s, that's 50K+ syscalls over 2 seconds. While the impact is less severe than in hashing (verify is ~40µs vs ~300ns per hash), adding an inner batch loop would improve measurement accuracy on high-end hardware.
♻️ Optional: batch the deadline check
func benchEd25519Verify() (float64, error) {
...
deadline := time.Now().Add(cryptoBenchDuration)
start := time.Now()
var count int64
+ const innerLoop = 64
for time.Now().Before(deadline) {
- if !ed25519.Verify(pub, msg, sig) {
- return 0, fmt.Errorf("ed25519 self-verify failed unexpectedly")
+ for i := 0; i < innerLoop; i++ {
+ if !ed25519.Verify(pub, msg, sig) {
+ return 0, fmt.Errorf("ed25519 self-verify failed unexpectedly")
+ }
}
- count++
+ count += innerLoop
}
...
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| deadline := time.Now().Add(cryptoBenchDuration) | |
| start := time.Now() | |
| var count int64 | |
| for time.Now().Before(deadline) { | |
| if !ed25519.Verify(pub, msg, sig) { | |
| return 0, fmt.Errorf("ed25519 self-verify failed unexpectedly") | |
| } | |
| count++ | |
| } | |
| elapsed := time.Since(start).Seconds() | |
| if elapsed <= 0 { | |
| return 0, nil | |
| } | |
| return float64(count) / elapsed, nil | |
| } | |
| deadline := time.Now().Add(cryptoBenchDuration) | |
| start := time.Now() | |
| var count int64 | |
| const innerLoop = 64 | |
| for time.Now().Before(deadline) { | |
| for i := 0; i < innerLoop; i++ { | |
| if !ed25519.Verify(pub, msg, sig) { | |
| return 0, fmt.Errorf("ed25519 self-verify failed unexpectedly") | |
| } | |
| } | |
| count += innerLoop | |
| } | |
| elapsed := time.Since(start).Seconds() | |
| if elapsed <= 0 { | |
| return 0, nil | |
| } | |
| return float64(count) / elapsed, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/benchmark/crypto.go` around lines 223 - 237, The ed25519 verification
loop repeatedly calls time.Now().Before(deadline) for every op causing excessive
syscall overhead on fast hosts; modify the loop in the benchmark function that
performs ed25519.Verify (the block using variables deadline, start, count and
calling ed25519.Verify(pub, msg, sig)) to use an inner batch loop: choose a
small batch size (e.g., 64 or 256), run that many ed25519.Verify calls
incrementing count, then check time.Now().After(deadline) to break; ensure you
stop mid-batch if deadline passed and compute elapsed and ops/sec the same way
so semantics and final count remain correct.
| func TestComputeScore_ThroughputVeto_CapsGradeAtF(t *testing.T) { | ||
| r := excellentResults() | ||
| // Bench measured below the leader-mode floor (e.g., a Skylake/Haswell | ||
| // without SHA-NI typically lands around 250 MB/s). | ||
| r.CryptoResult.SHA256LargeMBps = 250 | ||
|
|
||
| s := ComputeScore(r) | ||
| if !s.Vetoed { | ||
| t.Fatal("expected Vetoed=true when SHA-256 16K throughput below the floor") | ||
| } | ||
| if s.Grade != "F" { | ||
| t.Fatalf("Grade = %q, want F when veto applies", s.Grade) | ||
| } | ||
| if s.VetoedReason == "" { | ||
| t.Fatal("expected VetoedReason to be populated") | ||
| } | ||
| // The numeric score should still be substantial — the veto is a grade-cap, | ||
| // not a silent zero. Operators get to see how the rest of the system performs. | ||
| // (The Crypto category itself will score low because of the bad throughput, | ||
| // but the other six categories were set to excellent in this fixture.) | ||
| if s.Total < 700 { | ||
| t.Fatalf("Total = %d, expected non-veto categories to still score normally", s.Total) | ||
| } | ||
| } | ||
|
|
||
| func TestComputeScore_ThroughputVeto_DoesNotApply_AboveFloor(t *testing.T) { | ||
| r := excellentResults() | ||
| // Throughput just above the floor — veto must not trigger even though | ||
| // the host could in principle be a non-SHA-NI amd64. | ||
| r.CryptoResult.SHA256LargeMBps = minLeaderSHA256MBps + 1 | ||
| r.CryptoResult.HasSHAAccel = false | ||
|
|
||
| s := ComputeScore(r) | ||
| if s.Vetoed { | ||
| t.Fatalf("Vetoed must be false when throughput is above floor; reason: %s", s.VetoedReason) | ||
| } | ||
| if s.Grade == "F" { | ||
| t.Fatalf("Grade = F unexpectedly when throughput is above floor") | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider adding a test case at the exact veto boundary.
The tests cover 250 (below) and minLeaderSHA256MBps + 1 (above), but not the exact boundary value 500.0. Floating-point boundary conditions can be subtle — a test at exactly 500.0 would confirm whether < vs <= is intended.
♻️ Optional: add boundary test
func TestComputeScore_ThroughputVeto_ExactBoundary(t *testing.T) {
r := excellentResults()
r.CryptoResult.SHA256LargeMBps = minLeaderSHA256MBps // exactly 500.0
s := ComputeScore(r)
if s.Vetoed {
t.Fatal("Vetoed must be false at exact boundary (< is used, not <=)")
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/benchmark/score_test.go` around lines 98 - 137, Add a boundary unit test
that sets r.CryptoResult.SHA256LargeMBps = minLeaderSHA256MBps and calls
ComputeScore to assert the throughput veto does not trigger; specifically create
a TestComputeScore_ThroughputVeto_ExactBoundary test (analogous to the existing
tests) that verifies s.Vetoed is false (and optionally that s.Grade != "F") to
confirm the comparison is strict (<) rather than <= at the minLeaderSHA256MBps
boundary.
| func shaCommonCauseSuffix(arch string) string { | ||
| switch arch { | ||
| case "amd64": | ||
| return " on amd64 (Skylake-X / Cascade Lake / Haswell)" |
There was a problem hiding this comment.
every amd64 is one of thoose 3?
i would replace with something like "Likely a x,y,z"
Summary
cmd/node/preflight.go): refuses to start validators on hosts that fail a 200 ms SHA-256 self-bench at < 800 MB/s — the floor below which leader-mode TX processing exceeds the protocol's 425 ms hardware-tolerance window.cmd/benchmark/crypto.go): SHA-256 (1 KiB / 16 KiB), Blake2b, Keccak-256, Ed25519 verify, plus CPU feature attestation (SHA-NI / AVX-512 IFMA / VAES / GFNI). Hard grade-F veto below 500 MB/s SHA-256.preferences.enforceCpuPreflight(defaulttrue, including when omitted from existing configs);KLEVER_SKIP_CPU_CHECK=1for emergency bypass.cmd/node/PREFLIGHT.md(rationale, behavior, override, migration plan) + updates tocmd/benchmark/CLI.md.Why
A field investigation across the validator fleet found a ~5× spread in smart-contract TX processing time between SHA-NI-equipped and SHA-NI-deficient hosts. Skylake-class Intel Xeon guests took ~600 ms per representative SC TX vs ~120 ms on AMD EPYC peers — well above the 425 ms protocol hardware-tolerance lower bound. The preflight catches this at startup so operators discover the issue before consensus-time outliers manifest in production.
The gate is grounded in measured SHA-256 throughput, not the SHA-NI feature flag — it correctly handles nested virtualisation that hides feature bits while still exposing the underlying ISA, and it does not assume SHA-NI absence is the sole cause of low throughput.
Behavior
enforceCpuPreflight: true(default)enforceCpuPreflight: falseKLEVER_SKIP_CPU_CHECK=1The bench is run twice and the maximum is reported, so a single transient throttle event (thermal, hypervisor noisy neighbor) does not falsely refuse a host that would otherwise pass.
Field validation
Cross-compiled the standalone benchmark to amd64 Linux and ran it across 5 fleet hosts:
94.130.96.24351.195.253.15634.176.51.11365.21.181.9891.99.66.173Real-world SHA-NI ratio: ~5.5× faster on EPYC vs Intel guests without SHA-NI — matches the field-investigation findings.
Test plan
go test ./cmd/node/... ./cmd/benchmark/...— all passing (12 preflight tests + 9 score tests)go test -race ./cmd/node ./cmd/benchmark— clean for changed packagesgo vet ./cmd/node/... ./cmd/benchmark/... ./config/...— cleango build ./cmd/node/... ./cmd/benchmark/... ./config/...— cleanvetoed/veto_reasonfields, verified end-to-end*boolsemantics treat absent YAML key astrue(enforce by default) so upgrading operators are not silently downgraded to warn-onlyOut of scope / follow-ups
gopsutilCGO macOS-SDK deprecation warning is unrelated.This PR introduces an operator-facing validator CPU preflight at node startup and a new crypto category in the klever-benchmark. Both are operational/diagnostic and do not change consensus rules, transaction processing, state management, KVM, or networking logic.
Blockchain Impact
Key components affected
Validator CPU Preflight
Crypto Benchmark & Scoring
Configuration & operator guidance
Testing & quality
Summary judgement