Skip to content
This repository was archived by the owner on Jan 16, 2026. It is now read-only.

Conversation

@teddyknox
Copy link
Contributor

Problem

When KONA_METRICS_PORT=0 was set, the OS would assign an available port, but there was no way to discover what port was actually assigned. The log message would show :0 instead of the actual port, making it impossible to connect to the metrics endpoint.

Solution

Replaced the metrics-exporter-prometheus built-in HTTP listener with a manually managed server. The implementation:

  1. Binds a TcpListener directly (supporting port 0)
  2. Retrieves the actual bound address via local_addr()
  3. Uses PrometheusBuilder::build_recorder() without the HTTP listener
  4. Serves metrics manually via hyper
  5. Returns the actual SocketAddr to callers

Design Rationale

The metrics-exporter-prometheus crate's with_http_listener().install() API doesn't expose the actual bound address—it only accepts a SocketAddr and returns Result<(), BuildError>. There's no way to query what port the OS assigned.

We considered two approaches:

  • Pre-bind and hope: Bind a listener to get a port, drop it, pass that port to the builder. This has a race condition where another process could grab the port.
  • Manual serving: Build the recorder without HTTP, bind our own listener, serve manually. No race condition.

We chose manual serving, which is the same pattern used by reth's metrics infrastructure. This approach:

  • Eliminates the race condition entirely (same listener is used throughout)
  • Gives full control over the HTTP server
  • Allows returning the actual bound address

Breaking Changes

  • init_prometheus_server() is now async and takes SocketAddr instead of (IpAddr, u16)
  • MetricsArgs::init_metrics() is now async
  • Metrics initialization must now happen inside a tokio runtime

Call sites in kona-node and kona-supervisor have been updated to initialize metrics inside their async blocks.

Usage

// Get the actual address (useful when port is 0)
if let Some(addr) = args.init_metrics_with_addr().await? {
    println!("Metrics available at http://{}", addr);
}

// Or just initialize without needing the address
args.init_metrics().await?;

Change `init_prometheus_server` to manually bind and serve metrics,
allowing port 0 to be used for OS-assigned port allocation. The actual
bound address is now returned and logged.

This follows the same pattern used by reth, where the recorder is built
without the built-in HTTP server and metrics are served manually via hyper.

BREAKING CHANGE: `init_metrics()` and `init_prometheus_server()` are now
async and must be called from within a tokio runtime.
Copilot AI review requested due to automatic review settings January 9, 2026 21:28
@wiz-b4c72f16a4
Copy link

wiz-b4c72f16a4 bot commented Jan 9, 2026

Wiz Scan Summary

Scanner Findings
Vulnerability Finding Vulnerabilities 4 Low
Data Finding Sensitive Data -
Secret Finding Secrets -
IaC Misconfiguration IaC Misconfigurations -
SAST Finding SAST Findings -
Software Supply Chain Finding Software Supply Chain Findings -
Total 4 Low

View scan details in Wiz

To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for dynamic port assignment (port 0) for the Prometheus metrics server by replacing the built-in HTTP listener from metrics-exporter-prometheus with a manually managed hyper server. This allows the actual bound port to be discovered and returned to callers, which was previously impossible with the library's API.

Key changes:

  • Manual metrics serving via hyper instead of the built-in HTTP listener
  • Async initialization to support tokio-based TcpListener binding
  • Returns the actual bound SocketAddr to support dynamic port assignment

Reviewed changes

Copilot reviewed 5 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/utilities/cli/src/prometheus.rs Implements manual metrics serving with hyper, replaces sync init with async, adds comprehensive design documentation
crates/utilities/cli/src/error.rs Adds PrometheusError type for better error handling of binding and recorder initialization
crates/utilities/cli/src/flags/metrics.rs Updates MetricsArgs methods to be async and adds init_metrics_with_addr() to return actual bound address
crates/utilities/cli/src/lib.rs Exports PrometheusError for public API
crates/utilities/cli/Cargo.toml Adds hyper, hyper-util, tokio dependencies for manual metrics serving
bin/supervisor/src/cli.rs Moves metrics initialization inside tokio runtime (async block)
bin/node/src/flags/metrics.rs Updates init_unified_metrics to be async
bin/node/src/cli.rs Moves metrics initialization inside tokio runtime for async commands
Cargo.lock Updates dependency graph with new direct dependencies

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +33 to +34
hyper = { version = "1.6", features = ["server", "http1"] }
hyper-util = { version = "0.1", features = ["tokio"] }
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding hyper and hyper-util to the workspace dependencies in the root Cargo.toml. This would ensure consistent versions across the workspace and make it easier to manage updates. Currently, these dependencies are specified with explicit versions only in this crate, while hyper is already transitively used in the project (version 1.8.1). Using workspace dependencies would follow the pattern used for other shared dependencies like tokio, http, and http-body-util.

Suggested change
hyper = { version = "1.6", features = ["server", "http1"] }
hyper-util = { version = "0.1", features = ["tokio"] }
hyper = { workspace = true, features = ["server", "http1"] }
hyper-util = { workspace = true, features = ["tokio"] }

Copilot uses AI. Check for mistakes.
let response = Response::builder()
.header(CONTENT_TYPE, "text/plain; charset=utf-8")
.body(Full::new(Bytes::from(metrics)))
.unwrap();
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Response::builder().body().unwrap() call should handle errors properly instead of using unwrap(). While it's unlikely to fail in this specific case (the inputs are valid), using unwrap() can cause the entire task to panic if something unexpected happens. Consider using expect() with a descriptive message or properly handling the error case.

Suggested change
.unwrap();
.expect("failed to build HTTP response for Prometheus metrics");

Copilot uses AI. Check for mistakes.
@op-will
Copy link
Collaborator

op-will commented Jan 9, 2026

Closes #2987

@codecov
Copy link

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 0% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.9%. Comparing base (5a4457f) to head (d5f92da).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
crates/utilities/cli/src/prometheus.rs 0.0% 37 Missing ⚠️
crates/utilities/cli/src/flags/metrics.rs 0.0% 7 Missing ⚠️
bin/supervisor/src/cli.rs 0.0% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (5a4457f) and HEAD (d5f92da). Click for more details.

HEAD has 20 uploads less than BASE
Flag BASE (5a4457f) HEAD (d5f92da)
proof 7 0
unit 2 1
e2e 12 0

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants