Harden listener controls and wire backend failure telemetry #5
DsChauhan08 merged 2 commits into main from
Conversation
Pull request overview
This PR strengthens runtime/admin controls and observability by adding new admin control commands (rate/connection caps, emergency mode, audit chain verification) and wiring corresponding telemetry into Prometheus metrics and CLI/integration smoke coverage.
Changes:
- Added new admin control-plane commands: SETRATE, SETMAXCONNS, SETGLOBALMAXCONNS, EMERGENCY, and AUDITVERIFY, plus listener-side connection admission checks.
- Expanded Prometheus metrics for connection rejections and audit verification failures, and added backend TLS pin failure counting.
- Updated test harnesses and build/CI sanitizer targets to run smoke/CLI tests against a selectable SPF_BIN.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/integration_smoke.py | Allows overriding the SPF binary via SPF_BIN for test portability. |
| tests/cli_realworld.sh | Exercises new control commands and validates new metrics/audit verification output. |
| src/server.cpp | Implements new control commands and enforces new connection-admission checks/telemetry. |
| src/metrics.c | Adds new metrics definitions and exports new counters/gauges. |
| src/core.c | Adds control command classification, audit chain verification, and initializes new state fields. |
| src/common.h | Extends state and command enums to support new controls/metrics. |
| spf_audit.log | Adds a sample audit log file to the repository. |
| README.md | Documents new admin commands and expanded metrics list. |
| Makefile | Adds ASAN/UBSAN build+test targets and passes SPF_BIN into tests. |
| .github/workflows/sanitizers.yml | Runs the new sanitizer test targets in CI. |
```c
static void json_unescape_minimal(char* s) {
    if (!s) {
        return;
    }
    size_t r = 0, w = 0;
    while (s[r] != '\0') {
        if (s[r] == '\\' && s[r + 1] != '\0') {
            char n = s[r + 1];
            if (n == 'n') s[w++] = '\n';
            else if (n == 'r') s[w++] = '\r';
            else if (n == 't') s[w++] = '\t';
            else s[w++] = n;
            r += 2;
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
}
```
spf_audit_verify_chain() calls json_unescape_minimal() on extracted JSON strings, but json_unescape_minimal() treats \uXXXX as a generic escape and strips the backslash (turning e.g. \u001b into u001b). Since audit_json_escape() can emit \u%04x for control chars, this makes the canonical string differ from what was originally hashed and will cause verification to fail for valid logs containing \u escapes. Consider either (a) not unescaping at all and reusing the already-escaped JSON field values when reconstructing canonical, or (b) implementing proper JSON unescape support for \uXXXX and then always re-escaping with audit_json_escape() before hashing.
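A minimal sketch of option (b)'s decoding half, assuming the writer only ever emits \u%04x for ASCII control characters. The function and helper names here are illustrative, not the project's actual code:

```c
#include <stddef.h>

/* Illustrative sketch: like json_unescape_minimal(), but decodes \uXXXX
 * for ASCII-range codepoints instead of stripping the backslash.
 * Escapes it cannot decode are left intact rather than corrupted. */
static int hex4(const char* p, unsigned int* out) {
    unsigned int v = 0;
    for (int i = 0; i < 4; i++) {
        char c = p[i];
        if (c >= '0' && c <= '9')      v = v * 16 + (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') v = v * 16 + (unsigned)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') v = v * 16 + (unsigned)(c - 'A' + 10);
        else return 0; /* also stops safely at '\0' */
    }
    *out = v;
    return 1;
}

static void json_unescape_with_u(char* s) {
    if (!s) return;
    size_t r = 0, w = 0;
    while (s[r] != '\0') {
        unsigned int cp;
        if (s[r] == '\\' && s[r + 1] != '\0') {
            char n = s[r + 1];
            if (n == 'n')      { s[w++] = '\n'; r += 2; }
            else if (n == 'r') { s[w++] = '\r'; r += 2; }
            else if (n == 't') { s[w++] = '\t'; r += 2; }
            else if (n == 'u' && hex4(s + r + 2, &cp) && cp < 0x80) {
                s[w++] = (char)cp; /* e.g. \u001b -> ESC */
                r += 6;
            }
            else if (n == 'u') { s[w++] = s[r++]; } /* keep undecodable escape */
            else               { s[w++] = n; r += 2; }
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
}
```

Even with this, re-escaping with audit_json_escape() before hashing (so the canonical bytes match what the writer hashed) is the safer of the two options.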
```c
int spf_audit_verify_chain(spf_state_t* state, uint32_t* entries_checked, uint32_t* failures) {
    if (!state || !state->config.admin.audit_log_path[0]) {
        return -1;
    }

    FILE* f = fopen(state->config.admin.audit_log_path, "r");
    if (!f) {
        return -1;
    }
```
spf_audit_verify_chain() reads the audit log file without taking state->audit_lock, while spf_audit_log() appends under that lock. This can race with concurrent writes (partial line read / inconsistent hash chain) and produce false verification failures. Consider holding state->audit_lock for the duration of verification (or copying the file/size under lock before reading) to ensure a consistent snapshot.
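One way to get a consistent snapshot, sketched with an illustrative wrapper and callback (the names and signature are assumptions, not the project's API):

```c
#include <pthread.h>
#include <stdio.h>

/* Illustrative sketch: hold the same mutex the audit writer holds while
 * appending, so the verifier never observes a partially written line. */
static int verify_chain_locked(pthread_mutex_t* audit_lock, const char* path,
                               int (*verify_stream)(FILE*)) {
    pthread_mutex_lock(audit_lock);
    FILE* f = fopen(path, "r");
    if (!f) {
        pthread_mutex_unlock(audit_lock);
        return -1;
    }
    int rc = verify_stream(f); /* walk the hash chain line by line */
    fclose(f);
    pthread_mutex_unlock(audit_lock);
    return rc;
}

/* Stand-in for the real chain walker: here it just counts complete lines,
 * which is enough to exercise the locking wrapper. */
static int count_lines(FILE* f) {
    int c = 0, ch;
    while ((ch = fgetc(f)) != EOF) {
        if (ch == '\n') c++;
    }
    return c;
}
```

If holding the lock for a full file scan stalls writers too long, the alternative mentioned above also works: record the file size under the lock, then verify only up to that offset without the lock.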
```c
else if (strncmp(buf, "SETRATE ", 8) == 0) {
    uint32_t id = 0;
    uint64_t bps = 0;
    if (sscanf(buf + 8, "%u %lu", &id, &bps) == 2) {
        spf_rule_t* r = spf_get_rule(&g_state, id);
        if (!r) {
            snprintf(resp, sizeof(resp), "ERR rule not found\n");
        } else if (bps == 0 || rule_set_rate_limit(r, bps) != 0) {
            snprintf(resp, sizeof(resp), "ERR invalid rate\n");
        } else {
            snprintf(resp, sizeof(resp), "OK rate set rule=%u bps=%lu\n", id, bps);
        }
```
The SETRATE parser stores into a uint64_t bps but uses %lu in sscanf/snprintf. This is not portable across platforms where uint64_t is not unsigned long (e.g., Windows LLP64) and can truncate or misformat values. Prefer using SCNu64/PRIu64 (or %llu with an unsigned long long temp) for both parsing and formatting.
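A sketch of the portable variant using the <inttypes.h> macros; the wrapper function is illustrative, not the project's code:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: parse "<id> <bps>" and format the reply with
 * SCNu64/PRIu64 so uint64_t round-trips on LP64 and LLP64 alike. */
static int parse_setrate(const char* args, char* resp, size_t resp_len) {
    uint32_t id = 0;
    uint64_t bps = 0;
    if (sscanf(args, "%" SCNu32 " %" SCNu64, &id, &bps) != 2 || bps == 0) {
        snprintf(resp, resp_len, "ERR invalid rate\n");
        return -1;
    }
    snprintf(resp, resp_len, "OK rate set rule=%" PRIu32 " bps=%" PRIu64 "\n",
             id, bps);
    return 0;
}
```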
```c
pthread_mutex_lock(&g_state.stats_lock);
if (g_state.emergency_mode) {
    g_state.conn_reject_emergency++;
    pthread_mutex_unlock(&g_state.stats_lock);
    return false;
}
if (g_state.global_max_conns > 0 && g_state.active_conns >= g_state.global_max_conns) {
    g_state.conn_reject_global_max++;
    pthread_mutex_unlock(&g_state.stats_lock);
    return false;
}
pthread_mutex_unlock(&g_state.stats_lock);
```
try_accept_new_conn() checks g_state.active_conns against g_state.global_max_conns under stats_lock, but does not reserve/increment the count while holding the lock. With concurrent listener threads, multiple accepts can pass the check and then later increment active_conns, exceeding the configured cap. To make the cap reliable, consider reserving a slot under stats_lock (and rolling it back on subsequent dial/TLS failure).
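A minimal sketch of the reservation pattern, with a standalone state struct so it compiles on its own (field names follow the excerpt above; the struct and helpers are illustrative):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch: reserve a connection slot while stats_lock is held,
 * so concurrent accept threads cannot all pass the cap check before any of
 * them increments active_conns. */
typedef struct {
    pthread_mutex_t stats_lock;
    bool emergency_mode;
    uint32_t active_conns;
    uint32_t global_max_conns;
    uint64_t conn_reject_emergency;
    uint64_t conn_reject_global_max;
} conn_state_t;

static bool try_reserve_conn_slot(conn_state_t* st) {
    bool ok = false;
    pthread_mutex_lock(&st->stats_lock);
    if (st->emergency_mode) {
        st->conn_reject_emergency++;
    } else if (st->global_max_conns > 0 &&
               st->active_conns >= st->global_max_conns) {
        st->conn_reject_global_max++;
    } else {
        st->active_conns++; /* reserve while still holding the lock */
        ok = true;
    }
    pthread_mutex_unlock(&st->stats_lock);
    return ok;
}

/* Roll back a reservation if the backend dial or TLS handshake fails. */
static void release_conn_slot(conn_state_t* st) {
    pthread_mutex_lock(&st->stats_lock);
    if (st->active_conns > 0) st->active_conns--;
    pthread_mutex_unlock(&st->stats_lock);
}
```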
```c
if (rule->max_conns > 0) {
    uint32_t sum = 0;
    for (int i = 0; i < rule->backend_count; i++) {
        pthread_mutex_lock(&rule->backends[i].lock);
        sum += rule->backends[i].active_conns;
        pthread_mutex_unlock(&rule->backends[i].lock);
    }
    if (sum >= rule->max_conns) {
```
try_accept_new_conn() reads rule->max_conns and iterates rule->backend_count without holding rule->lock, while SETMAXCONNS updates rule->max_conns under that lock. This introduces an unsynchronized read/write data race. Consider taking rule->lock to snapshot max_conns/backend_count before the loop (or making these fields atomic / updated under a lock that readers also hold).
Suggested change:

```diff
-if (rule->max_conns > 0) {
-    uint32_t sum = 0;
-    for (int i = 0; i < rule->backend_count; i++) {
-        pthread_mutex_lock(&rule->backends[i].lock);
-        sum += rule->backends[i].active_conns;
-        pthread_mutex_unlock(&rule->backends[i].lock);
-    }
-    if (sum >= rule->max_conns) {
+uint32_t rule_max_conns = 0;
+int backend_count = 0;
+pthread_mutex_lock(&rule->lock);
+rule_max_conns = rule->max_conns;
+backend_count = rule->backend_count;
+pthread_mutex_unlock(&rule->lock);
+if (rule_max_conns > 0) {
+    uint32_t sum = 0;
+    for (int i = 0; i < backend_count; i++) {
+        pthread_mutex_lock(&rule->backends[i].lock);
+        sum += rule->backends[i].active_conns;
+        pthread_mutex_unlock(&rule->backends[i].lock);
+    }
+    if (sum >= rule_max_conns) {
```
```c
written += snprintf(buf + written, len - written,
    "spf_connections_active %u\n"
    "spf_connections_total %lu\n"
    "spf_bytes_in_total %lu\n"
    "spf_bytes_out_total %lu\n"
    "spf_blocked_total %lu\n"
    "spf_rules_active %u\n"
    "spf_uptime_seconds %lu\n"
    "spf_admin_auth_failures_total %lu\n"
    "spf_admin_lockouts_total %lu\n"
    "spf_admin_service_token_auth_success_total %lu\n"
    "spf_admin_service_token_auth_fail_total %lu\n"
    "spf_admin_cmd_rate_limited_total %lu\n"
    "spf_admin_temp_grants_created_total %lu\n"
    "spf_admin_failed_commands_total %lu\n"
    "spf_admin_unknown_commands_total %lu\n"
    "spf_admin_sensitive_commands_total %lu\n"
    "spf_backend_tls_handshake_failures_total %lu\n"
    "spf_backend_tls_pin_failures_total %lu\n"
    "spf_backend_connect_timeouts_total %lu\n"
    "spf_conn_reject_emergency_total %lu\n"
    "spf_conn_reject_rule_max_total %lu\n"
    "spf_conn_reject_global_max_total %lu\n"
    "spf_audit_verify_failures_total %lu\n"
    "spf_global_max_conns %u\n"
    "spf_emergency_mode %u\n",
    state->active_conns,
```
The metrics formatter appends new uint64_t counters using %lu (e.g., spf_conn_reject_*_total, spf_audit_verify_failures_total). On platforms where unsigned long is 32-bit, this will misreport or invoke UB. Prefer switching these new fields (and ideally the whole formatter) to PRIu64 for uint64_t values to ensure correct output across targets.
|
@copilot apply changes based on the comments in this thread
…tting
Agent-Logs-Url: https://github.com/DsChauhan08/tunl/sessions/c8942bdd-4e9a-4a47-86ee-51352c749efb
Co-authored-by: DsChauhan08 <101635962+DsChauhan08@users.noreply.github.com>
Implemented the review-thread fixes. I addressed the audit verification escape/locking issues and the portable integer formatting.
Summary