Implement token authentication #118

lalinsky · 2025-09-13T16:23:13Z

Add support for token-based authentication with both static tokens and dynamic token handlers. Authentication tokens are included in the CONNECT protocol message and authentication failures are properly detected and reported.

Features:

ConnectionOptions.token for static token authentication
ConnectionOptions.token_handler for dynamic token generation
Enhanced error detection for authentication failures in -ERR responses
Test infrastructure with token-authenticated NATS server


coderabbitai · 2025-09-13T16:23:18Z

Walkthrough

Adds token-based authentication support to the client (ConnectionOptions.token / token_handler), includes resolved auth_token in the CONNECT handshake, and maps server -ERR messages to a new ProtocolError type. Adds tests for token auth, updates test utilities, and introduces a token-auth NATS service in docker-compose for tests.
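The precedence rule described above (a `token_handler`, when set, overrides the static `token`) can be sketched in isolation. This is a minimal illustration, not the library's code: `AuthOptions`, `resolveToken`, and `dynamicToken` are hypothetical names, and only the two field declarations mirror what the review quotes from `ConnectionOptions`.

```zig
const std = @import("std");

// Hypothetical stand-in that mirrors only the two auth fields quoted in this PR.
const AuthOptions = struct {
    token: ?[]const u8 = null,
    token_handler: ?*const fn () []const u8 = null,
};

// Returns the token to place in CONNECT: the handler result if a handler
// is set, otherwise the static token (which may be null).
fn resolveToken(opts: AuthOptions) ?[]const u8 {
    if (opts.token_handler) |handler| return handler();
    return opts.token;
}

// Example handler; a real one might read a file or call a secrets service.
fn dynamicToken() []const u8 {
    return "dynamic_token";
}

test "handler overrides static token" {
    const both = AuthOptions{ .token = "static_token", .token_handler = &dynamicToken };
    try std.testing.expectEqualStrings("dynamic_token", resolveToken(both).?);

    const static_only = AuthOptions{ .token = "static_token" };
    try std.testing.expectEqualStrings("static_token", resolveToken(static_only).?);
}
```

With neither field set, `resolveToken` returns `null`, which is the case where the CONNECT payload should carry no `auth_token` at all.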

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Connection auth & protocol errors `src/connection.zig` | New public `ProtocolError` and added to `ConnectionError`; `ConnectionOptions` gains `token` and `token_handler`; handshake resolves token (handler overrides static) and embeds `auth_token` in CONNECT; refactors error flow (uses `errdefer self.close()`, replaces some `catch` with `try`) and adds `parseProtocolError` mapping `-ERR` texts to `ProtocolError` variants; server `-ERR` handling logs and surfaces mapped protocol errors. |
| Public re-export `src/root.zig` | Re-exports `ProtocolError` (`pub const ProtocolError = @import("connection.zig").ProtocolError;`). |
| Tests: auth suite `tests/auth_test.zig`, `tests/all_tests.zig` | Adds `tests/auth_test.zig` with tests for: static token success, token_handler success, handler precedence over static token, invalid token failure (timeout), and no-auth failure; wires `auth_test.zig` into `tests/all_tests.zig`. |
| Test utilities `tests/utils.zig` | Adds `Node.token_auth = 14225` and shifts `Node.unknown` to `14226` so test harness targets the new auth service port. |
| Test infra: docker-compose `docker-compose.test.yml` | Adds `nats-token-auth` service using `nats:2.10-alpine` with `--auth test_token_123`, host networking, configured ports (server 14225, http 18225), debug and server_name, and a healthcheck against `http://127.0.0.1:18225/healthz`. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Add support for no_responders option #17 — Also modifies ConnectionOptions and CONNECT payload construction; overlaps in handshake/auth logic.
Add client details to CONNECT #21 — Changes CONNECT/handshake payload (client metadata/JSON), touching the same code paths.
Enable NATS protocol version 1 support #48 — Alters the CONNECT payload/protocol field; intersects with this PR’s CONNECT message changes.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title "Implement token authentication" is concise, single-sentence, and directly describes the primary change (adding token-based auth and token_handler support), so it accurately reflects the changeset and is appropriate for history scanning. |
| Description Check | ✅ Passed | The description explicitly lists the new features (ConnectionOptions.token, token_handler, CONNECT payload inclusion, enhanced -ERR handling) and mentions the added tests, which matches the file-level summaries, so it is on-topic and passes the lenient description check. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |


coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/utils.zig (1)

84-86: Wait for all 4 services to be healthy to avoid auth test flakiness.

We now run 4 services (including nats-token-auth). Waiting for only 3 can race the auth tests. Bump the threshold.

Apply:
-        if (healthy_count >= 3) {
+        if (healthy_count >= 4) {
             return;
         }

src/connection.zig (1)

1314-1339: Broaden auth error detection to be case-insensitive and include common variants.

Current checks miss messages like “Authentication Timeout” (capital A). Make matching robust.

Apply:

-            if (std.mem.containsAtLeast(u8, err_msg, 1, "Authorization Violation") or
-                std.mem.containsAtLeast(u8, err_msg, 1, "authorization violation") or
-                std.mem.containsAtLeast(u8, err_msg, 1, "authentication"))
+            if (std.mem.containsAtLeast(u8, err_msg, 1, "Authorization Violation") or
+                std.mem.containsAtLeast(u8, err_msg, 1, "authorization violation") or
+                std.mem.containsAtLeast(u8, err_msg, 1, "Authorization Timeout") or
+                std.mem.containsAtLeast(u8, err_msg, 1, "authentication") or
+                std.mem.containsAtLeast(u8, err_msg, 1, "Authentication"))
             {
                 self.handshake_error = ConnectionError.AuthFailed;
             } else {
                 self.handshake_error = ConnectionError.ConnectionFailed;
             }
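The diff above widens the match by listing more capitalizations. A truly case-insensitive check might instead look like the sketch below, which uses `std.ascii.indexOfIgnoreCase` from the standard library. `isAuthError` is a hypothetical helper name and the needle list is an assumption drawn from the strings quoted in this comment, not an exhaustive list of server messages.

```zig
const std = @import("std");

// Hypothetical helper: reports whether a server -ERR message looks like an
// authentication failure, ignoring ASCII case and without allocating.
fn isAuthError(err_msg: []const u8) bool {
    const needles = [_][]const u8{
        "authorization violation",
        "authorization timeout",
        "authentication",
    };
    for (needles) |needle| {
        if (std.ascii.indexOfIgnoreCase(err_msg, needle) != null) return true;
    }
    return false;
}

test "auth errors match regardless of case" {
    try std.testing.expect(isAuthError("'Authentication Timeout'"));
    try std.testing.expect(isAuthError("AUTHORIZATION VIOLATION"));
    try std.testing.expect(!isAuthError("'Maximum Payload Violation'"));
}
```

The result could then select between `ConnectionError.AuthFailed` and `ConnectionError.ConnectionFailed` as in the original branch.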

🧹 Nitpick comments (4)

docker-compose.test.yml (1)

81-100: Service definition for token-auth is good; consider pinning image to a patch tag.

nats:2.10-alpine can drift under you. For reproducible CI, pin to a known-good patch (e.g., 2.10.x-alpine) that you already validate locally.

src/connection.zig (2)

197-200: Public API: token and token_handler

API shape is reasonable. Two small improvements to consider:

Accept an error-able token handler: ?*const fn () anyerror![]const u8 to allow retrieval failures.

If you expect handlers needing state, add an optional token_handler_ctx: ?*anyopaque and a signature fn (*anyopaque) []const u8.
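One way the two suggestions could combine is sketched below: a context pointer paired with a fallible function. All names here (`TokenSource`, `RotatingTokens`, `fetch`, `error.NoTokenAvailable`) are hypothetical and exist only for illustration; the PR itself ships the zero-argument handler. The cast syntax assumes Zig 0.11 or later.

```zig
const std = @import("std");

// Hypothetical pairing of an opaque context pointer with a fallible handler.
const TokenSource = struct {
    ctx: *anyopaque,
    func: *const fn (ctx: *anyopaque) anyerror![]const u8,

    fn get(self: TokenSource) anyerror![]const u8 {
        return self.func(self.ctx);
    }
};

// Example state: hands out tokens from a fixed list, one per call.
const RotatingTokens = struct {
    tokens: []const []const u8,
    next: usize = 0,

    fn fetch(ctx: *anyopaque) anyerror![]const u8 {
        // Recover the typed state from the opaque pointer.
        const self: *RotatingTokens = @ptrCast(@alignCast(ctx));
        if (self.tokens.len == 0) return error.NoTokenAvailable;
        const tok = self.tokens[self.next % self.tokens.len];
        self.next += 1;
        return tok;
    }
};

test "stateful token handler" {
    const list = [_][]const u8{ "tok_a", "tok_b" };
    var state = RotatingTokens{ .tokens = &list };
    const source = TokenSource{ .ctx = &state, .func = &RotatingTokens.fetch };
    try std.testing.expectEqualStrings("tok_a", try source.get());
    try std.testing.expectEqualStrings("tok_b", try source.get());
}
```

The returned slices are borrowed from the state object, so in this sketch the caller does not free them.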

1208-1213: Avoid emitting "auth_token": null — set emit_null_optional_fields = false

Confirmed: std.json.StringifyOptions exposes emit_null_optional_fields (bool).

File: src/connection.zig lines 1208-1213 (also apply at 1224-1235)

Apply:
-        try std.json.stringify(connect_obj, .{}, buffer.writer());
+        try std.json.stringify(connect_obj, .{ .emit_null_optional_fields = false }, buffer.writer());

tests/auth_test.zig (1)

64-83: Tighten expected error once auth classification is broadened.

After making auth detection case-insensitive, this should deterministically be AuthFailed.

Apply:
-        // Accept either AuthFailed or ConnectionFailed for now
-        try std.testing.expect(err == nats.ConnectionError.AuthFailed or err == nats.ConnectionError.ConnectionFailed);
+        try std.testing.expect(err == nats.ConnectionError.AuthFailed);

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d097f9e and 501ec19.

📒 Files selected for processing (5)

docker-compose.test.yml (1 hunks)
src/connection.zig (7 hunks)
tests/all_tests.zig (1 hunks)
tests/auth_test.zig (1 hunks)
tests/utils.zig (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-08-25T14:51:33.570Z

Learnt from: lalinsky
PR: lalinsky/nats.zig#31
File: tests/jetstream_pull_test.zig:21-23
Timestamp: 2025-08-25T14:51:33.570Z
Learning: The nats.zig project uses a "tests:beforeEach" test in tests/all_tests.zig that runs before each test to clean up all JetStream streams and consumers, ensuring test isolation. This eliminates the need for unique test resource names as each test starts with a clean JetStream state.

Applied to files:

tests/all_tests.zig
tests/auth_test.zig

📚 Learning: 2025-08-30T18:48:28.333Z

Learnt from: lalinsky
PR: lalinsky/nats.zig#66
File: src/connection.zig:317-323
Timestamp: 2025-08-30T18:48:28.333Z
Learning: In src/connection.zig, the establishConnection() call in connectToServer() is during initial connection setup where no race conditions exist - the connection starts disconnected and other threads would see the connecting status and wait appropriately.

Applied to files:

src/connection.zig

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test

🔇 Additional comments (7)

tests/utils.zig (1)

10-11: Enum addition looks correct and consistent with docker-compose.

token_auth = 14225 matches the new service port, and bumping unknown avoids collision. No issues.

src/connection.zig (1)

378-409: Good: automatic cleanup on initial connect failures.

errdefer self.close() plus waiting for handshake under the mutex is solid and matches the prior “no races during initial connect” guidance.

tests/all_tests.zig (2)

8-8: Adding auth tests to the suite: good.

In combination with bumping the healthy-count to 4, this should be stable.

40-41: Minor: align health wait with 4 services.

Once utils waits for 4 “healthy”, this stays as-is; otherwise, consider increasing the timeout slightly if CI is slow.

tests/auth_test.zig (3)

7-20: Static token success path looks correct.

Covers the handshake and basic I/O; good smoke test.

42-62: Precedence test reads well.

Validates handler > static token, which matches the client behavior.

86-99: No-auth against auth server

This is the right negative test and will remain stable with the CONNECT changes above.

Implement detailed error categorization for all NATS server -ERR messages matching the nats.go client approach. This allows client code to handle specific protocol errors instead of generic connection failures.

Features:

ProtocolError enum with 15+ specific error types for authentication, limits, protocol violations, and connection issues
parseProtocolError() function to map -ERR messages to specific errors
Enhanced processErr() to propagate specific errors to client code
Updated tests to expect AuthorizationViolation instead of generic AuthFailed
Export ProtocolError for client access

Client code can now distinguish between different failure types:

AuthorizationViolation vs AuthExpired vs AuthRevoked
MaxConnectionsExceeded vs SecureConnectionRequired
PermissionViolation vs MaxPayloadViolation
UnknownServerError fallback for unrecognized errors
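On the caller side, distinguishing these failures could look roughly like the following. This is a sketch under assumptions: `describeFailure` and its message strings are hypothetical, and the error names are the ones listed in this commit message. It switches on `anyerror` so that it stands alone without the library.

```zig
const std = @import("std");

// Hypothetical caller-side helper: maps a connect failure to a hint.
fn describeFailure(err: anyerror) []const u8 {
    return switch (err) {
        error.AuthorizationViolation => "token rejected; check credentials",
        error.AuthExpired, error.AuthRevoked => "credentials no longer valid",
        error.MaxConnectionsExceeded => "server is full; retry later",
        error.SecureConnectionRequired => "server requires TLS",
        else => "unrecognized failure",
    };
}

test "specific errors get specific hints" {
    try std.testing.expectEqualStrings(
        "token rejected; check credentials",
        describeFailure(error.AuthorizationViolation),
    );
    try std.testing.expectEqualStrings(
        "unrecognized failure",
        describeFailure(error.UnknownServerError),
    );
}
```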

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (5)

src/root.zig (1)

21-21: Add a brief doc comment and changelog entry for new public API

Document ProtocolError at the root to help users discover it and note the addition in the release notes.

src/connection.zig (4)

121-148: Cover a few common NATS -ERR variants as distinct ProtocolError cases

Consider adding specific variants for messages that appear frequently in the wild to improve diagnostics and branching:

authorization timeout
permissions violation for publish/subscription (more precise than a generic permissions violation)
parser error

Apply:

 pub const ProtocolError = error{
   // Authentication/Authorization errors
   AuthorizationViolation, // "Authorization Violation"
+  AuthorizationTimeout,   // "Authorization Timeout"
   AuthExpired,            // "User Authentication Expired"
   AuthRevoked,            // "User Authentication Revoked"
   AccountAuthExpired,     // "Account Authentication Expired"
-  PermissionViolation,    // "Permissions Violation"
+  PermissionViolation,    // generic fallback
+  PermissionViolationPublish,   // "Permissions Violation for Publish"
+  PermissionViolationSubscription, // "Permissions Violation for Subscription"
   // Connection/Limit errors
   MaxConnectionsExceeded, // "maximum connections exceeded"
   ConnectionThrottling,   // "Connection throttling is active"
   MaxPayloadViolation,    // "Maximum Payload Violation"
   MaxSubscriptionsExceeded, // "maximum subscriptions exceeded"
   // Protocol errors
   SecureConnectionRequired, // "Secure Connection - TLS Required"
   InvalidClientProtocol,  // "invalid client protocol"
+  ParserError,            // "Parser Error"
   UnknownProtocolOperation, // "Unknown Protocol Operation"
   InvalidPublishSubject,  // "Invalid Publish Subject"
   NoRespondersRequiresHeaders, // "no responders requires headers support"
   // Account errors
   FailedAccountRegistration, // "Failed Account Registration"
   // Generic fallback
   UnknownServerError,     // For unrecognized -ERR messages
 };

And extend the matcher:

- } else if (std.mem.containsAtLeast(u8, lower_err, 1, "permissions violation")) {
+ } else if (std.mem.containsAtLeast(u8, lower_err, 1, "permissions violation for publish")) {
+     return ProtocolError.PermissionViolationPublish;
+ } else if (std.mem.containsAtLeast(u8, lower_err, 1, "permissions violation for subscription")) {
+     return ProtocolError.PermissionViolationSubscription;
+ } else if (std.mem.containsAtLeast(u8, lower_err, 1, "permissions violation")) {
      return ProtocolError.PermissionViolation;
   }
+ // ...
+ else if (std.mem.containsAtLeast(u8, lower_err, 1, "authorization timeout")) {
+     return ProtocolError.AuthorizationTimeout;
+ } else if (std.mem.containsAtLeast(u8, lower_err, 1, "parser error")) {
+     return ProtocolError.ParserError;
+ }
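The ordering in the matcher matters: the specific "for publish" check has to run before the generic "permissions violation" check, or the generic branch would win. A self-contained sketch of that idea follows. It uses a reduced error set and a hypothetical `contains` helper, and it matches case-insensitively with `std.ascii.indexOfIgnoreCase` rather than lowercasing into an allocated buffer, which is an assumption about how the allocation could be avoided, not the PR's implementation.

```zig
const std = @import("std");

// Reduced error set for illustration; the PR's ProtocolError has more variants.
const ProtocolError = error{
    AuthorizationViolation,
    PermissionViolation,
    PermissionViolationPublish,
    UnknownServerError,
};

// Hypothetical helper: case-insensitive substring test, no allocation.
fn contains(haystack: []const u8, needle: []const u8) bool {
    return std.ascii.indexOfIgnoreCase(haystack, needle) != null;
}

// Maps -ERR text to an error value. Specific patterns are checked before
// the generic ones they contain.
fn parseProtocolError(msg: []const u8) ProtocolError {
    if (contains(msg, "authorization violation")) return error.AuthorizationViolation;
    if (contains(msg, "permissions violation for publish")) return error.PermissionViolationPublish;
    if (contains(msg, "permissions violation")) return error.PermissionViolation;
    return error.UnknownServerError;
}

test "specific match wins over generic" {
    const err = parseProtocolError("Permissions Violation for Publish to \"foo\"");
    try std.testing.expect(err == error.PermissionViolationPublish);
}
```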

226-229: Token handler API lacks error and allocation semantics

The zero-arg, infallible token_handler returning a borrowed slice constrains real-world use (e.g., fetching/refreshing tokens, handling failures) and is ambiguous about ownership. Recommend making it fallible and allocator-aware.

Apply:

 pub const ConnectionOptions = struct {
   // Authentication
   token: ?[]const u8 = null,
-  token_handler: ?*const fn () []const u8 = null,
+  token_handler: ?*const fn (allocator: Allocator) ![]const u8 = null,
 };

Then call with allocator and free after use (see suggestion on Lines 1237-1242).

1237-1242: Prefer an allocator-aware, fallible token handler and ensure freeing

If you adopt the fallible handler, capture ownership and free it once CONNECT is built.

Apply:

-        // Get authentication token (dynamic handler takes precedence)
-        const auth_token = if (self.options.token_handler) |handler|
-            handler()
-        else
-            self.options.token;
+        // Get authentication token (dynamic handler takes precedence)
+        var auth_token: ?[]const u8 = null;
+        var owned_token: ?[]const u8 = null;
+        if (self.options.token_handler) |handler| {
+            const tok = try handler(self.allocator);
+            auth_token = tok;
+            owned_token = tok; // owned; will free before returning
+        } else {
+            auth_token = self.options.token; // borrowed from user; do not free
+        }
+        defer if (owned_token) |t| self.allocator.free(t);

1343-1389: Parsing -ERR: good coverage; minor improvements

No allocation would be even better, but OK for rare -ERRs.
Consider matching specific “permissions violation for publish/subscription”, “authorization timeout”, and “parser error” (see earlier diff).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 501ec19 and b1dbdec.

📒 Files selected for processing (3)

src/connection.zig (9 hunks)
src/root.zig (1 hunks)
tests/auth_test.zig (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/auth_test.zig

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-30T18:48:28.333Z

Learnt from: lalinsky
PR: lalinsky/nats.zig#66
File: src/connection.zig:317-323
Timestamp: 2025-08-30T18:48:28.333Z
Learning: In src/connection.zig, the establishConnection() call in connectToServer() is during initial connection setup where no race conditions exist - the connection starts disconnected and other threads would see the connecting status and wait appropriately.

Applied to files:

src/connection.zig

🔇 Additional comments (7)

src/root.zig (1)

21-21: Public re-export of ProtocolError — looks good

This surfaces protocol errors cleanly at the root. No functional risk.

src/connection.zig (6)

119-120: Plumbing ProtocolError into ConnectionError is correct

This lets callers handle specific protocol failures (e.g., AuthorizationViolation) distinctly. Good call.

407-408: Good: errdefer close() on connect failures

This ensures cleanup if establishConnection/handshake fails later. The unlock defer runs before close(), so no lock inversion.

425-425: Propagating establishConnection failure with try

Right choice; avoids swallowing socket setup failures.

437-437: Waiting for handshake here is fine

Condition wait releases the mutex, allowing reader to drive state transitions.

1402-1406: Solid: parse once, log mapped error

Keeps logs meaningful and avoids re-parsing downstream.

1409-1415: Correct: propagate ProtocolError during handshake

This gives precise failure reasons to callers and integrates cleanly with ConnectionError.

coderabbitai · 2025-09-13T16:48:52Z

src/connection.zig

            .lang = build_options.lang,
            .version = build_options.version,
            .protocol = 1,
+            .auth_token = auth_token,


💡 Verification agent

🧩 Analysis chain

Ensure we don’t emit "auth_token": null

If auth_token is null and stringify emits nulls, some servers reject the CONNECT. Set the option explicitly.

- try std.json.stringify(connect_obj, .{}, buffer.writer());
+ try std.json.stringify(connect_obj, .{ .emit_null_optional_fields = false }, buffer.writer());

🌐 Web query:

In current Zig std (0.12.x), what is the default for std.json.stringify option `emit_null_optional_fields`? Is it false by default?

💡 Result:

No — in Zig std 0.12.x the default for StringifyOptions.emit_null_optional_fields is true. [1][2]

Sources:
[1] std/json/stringify.zig (browseable) — shows emit_null_optional_fields: bool = true.
[2] std/lib source copy (Fossies) — documents Options.emit_null_optional_fields and the behavior.

Don't emit "auth_token": null — explicitly disable emitting null optional fields.

Zig std 0.12.x defaults emit_null_optional_fields = true, so stringify will emit nulls; set it to false.

- try std.json.stringify(connect_obj, .{}, buffer.writer());
+ try std.json.stringify(connect_obj, .{ .emit_null_optional_fields = false }, buffer.writer());
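The effect of the option can be checked with a small test. This is a sketch assuming the Zig 0.12.x standard library named in the query above (`std.json.stringify` with a writer and the managed `std.ArrayList(u8)`); newer releases have changed these APIs. `Connect` is a hypothetical two-field stand-in for the real CONNECT object.

```zig
const std = @import("std");

// Hypothetical reduced CONNECT payload with one optional field.
const Connect = struct {
    protocol: u8 = 1,
    auth_token: ?[]const u8 = null,
};

test "null auth_token is omitted when the option is disabled" {
    var buf = std.ArrayList(u8).init(std.testing.allocator);
    defer buf.deinit();

    // Explicitly disable emission of null optional fields.
    try std.json.stringify(Connect{}, .{ .emit_null_optional_fields = false }, buf.writer());
    try std.testing.expect(std.mem.indexOf(u8, buf.items, "auth_token") == null);
}
```

With the default options the same value would serialize with an `"auth_token":null` member, which is the case this comment is trying to avoid.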

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-            .auth_token = auth_token,
+            .auth_token = auth_token,
+        };
+        try std.json.stringify(connect_obj, .{ .emit_null_optional_fields = false }, buffer.writer());

🤖 Prompt for AI Agents

In src/connection.zig around line 1253, the JSON output is emitting "auth_token": null because Zig 0.12.x defaults emit_null_optional_fields = true; update the stringify call or the StringifyOptions used there to set emit_null_optional_fields = false so optional fields that are null are omitted from output (i.e., explicitly disable emitting null optional fields in the options passed to std.json.stringify/format).

coderabbitai bot reviewed Sep 13, 2025


coderabbitai bot approved these changes Sep 13, 2025


lalinsky merged commit b1a27f6 into main Sep 13, 2025
1 of 2 checks passed

lalinsky deleted the token-authentication branch September 13, 2025 16:47

coderabbitai bot requested changes Sep 13, 2025
