Skip to content

Conversation

@lalinsky
Copy link
Owner

Add support for token-based authentication with both static tokens and dynamic token handlers. Authentication tokens are included in the CONNECT protocol message and authentication failures are properly detected and reported.

Features:

  • ConnectionOptions.token for static token authentication
  • ConnectionOptions.token_handler for dynamic token generation
  • Enhanced error detection for authentication failures in -ERR responses
  • Test infrastructure with token-authenticated NATS server

Add support for token-based authentication with both static tokens and
dynamic token handlers. Authentication tokens are included in the CONNECT
protocol message and authentication failures are properly detected and
reported.

Features:
- ConnectionOptions.token for static token authentication
- ConnectionOptions.token_handler for dynamic token generation
- Enhanced error detection for authentication failures in -ERR responses
- Test infrastructure with token-authenticated NATS server
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 13, 2025

Walkthrough

Adds token-based authentication support to the client (ConnectionOptions.token / token_handler), includes resolved auth_token in the CONNECT handshake, and maps server -ERR messages to a new ProtocolError type. Adds tests for token auth, updates test utilities, and introduces a token-auth NATS service in docker-compose for tests.

Changes

Cohort / File(s) Summary of changes
Connection auth & protocol errors
src/connection.zig
New public ProtocolError and added to ConnectionError; ConnectionOptions gains token and token_handler; handshake resolves token (handler overrides static) and embeds auth_token in CONNECT; refactors error flow (uses errdefer self.close(), replaces some catch with try) and adds parseProtocolError mapping -ERR texts to ProtocolError variants; server -ERR handling logs and surfaces mapped protocol errors.
Public re-export
src/root.zig
Re-exports ProtocolError (pub const ProtocolError = @import("connection.zig").ProtocolError;).
Tests: auth suite
tests/auth_test.zig, tests/all_tests.zig
Adds tests/auth_test.zig with tests for: static token success, token_handler success, handler precedence over static token, invalid token failure (timeout), and no-auth failure; wires auth_test.zig into tests/all_tests.zig.
Test utilities
tests/utils.zig
Adds Node.token_auth = 14225 and shifts Node.unknown to 14226 so test harness targets the new auth service port.
Test infra: docker-compose
docker-compose.test.yml
Adds nats-token-auth service using nats:2.10-alpine with --auth test_token_123, host networking, configured ports (server 14225, http 18225), debug and server_name, and a healthcheck against http://127.0.0.1:18225/healthz.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Implement token authentication" is concise, single-sentence, and directly describes the primary change (adding token-based auth and token_handler support), so it accurately reflects the changeset and is appropriate for history scanning.
Description Check ✅ Passed The description explicitly lists the new features (ConnectionOptions.token, token_handler, CONNECT payload inclusion, enhanced -ERR handling) and mentions the added tests, which matches the file-level summaries, so it is on-topic and passes the lenient description check.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch token-authentication

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/utils.zig (1)

84-86: Wait for all 4 services to be healthy to avoid auth test flakiness.

We now run 4 services (including nats-token-auth). Waiting for only 3 can race the auth tests. Bump the threshold.

Apply:

-        if (healthy_count >= 3) {
+        if (healthy_count >= 4) {
             return;
         }
src/connection.zig (1)

1314-1339: Broaden auth error detection to be case-insensitive and include common variants.

Current checks miss messages like “Authentication Timeout” (capital A). Make matching robust.

Apply:

-            if (std.mem.containsAtLeast(u8, err_msg, 1, "Authorization Violation") or
-                std.mem.containsAtLeast(u8, err_msg, 1, "authorization violation") or
-                std.mem.containsAtLeast(u8, err_msg, 1, "authentication"))
+            if (std.mem.containsAtLeast(u8, err_msg, 1, "Authorization Violation") or
+                std.mem.containsAtLeast(u8, err_msg, 1, "authorization violation") or
+                std.mem.containsAtLeast(u8, err_msg, 1, "Authorization Timeout") or
+                std.mem.containsAtLeast(u8, err_msg, 1, "authentication") or
+                std.mem.containsAtLeast(u8, err_msg, 1, "Authentication"))
             {
                 self.handshake_error = ConnectionError.AuthFailed;
             } else {
                 self.handshake_error = ConnectionError.ConnectionFailed;
             }
🧹 Nitpick comments (4)
docker-compose.test.yml (1)

81-100: Service definition for token-auth is good; consider pinning image to a patch tag.

nats:2.10-alpine can drift under you. For reproducible CI, pin to a known-good patch (e.g., 2.10.x-alpine) that you already validate locally.

src/connection.zig (2)

197-200: Public API: token and token_handler

API shape is reasonable. Two small improvements to consider:

  • Accept an error-able token handler: ?*const fn () anyerror![]const u8 to allow retrieval failures.
  • If you expect handlers needing state, add an optional token_handler_ctx: ?*anyopaque and a signature fn (*anyopaque) []const u8.

1208-1213: Avoid emitting "auth_token": null — set emit_null_optional_fields = false

Confirmed: std.json.StringifyOptions exposes emit_null_optional_fields (bool).

File: src/connection.zig lines 1208-1213 (also apply at 1224-1235)

Apply:

-        try std.json.stringify(connect_obj, .{}, buffer.writer());
+        try std.json.stringify(connect_obj, .{ .emit_null_optional_fields = false }, buffer.writer());
tests/auth_test.zig (1)

64-83: Tighten expected error once auth classification is broadened.

After making auth detection case-insensitive, this should deterministically be AuthFailed.

Apply:

-        // Accept either AuthFailed or ConnectionFailed for now
-        try std.testing.expect(err == nats.ConnectionError.AuthFailed or err == nats.ConnectionError.ConnectionFailed);
+        try std.testing.expect(err == nats.ConnectionError.AuthFailed);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d097f9e and 501ec19.

📒 Files selected for processing (5)
  • docker-compose.test.yml (1 hunks)
  • src/connection.zig (7 hunks)
  • tests/all_tests.zig (1 hunks)
  • tests/auth_test.zig (1 hunks)
  • tests/utils.zig (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-25T14:51:33.570Z
Learnt from: lalinsky
PR: lalinsky/nats.zig#31
File: tests/jetstream_pull_test.zig:21-23
Timestamp: 2025-08-25T14:51:33.570Z
Learning: The nats.zig project uses a "tests:beforeEach" test in tests/all_tests.zig that runs before each test to clean up all JetStream streams and consumers, ensuring test isolation. This eliminates the need for unique test resource names as each test starts with a clean JetStream state.

Applied to files:

  • tests/all_tests.zig
  • tests/auth_test.zig
📚 Learning: 2025-08-30T18:48:28.333Z
Learnt from: lalinsky
PR: lalinsky/nats.zig#66
File: src/connection.zig:317-323
Timestamp: 2025-08-30T18:48:28.333Z
Learning: In src/connection.zig, the establishConnection() call in connectToServer() is during initial connection setup where no race conditions exist - the connection starts disconnected and other threads would see the connecting status and wait appropriately.

Applied to files:

  • src/connection.zig
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (7)
tests/utils.zig (1)

10-11: Enum addition looks correct and consistent with docker-compose.

token_auth = 14225 matches the new service port, and bumping unknown avoids collision. No issues.

src/connection.zig (1)

378-409: Good: automatic cleanup on initial connect failures.

errdefer self.close() plus waiting for handshake under the mutex is solid and matches the prior “no races during initial connect” guidance.

tests/all_tests.zig (2)

8-8: Adding auth tests to the suite: good.

In combination with bumping the healthy-count to 4, this should be stable.


40-41: Minor: align health wait with 4 services.

Once utils waits for 4 “healthy”, this stays as-is; otherwise, consider increasing the timeout slightly if CI is slow.

tests/auth_test.zig (3)

7-20: Static token success path looks correct.

Covers the handshake and basic I/O; good smoke test.


42-62: Precedence test reads well.

Validates handler > static token, which matches the client behavior.


86-99: No-auth against auth server

This is the right negative test and will remain stable with the CONNECT changes above.

Implement detailed error categorization for all NATS server -ERR messages
matching the nats.go client approach. This allows client code to handle
specific protocol errors instead of generic connection failures.

Features:
- ProtocolError enum with 15+ specific error types for authentication,
  limits, protocol violations, and connection issues
- parseProtocolError() function to map -ERR messages to specific errors
- Enhanced processErr() to propagate specific errors to client code
- Updated tests to expect AuthorizationViolation instead of generic AuthFailed
- Export ProtocolError for client access

Client code can now distinguish between different failure types:
- AuthorizationViolation vs AuthExpired vs AuthRevoked
- MaxConnectionsExceeded vs SecureConnectionRequired
- PermissionViolation vs MaxPayloadViolation
- UnknownServerError fallback for unrecognized errors
@lalinsky lalinsky merged commit b1a27f6 into main Sep 13, 2025
1 of 2 checks passed
@lalinsky lalinsky deleted the token-authentication branch September 13, 2025 16:47
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
src/root.zig (1)

21-21: Add a brief doc comment and changelog entry for new public API

Document ProtocolError at the root to help users discover it and note the addition in the release notes.

src/connection.zig (4)

121-148: Cover a few common NATS -ERR variants as distinct ProtocolError cases

Consider adding specific variants for messages that appear frequently in the wild to improve diagnostics and branching:

  • authorization timeout
  • permissions violation for publish/subscription (more precise than a generic permissions violation)
  • parser error

Apply:

 pub const ProtocolError = error{
   // Authentication/Authorization errors
   AuthorizationViolation, // "Authorization Violation"
+  AuthorizationTimeout,   // "Authorization Timeout"
   AuthExpired,            // "User Authentication Expired"
   AuthRevoked,            // "User Authentication Revoked"
   AccountAuthExpired,     // "Account Authentication Expired"
-  PermissionViolation,    // "Permissions Violation"
+  PermissionViolation,    // generic fallback
+  PermissionViolationPublish,   // "Permissions Violation for Publish"
+  PermissionViolationSubscription, // "Permissions Violation for Subscription"
   // Connection/Limit errors
   MaxConnectionsExceeded, // "maximum connections exceeded"
   ConnectionThrottling,   // "Connection throttling is active"
   MaxPayloadViolation,    // "Maximum Payload Violation"
   MaxSubscriptionsExceeded, // "maximum subscriptions exceeded"
   // Protocol errors
   SecureConnectionRequired, // "Secure Connection - TLS Required"
   InvalidClientProtocol,  // "invalid client protocol"
+  ParserError,            // "Parser Error"
   UnknownProtocolOperation, // "Unknown Protocol Operation"
   InvalidPublishSubject,  // "Invalid Publish Subject"
   NoRespondersRequiresHeaders, // "no responders requires headers support"
   // Account errors
   FailedAccountRegistration, // "Failed Account Registration"
   // Generic fallback
   UnknownServerError,     // For unrecognized -ERR messages
 };

And extend the matcher:

- } else if (std.mem.containsAtLeast(u8, lower_err, 1, "permissions violation")) {
+ } else if (std.mem.containsAtLeast(u8, lower_err, 1, "permissions violation for publish")) {
+     return ProtocolError.PermissionViolationPublish;
+ } else if (std.mem.containsAtLeast(u8, lower_err, 1, "permissions violation for subscription")) {
+     return ProtocolError.PermissionViolationSubscription;
+ } else if (std.mem.containsAtLeast(u8, lower_err, 1, "permissions violation")) {
      return ProtocolError.PermissionViolation;
   }
+ // ...
+ else if (std.mem.containsAtLeast(u8, lower_err, 1, "authorization timeout")) {
+     return ProtocolError.AuthorizationTimeout;
+ } else if (std.mem.containsAtLeast(u8, lower_err, 1, "parser error")) {
+     return ProtocolError.ParserError;
+ }

226-229: Token handler API lacks error and allocation semantics

The zero-arg, infallible token_handler returning a borrowed slice constrains real-world use (e.g., fetching/refreshing tokens, handling failures) and is ambiguous about ownership. Recommend making it fallible and allocator-aware.

Apply:

 pub const ConnectionOptions = struct {
   // Authentication
   token: ?[]const u8 = null,
-  token_handler: ?*const fn () []const u8 = null,
+  token_handler: ?*const fn (allocator: Allocator) ![]const u8 = null,
 };

Then call with allocator and free after use (see suggestion on Lines 1237-1242).


1237-1242: Prefer an allocator-aware, fallible token handler and ensure freeing

If you adopt the fallible handler, capture ownership and free it once CONNECT is built.

Apply:

-        // Get authentication token (dynamic handler takes precedence)
-        const auth_token = if (self.options.token_handler) |handler|
-            handler()
-        else
-            self.options.token;
+        // Get authentication token (dynamic handler takes precedence)
+        var auth_token: ?[]const u8 = null;
+        var owned_token: ?[]const u8 = null;
+        if (self.options.token_handler) |handler| {
+            const tok = try handler(self.allocator);
+            auth_token = tok;
+            owned_token = tok; // owned; will free before returning
+        } else {
+            auth_token = self.options.token; // borrowed from user; do not free
+        }
+        defer if (owned_token) |t| self.allocator.free(t);

1343-1389: Parsing -ERR: good coverage; minor improvements

  • No allocation would be even better, but OK for rare -ERRs.
  • Consider matching specific “permissions violation for publish/subscription”, “authorization timeout”, and “parser error” (see earlier diff).
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 501ec19 and b1dbdec.

📒 Files selected for processing (3)
  • src/connection.zig (9 hunks)
  • src/root.zig (1 hunks)
  • tests/auth_test.zig (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/auth_test.zig
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-30T18:48:28.333Z
Learnt from: lalinsky
PR: lalinsky/nats.zig#66
File: src/connection.zig:317-323
Timestamp: 2025-08-30T18:48:28.333Z
Learning: In src/connection.zig, the establishConnection() call in connectToServer() is during initial connection setup where no race conditions exist - the connection starts disconnected and other threads would see the connecting status and wait appropriately.

Applied to files:

  • src/connection.zig
🔇 Additional comments (7)
src/root.zig (1)

21-21: Public re-export of ProtocolError — looks good

This surfaces protocol errors cleanly at the root. No functional risk.

src/connection.zig (6)

119-120: Plumbing ProtocolError into ConnectionError is correct

This lets callers handle specific protocol failures (e.g., AuthorizationViolation) distinctly. Good call.


407-408: Good: errdefer close() on connect failures

This ensures cleanup if establishConnection/handshake fails later. The unlock defer runs before close(), so no lock inversion.


425-425: Propagating establishConnection failure with try

Right choice; avoids swallowing socket setup failures.


437-437: Waiting for handshake here is fine

Condition wait releases the mutex, allowing reader to drive state transitions.


1402-1406: Solid: parse once, log mapped error

Keeps logs meaningful and avoids re-parsing downstream.


1409-1415: Correct: propagate ProtocolError during handshake

This gives precise failure reasons to callers and integrates cleanly with ConnectionError.

.lang = build_options.lang,
.version = build_options.version,
.protocol = 1,
.auth_token = auth_token,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Ensure we don’t emit "auth_token": null

If auth_token is null and stringify emits nulls, some servers reject the CONNECT. Set the option explicitly.

-        try std.json.stringify(connect_obj, .{}, buffer.writer());
+        try std.json.stringify(connect_obj, .{ .emit_null_optional_fields = false }, buffer.writer());

🌐 Web query:

In current Zig std (0.12.x), what is the default for std.json.stringify option `emit_null_optional_fields`? Is it false by default?

💡 Result:

No — in Zig std 0.12.x the default for StringifyOptions.emit_null_optional_fields is true. [1][2]

Sources:
[1] std/json/stringify.zig (browseable) — shows emit_null_optional_fields: bool = true.
[2] std/lib source copy (Fossies) — documents Options.emit_null_optional_fields and the behavior.


Don't emit "auth_token": null — explicitly disable emitting null optional fields.

Zig std 0.12.x defaults emit_null_optional_fields = true, so stringify will emit nulls; set it to false.

-        try std.json.stringify(connect_obj, .{}, buffer.writer());
+        try std.json.stringify(connect_obj, .{ .emit_null_optional_fields = false }, buffer.writer());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
.auth_token = auth_token,
.auth_token = auth_token,
};
try std.json.stringify(connect_obj, .{ .emit_null_optional_fields = false }, buffer.writer());
🤖 Prompt for AI Agents
In src/connection.zig around line 1253, the JSON output is emitting
"auth_token": null because Zig 0.12.x defaults emit_null_optional_fields = true;
update the stringify call or the StringifyOptions used there to set
emit_null_optional_fields = false so optional fields that are null are omitted
from output (i.e., explicitly disable emitting null optional fields in the
options passed to std.json.stringify/format).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant