@@ -8,11 +8,18 @@ thorough implementation. The refactoring from slice to map for `leafRemoteCfgs`,
88the lifecycle management via ` quitCh ` /` removed ` /` connInProgress ` , and the
99` RemoteLeafOpts.name() ` identity scheme are all solid design choices.
1010
11+ ** Update (latest force-push):** The PR author has addressed two of our earlier
12+ findings (the ` goto ` loop and minor nits) and added NKey support to identity
13+ matching. The remaining items below reflect the current state of the PR.
14+
1115## Strengths
1216
13- - ** Clean identity scheme** : Using ` name() ` (URLs + account + credentials) for
14- remote identity with URL redaction is a good approach that prevents credential
15- leakage in logs while providing reliable deduplication.
17+ - ** Clean identity scheme** : Using ` name() ` (URLs + account + credentials + NKey)
18+ for remote identity with URL/NKey redaction is a good approach that prevents
19+ credential leakage in logs while providing reliable deduplication. The latest
20+ update correctly includes NKey in the identity string, closing a gap where two
21+ remotes with different NKeys but same URLs/account would have been treated as
22+ identical.
1623
1724- ** Robust lifecycle management** : The ` quitCh ` , ` connInProgress ` , and ` removed `
1825 fields with their accessor methods provide clean concurrent state management.
@@ -21,11 +28,15 @@ the lifecycle management via `quitCh`/`removed`/`connInProgress`, and the
2128
2229- ** Reflection-based ` checkConfigsEqual() ` ** : Good generic approach for detecting
2330 unsupported config changes. The ignore list pattern makes it easy to extend as
24- new reloadable fields are added.
31+ new reloadable fields are added. Note: ` LocalAccount ` was removed from the skip
32+ list in the latest update, meaning an account change is now correctly treated as
33+ a remove+add (identity change) rather than an unsupported modification.
2534
26- - ** Retry logic in ` getLeafNodeOptionsChanges() ` ** : The ` DO_REMOTES ` retry loop
27- with backoff for connect-in-progress scenarios handles the inherent race between
28- config reload and active connection attempts gracefully.
35+ - ** Retry logic in ` getLeafNodeOptionsChanges() ` ** : The ` forLoop ` retry with
36+ 50ms backoff for connect-in-progress scenarios handles the inherent race between
37+ config reload and active connection attempts gracefully. The latest update
38+ refactored the ` goto DO_REMOTES ` pattern into a proper ` for ` loop with labeled
39+ ` continue ` , which is much more idiomatic.
2940
3041- ** ` addLeafNodeConnection() ` now returns bool** : The ` stillValid() ` check under
3142 the server lock before adding to the leafs map closes a race window where a
@@ -51,18 +62,69 @@ to run under both the server and client locks to prevent a gap where another
5162goroutine could observe an inconsistent ` connInProgress ` flag between removal and
5263reconnect (see comment at leafnode.go:2072-2078).
5364
54- ### 2. ` DO_REMOTES ` goto pattern (stylistic nit)
55-
56- The ` DO_REMOTES ` label with ` goto ` in ` getLeafNodeOptionsChanges() ` is functional
57- but makes the control flow harder to follow. A ` for ` loop with ` continue ` would be
58- more idiomatic Go, though this is a stylistic preference and not a correctness
59- issue.
60-
61- ## Minor Nits
62-
63- - ` reload.go:1101 ` — Log says ` lrc.RemoteLeafOpts.name() ` but ` lrc.name() ` would
64- suffice since ` leafNodeCfg ` embeds ` *RemoteLeafOpts ` .
65- - ` reload_test.go:7159 ` — Comment typo: "Remote remote" should be "Remove remote".
65+ ### 2. Reload blocking window vs connection timeout (design observation)
66+
67+ The retry window in ` getLeafNodeOptionsChanges() ` is ** 1 second** (20 attempts ×
68+ 50ms), but actual connection establishment can take ** 5–6+ seconds** :
69+
70+ | Phase | Default timeout |
71+ | ----------------------| -----------------|
72+ | TCP dial | 1s (` DEFAULT_ROUTE_DIAL ` ) |
73+ | TLS handshake | 2s (` DEFAULT_LEAF_TLS_TIMEOUT ` ) |
74+ | Reconnect delay | 1s (` DEFAULT_LEAF_NODE_RECONNECT ` ) |
75+ | Jitter | up to 1s |
76+
77+ If ` connInProgress ` remains true for longer than 1s (e.g. slow TLS handshake),
78+ the reload fails with "cannot be enabled at the moment, try again". This is
79+ ** by design** (fail-safe over fail-open), and the error message correctly tells
80+ the operator to retry. However, in environments with high-latency TLS
81+ connections, operators may experience consistent reload failures during
82+ reconnection windows. Consider whether ` maxAttempts ` should be configurable or
83+ the error message should mention the specific cause (e.g. "connection to remote
84+ still in progress").
85+
86+ ### ~~ 3. ` DO_REMOTES ` goto pattern~~ — resolved
87+
88+ The ` goto DO_REMOTES ` label was refactored into a `for failed := range
89+ maxAttempts` loop with labeled ` continue forLoop`. This is cleaner and also fixes
90+ a subtle improvement: ` nlo ` is now re-initialized on each retry iteration,
91+ preventing stale state from prior failed attempts. The ` s.mu.RLock() ` is also now
92+ taken/released per iteration rather than held across retries, reducing lock
93+ contention.
94+
95+ ## ~~ Minor Nits~~ — resolved
96+
97+ Both nits from the previous review have been addressed in the latest push:
98+ - ` lrc.RemoteLeafOpts.name() ` → ` lrc.name() ` ✓
99+ - "Remote remote" → "Remove remote" typo ✓
100+ - Duplicate remote error messages now use ` safeName() ` instead of ` name() ` to
101+ avoid leaking credentials ✓
102+
103+ ## New Changes in Latest Push
104+
105+ ### NKey support in remote identity
106+
107+ ` generateRemoteLeafOptsName() ` now includes the ` Nkey ` field (redacted in
108+ ` safeName() ` ). This means:
109+ - Two remotes with same URLs/account but different NKeys are correctly treated
110+ as distinct remotes
111+ - NKey changes trigger a remove+add cycle (correct behavior)
112+ - New test ` TestConfigReloadRemoteLeafNodeNkeyChange ` validates this end-to-end
113+ - Unit tests added for identity matching with different accounts, creds, and NKeys
114+
115+ ### ` LocalAccount ` change semantics
116+
117+ ` LocalAccount ` was removed from the ` checkConfigsEqual() ` skip list. Previously,
118+ changing a remote's ` LocalAccount ` would have been rejected as an unsupported
119+ modification. Now it's treated as an identity change (remove old + add new),
120+ which is the correct semantic — the account is part of what identifies a remote.
121+
122+ ### Code quality improvements
123+
124+ - ` fmt.Appendf ` used instead of ` []byte(fmt.Sprintf(...)) ` in tests — avoids
125+ unnecessary string→byte conversion
126+ - ` leafnode.go:220 ` — duplicate remote validation now uses ` safeName() ` for
127+ error messages (consistent with other error paths)
66128
67129## Testing Coverage Gaps Identified & Implemented
68130
0 commit comments