Skip to content

Commit c1d0032

Browse files
committed
Update review for latest PR nats-io#7937 force-push changes
Reflects upstream changes: goto→for loop refactor, NKey identity matching, LocalAccount semantics fix, safeName() in error messages, and adds reload blocking window analysis. https://claude.ai/code/session_019n3SMwZZhfRYX4y6URkEbK
1 parent 8430caa commit c1d0032

File tree

1 file changed

+81
-19
lines changed

1 file changed

+81
-19
lines changed

REVIEW.md

Lines changed: 81 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -8,11 +8,18 @@ thorough implementation. The refactoring from slice to map for `leafRemoteCfgs`,
88
the lifecycle management via `quitCh`/`removed`/`connInProgress`, and the
99
`RemoteLeafOpts.name()` identity scheme are all solid design choices.
1010

11+
**Update (latest force-push):** The PR author has addressed two of our earlier
12+
findings (the `goto` loop and minor nits) and added NKey support to identity
13+
matching. The remaining items below reflect the current state of the PR.
14+
1115
## Strengths
1216

13-
- **Clean identity scheme**: Using `name()` (URLs + account + credentials) for
14-
remote identity with URL redaction is a good approach that prevents credential
15-
leakage in logs while providing reliable deduplication.
17+
- **Clean identity scheme**: Using `name()` (URLs + account + credentials + NKey)
18+
for remote identity with URL/NKey redaction is a good approach that prevents
19+
credential leakage in logs while providing reliable deduplication. The latest
20+
update correctly includes NKey in the identity string, closing a gap where two
21+
remotes with different NKeys but same URLs/account would have been treated as
22+
identical.
1623

1724
- **Robust lifecycle management**: The `quitCh`, `connInProgress`, and `removed`
1825
fields with their accessor methods provide clean concurrent state management.
@@ -21,11 +28,15 @@ the lifecycle management via `quitCh`/`removed`/`connInProgress`, and the
2128

2229
- **Reflection-based `checkConfigsEqual()`**: Good generic approach for detecting
2330
unsupported config changes. The ignore list pattern makes it easy to extend as
24-
new reloadable fields are added.
31+
new reloadable fields are added. Note: `LocalAccount` was removed from the skip
32+
list in the latest update, meaning an account change is now correctly treated as
33+
a remove+add (identity change) rather than an unsupported modification.
2534

26-
- **Retry logic in `getLeafNodeOptionsChanges()`**: The `DO_REMOTES` retry loop
27-
with backoff for connect-in-progress scenarios handles the inherent race between
28-
config reload and active connection attempts gracefully.
35+
- **Retry logic in `getLeafNodeOptionsChanges()`**: The `forLoop` retry with
36+
50ms backoff for connect-in-progress scenarios handles the inherent race between
37+
config reload and active connection attempts gracefully. The latest update
38+
refactored the `goto DO_REMOTES` pattern into a proper `for` loop with labeled
39+
`continue`, which is much more idiomatic.
2940

3041
- **`addLeafNodeConnection()` now returns bool**: The `stillValid()` check under
3142
the server lock before adding to the leafs map closes a race window where a
@@ -51,18 +62,69 @@ to run under both the server and client locks to prevent a gap where another
5162
goroutine could observe an inconsistent `connInProgress` flag between removal and
5263
reconnect (see comment at leafnode.go:2072-2078).
5364

54-
### 2. `DO_REMOTES` goto pattern (stylistic nit)
55-
56-
The `DO_REMOTES` label with `goto` in `getLeafNodeOptionsChanges()` is functional
57-
but makes the control flow harder to follow. A `for` loop with `continue` would be
58-
more idiomatic Go, though this is a stylistic preference and not a correctness
59-
issue.
60-
61-
## Minor Nits
62-
63-
- `reload.go:1101` — Log says `lrc.RemoteLeafOpts.name()` but `lrc.name()` would
64-
suffice since `leafNodeCfg` embeds `*RemoteLeafOpts`.
65-
- `reload_test.go:7159` — Comment typo: "Remote remote" should be "Remove remote".
65+
### 2. Reload blocking window vs connection timeout (design observation)
66+
67+
The retry window in `getLeafNodeOptionsChanges()` is **1 second** (20 attempts ×
68+
50ms), but actual connection establishment can take **5–6+ seconds**:
69+
70+
| Phase | Default timeout |
71+
|----------------------|-----------------|
72+
| TCP dial | 1s (`DEFAULT_ROUTE_DIAL`) |
73+
| TLS handshake | 2s (`DEFAULT_LEAF_TLS_TIMEOUT`) |
74+
| Reconnect delay | 1s (`DEFAULT_LEAF_NODE_RECONNECT`) |
75+
| Jitter | up to 1s |
76+
77+
If `connInProgress` remains true for longer than 1s (e.g. slow TLS handshake),
78+
the reload fails with "cannot be enabled at the moment, try again". This is
79+
**by design** (fail-safe over fail-open), and the error message correctly tells
80+
the operator to retry. However, in environments with high-latency TLS
81+
connections, operators may experience consistent reload failures during
82+
reconnection windows. Consider whether `maxAttempts` should be configurable or
83+
the error message should mention the specific cause (e.g. "connection to remote
84+
still in progress").
85+
86+
### ~~3. `DO_REMOTES` goto pattern~~ — resolved
87+
88+
The `goto DO_REMOTES` label was refactored into a `for failed := range
89+
maxAttempts` loop with labeled `continue forLoop`. This is cleaner and also fixes
90+
a subtle improvement: `nlo` is now re-initialized on each retry iteration,
91+
preventing stale state from prior failed attempts. The `s.mu.RLock()` is also now
92+
taken/released per iteration rather than held across retries, reducing lock
93+
contention.
94+
95+
## ~~Minor Nits~~ — resolved
96+
97+
Both nits from the previous review have been addressed in the latest push:
98+
- `lrc.RemoteLeafOpts.name()``lrc.name()`
99+
- "Remote remote" → "Remove remote" typo ✓
100+
- Duplicate remote error messages now use `safeName()` instead of `name()` to
101+
avoid leaking credentials ✓
102+
103+
## New Changes in Latest Push
104+
105+
### NKey support in remote identity
106+
107+
`generateRemoteLeafOptsName()` now includes the `Nkey` field (redacted in
108+
`safeName()`). This means:
109+
- Two remotes with same URLs/account but different NKeys are correctly treated
110+
as distinct remotes
111+
- NKey changes trigger a remove+add cycle (correct behavior)
112+
- New test `TestConfigReloadRemoteLeafNodeNkeyChange` validates this end-to-end
113+
- Unit tests added for identity matching with different accounts, creds, and NKeys
114+
115+
### `LocalAccount` change semantics
116+
117+
`LocalAccount` was removed from the `checkConfigsEqual()` skip list. Previously,
118+
changing a remote's `LocalAccount` would have been rejected as an unsupported
119+
modification. Now it's treated as an identity change (remove old + add new),
120+
which is the correct semantic — the account is part of what identifies a remote.
121+
122+
### Code quality improvements
123+
124+
- `fmt.Appendf` used instead of `[]byte(fmt.Sprintf(...))` in tests — avoids
125+
unnecessary string→byte conversion
126+
- `leafnode.go:220` — duplicate remote validation now uses `safeName()` for
127+
error messages (consistent with other error paths)
66128

67129
## Testing Coverage Gaps Identified & Implemented
68130

0 commit comments

Comments
 (0)