Commit 4a6f466
Survive transient hosted node-agent registration flaps
Three layers of defence against the residual flakiness left after the
reconcile-poison fix:
1. Read-side machine binding now tolerates stale registrations.
resolve_machine_binding's stored-placement path used to return the
binding only when resolve_known_node_binding accepted the freshness
check. A new resolve_known_node_binding_for_read returns the stored
binding regardless of TTL, and resolve_machine_binding now uses it as
the optimistic fallback on a stale Err. If the underlying node-agent
is still reachable the proxy call succeeds; if it isn't, the actual
proxy failure is more informative than a synthetic stale rejection.
State-changing callers (placement reconcile, machine launch) keep
using the strict resolve_known_node_binding.
2. store_registered_node_refresh tolerates 60s of refreshed_at
regression. The previous strict check rejected any refresh whose
refreshed_at was less than the stored value, so a single NTP
step-correction or modest cross-host clock skew would silently wedge
a node for the rest of its TTL window every time. Now refreshes
within NODE_AGENT_REGISTRATION_REFRESHED_AT_REGRESSION_TOLERANCE_SECONDS
(60s) are accepted; only genuine attempts to overwrite the stored
high-water mark with much-older state are rejected. The
node_agent_surfaces_explicit_registration_failures test now uses a
600s future high-water mark to keep exercising the wildly-stale path.
3. NODE_AGENT_REGISTRATION_TTL_SECONDS raised from 45s to 120s.
Combined with the unchanged 5s refresh interval, a node now
tolerates ~24 missed refreshes (was ~9) before going stale. The
previous 9-cycle ceiling left no headroom for normal cross-AZ
network jitter; brief HTTP slowdowns repeatedly pushed
aws-linux-node-2 into the stale window during ordinary load.
Observed on prod 2026-05-06 as intermittent inspector reports of
'k3s-agent on cloud-aws-worker-2' as unreachable while the K3s
cluster itself was healthy and serving pods.
1 parent 490edb3 commit 4a6f466
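The read-side fallback from item 1 of the commit message can be sketched roughly as follows. This is a minimal illustration, not the committed code: the `NodeBinding` and `BindingError` types, the `stored`/`now` parameters, and the exact function signatures are assumptions, since the diff body is not shown here. Only the function names and the 120s TTL come from the commit message.

```rust
// Hypothetical, simplified types; the real ones are not visible in this commit page.
#[derive(Debug, Clone, PartialEq)]
struct NodeBinding {
    node_id: String,
    refreshed_at: u64, // seconds since epoch (assumed representation)
}

#[derive(Debug, PartialEq)]
enum BindingError {
    Stale { age_seconds: u64 },
    Unknown,
}

const NODE_AGENT_REGISTRATION_TTL_SECONDS: u64 = 120;

/// Strict lookup: rejects registrations older than the TTL.
/// State-changing callers would keep using this one.
fn resolve_known_node_binding(
    stored: Option<&NodeBinding>,
    now: u64,
) -> Result<NodeBinding, BindingError> {
    let binding = stored.ok_or(BindingError::Unknown)?;
    let age = now.saturating_sub(binding.refreshed_at);
    if age > NODE_AGENT_REGISTRATION_TTL_SECONDS {
        return Err(BindingError::Stale { age_seconds: age });
    }
    Ok(binding.clone())
}

/// Read-side lookup: returns the stored binding regardless of TTL.
fn resolve_known_node_binding_for_read(
    stored: Option<&NodeBinding>,
) -> Result<NodeBinding, BindingError> {
    stored.cloned().ok_or(BindingError::Unknown)
}

/// Read path: try the strict lookup first, then fall back to the stored
/// binding only when the strict lookup failed because of staleness.
fn resolve_machine_binding(
    stored: Option<&NodeBinding>,
    now: u64,
) -> Result<NodeBinding, BindingError> {
    match resolve_known_node_binding(stored, now) {
        Err(BindingError::Stale { .. }) => resolve_known_node_binding_for_read(stored),
        other => other,
    }
}
```

In this sketch a stale registration still yields a binding on the read path, so the subsequent proxy call decides whether the node is reachable, while an unknown node is still reported as an error.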
1 file changed
Lines changed: 114 additions & 22 deletions
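The regression tolerance from item 2 and the TTL arithmetic from item 3 of the commit message can be illustrated with a small sketch. The helper names `refresh_is_acceptable` and `refresh_intervals_per_ttl`, the `REFRESH_INTERVAL_SECONDS` constant name, the use of plain `u64` seconds, and the inclusive boundary at exactly 60s are all assumptions made for illustration; only the two `NODE_AGENT_REGISTRATION_*` constant names and the values 60s, 120s, 45s, and 5s come from the commit message.

```rust
const NODE_AGENT_REGISTRATION_REFRESHED_AT_REGRESSION_TOLERANCE_SECONDS: u64 = 60;
const NODE_AGENT_REGISTRATION_TTL_SECONDS: u64 = 120;
// Hypothetical name; the commit message only states a 5s refresh interval.
const REFRESH_INTERVAL_SECONDS: u64 = 5;

/// Returns true when an incoming refresh should be accepted.
/// A refresh may lag the stored high-water mark by up to the tolerance
/// (inclusive here, which is an assumption); anything older is rejected.
fn refresh_is_acceptable(stored_refreshed_at: u64, incoming_refreshed_at: u64) -> bool {
    incoming_refreshed_at
        .saturating_add(NODE_AGENT_REGISTRATION_REFRESHED_AT_REGRESSION_TOLERANCE_SECONDS)
        >= stored_refreshed_at
}

/// Number of whole refresh intervals that fit into one TTL window.
fn refresh_intervals_per_ttl(ttl_seconds: u64, interval_seconds: u64) -> u64 {
    ttl_seconds / interval_seconds
}
```

Under these assumptions, a refresh that is 30s behind the stored mark is accepted and one that is 600s behind is rejected, matching the "wildly-stale" case the updated test exercises. `refresh_intervals_per_ttl(NODE_AGENT_REGISTRATION_TTL_SECONDS, REFRESH_INTERVAL_SECONDS)` gives 24, against 9 for the previous 45s TTL.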
| Hunk | Original file lines | Diff lines | Deletions | Additions |
|---|---|---|---|---|
| 1 | 424-431 | 424-436 | 1 | 6 |
| 2 | 4189-4201 | 4194-4213 | 7 | 14 |
| 3 | 4829-4834 | 4841-4892 | 0 | 46 |
| 4 | 4847-4870 | 4905-4956 | 12 | 40 |
| 5 | 12110-12117 | 12196-12209 | 2 | 8 |