Summary
In __afr_handle_ping_event(), the inner check (up_children - 1) < priv->halo_min_replicas is algebraically unreachable when the outer guard up_children > priv->halo_min_replicas holds. The "Overriding halo threshold" log message is never emitted.
Affected code
|
if (child_latency_msec > halo_max_latency_msec && |
|
priv->child_up[idx] == 1 && up_children > priv->halo_min_replicas) { |
|
if ((up_children - 1) < priv->halo_min_replicas) { |
|
gf_log(child_xlator->name, GF_LOG_INFO, |
|
"Overriding halo threshold, " |
|
"min replicas: %d", |
|
priv->halo_min_replicas); |
|
} else { |
|
gf_log(child_xlator->name, GF_LOG_INFO, |
|
"Child latency (%" PRId64 |
|
" ms) " |
|
"exceeds halo threshold (%" PRId64 |
|
"), " |
|
"marking child down.", |
|
child_latency_msec, halo_max_latency_msec); |
|
if (priv->halo_child_up[idx]) { |
|
*event = GF_EVENT_CHILD_DOWN; |
|
} |
|
} |
The dead branch is at lines 5887–5892. The outer guard (lines 5885–5886) guarantees up_children > halo_min_replicas, making the inner check at line 5887 always FALSE.
Algebraic proof
The outer guard requires up_children > halo_min_replicas. For non-negative integers, this implies up_children >= halo_min_replicas + 1, therefore up_children - 1 >= halo_min_replicas. The inner check (up_children - 1) < halo_min_replicas is the negation of this — always FALSE.
Signed/unsigned: up_children is int (computed by counting child_up[i] == 1, always non-negative). halo_min_replicas is uint32_t. The implicit int→unsigned promotion is safe because up_children is non-negative and both values are bounded by child_count (max ~16 for practical replica sets). No overflow or sign-conversion hazard.
Provenance
The dead branch has been in upstream since the halo feature landed in GlusterFS 3.11 (2017), but was actually unreachable from the very first prototype in 2014. It survived upstream for 8+ years because halo-enabled defaults to False and the dead branch only logs — it never affects behavior.
| Date |
Commit |
Author |
What happened |
| 2014-04-29 |
bbceb48 |
Richard Wareing (FB) |
First halo prototype. Outer guard > with inner check < min. Dead from birth. |
| 2014-09-08 |
7a6ead5 |
Richard Wareing (FB) |
Replaced the dead branch with find_worst_up_child(this) == idx — a reachable check. The author reworked the swap logic on his development branch. |
| 2017-03-21 |
ca10139 |
Kevin Vigor (FB) |
Squash-merged halo to upstream (Gerrit 16099 / 16177). Based on a version that predated Wareing's 7a6ead5 fix — the dead branch was re-introduced. |
| 2020-02-04 |
4bb65d6 |
Pranith Kumar K (RH) |
cluster/afr: Fixes for halo (bz#1800583). Added halo_child_up[] guards to the live branch but did not touch the dead branch. |
git merge-base --is-ancestor 7a6ead5c b6cc5261d5 returns FALSE — Wareing's fix was on a development branch (release-3.8-fb) never included in Vigor's upstream squash.
Intent vs. implementation
The author's intent was a safety net: before marking a high-latency child down, check whether doing so would drop the replica count below halo_min_replicas. The outer guard up_children > halo_min_replicas already prevents this — you can only enter the block when there are strictly more replicas than the minimum, so removing one cannot drop below minimum. If the outer guard had been >= instead of >, the inner check would have been reachable.
Proposed fix
Remove the unreachable branch, collapse the if/else to just the live branch. No behavior change.
Summary
In
__afr_handle_ping_event(), the inner check(up_children - 1) < priv->halo_min_replicasis algebraically unreachable when the outer guardup_children > priv->halo_min_replicasholds. The "Overriding halo threshold" log message is never emitted.Affected code
glusterfs/xlators/cluster/afr/src/afr-common.c
Lines 5885 to 5903 in ae1d696
The dead branch is at lines 5887–5892. The outer guard (lines 5885–5886) guarantees
up_children > halo_min_replicas, making the inner check at line 5887 always FALSE.Algebraic proof
The outer guard requires
up_children > halo_min_replicas. For non-negative integers, this impliesup_children >= halo_min_replicas + 1, thereforeup_children - 1 >= halo_min_replicas. The inner check(up_children - 1) < halo_min_replicasis the negation of this — always FALSE.Signed/unsigned:
up_childrenisint(computed by countingchild_up[i] == 1, always non-negative).halo_min_replicasisuint32_t. The implicitint→unsignedpromotion is safe becauseup_childrenis non-negative and both values are bounded bychild_count(max ~16 for practical replica sets). No overflow or sign-conversion hazard.Provenance
The dead branch has been in upstream since the halo feature landed in GlusterFS 3.11 (2017), but was actually unreachable from the very first prototype in 2014. It survived upstream for 8+ years because
halo-enableddefaults toFalseand the dead branch only logs — it never affects behavior.>with inner check< min. Dead from birth.find_worst_up_child(this) == idx— a reachable check. The author reworked the swap logic on his development branch.cluster/afr: Fixes for halo(bz#1800583). Addedhalo_child_up[]guards to the live branch but did not touch the dead branch.git merge-base --is-ancestor 7a6ead5c b6cc5261d5returns FALSE — Wareing's fix was on a development branch (release-3.8-fb) never included in Vigor's upstream squash.Intent vs. implementation
The author's intent was a safety net: before marking a high-latency child down, check whether doing so would drop the replica count below
halo_min_replicas. The outer guardup_children > halo_min_replicasalready prevents this — you can only enter the block when there are strictly more replicas than the minimum, so removing one cannot drop below minimum. If the outer guard had been>=instead of>, the inner check would have been reachable.Proposed fix
Remove the unreachable branch, collapse the if/else to just the live branch. No behavior change.