Skip to content

afr: dead code in halo threshold override branch (__afr_handle_ping_event) #4683

Description

@ThalesBarretto

Summary

In __afr_handle_ping_event(), the inner check (up_children - 1) < priv->halo_min_replicas is algebraically unreachable when the outer guard up_children > priv->halo_min_replicas holds. The "Overriding halo threshold" log message is never emitted.

Affected code

if (child_latency_msec > halo_max_latency_msec &&
priv->child_up[idx] == 1 && up_children > priv->halo_min_replicas) {
if ((up_children - 1) < priv->halo_min_replicas) {
gf_log(child_xlator->name, GF_LOG_INFO,
"Overriding halo threshold, "
"min replicas: %d",
priv->halo_min_replicas);
} else {
gf_log(child_xlator->name, GF_LOG_INFO,
"Child latency (%" PRId64
" ms) "
"exceeds halo threshold (%" PRId64
"), "
"marking child down.",
child_latency_msec, halo_max_latency_msec);
if (priv->halo_child_up[idx]) {
*event = GF_EVENT_CHILD_DOWN;
}
}

The dead branch is at lines 5887–5892. The outer guard (lines 5885–5886) guarantees up_children > halo_min_replicas, making the inner check at line 5887 always FALSE.

Algebraic proof

The outer guard requires up_children > halo_min_replicas. For non-negative integers, this implies up_children >= halo_min_replicas + 1, therefore up_children - 1 >= halo_min_replicas. The inner check (up_children - 1) < halo_min_replicas is the negation of this — always FALSE.

Signed/unsigned: up_children is int (computed by counting child_up[i] == 1, always non-negative). halo_min_replicas is uint32_t. The implicit int→unsigned promotion is safe because up_children is non-negative and both values are bounded by child_count (max ~16 for practical replica sets). No overflow or sign-conversion hazard.

Provenance

The dead branch has been in upstream since the halo feature landed in GlusterFS 3.11 (2017), but was actually unreachable from the very first prototype in 2014. It survived upstream for 8+ years because halo-enabled defaults to False and the dead branch only logs — it never affects behavior.

Date Commit Author What happened
2014-04-29 bbceb48 Richard Wareing (FB) First halo prototype. Outer guard > with inner check < min. Dead from birth.
2014-09-08 7a6ead5 Richard Wareing (FB) Replaced the dead branch with find_worst_up_child(this) == idx — a reachable check. The author reworked the swap logic on his development branch.
2017-03-21 ca10139 Kevin Vigor (FB) Squash-merged halo to upstream (Gerrit 16099 / 16177). Based on a version that predated Wareing's 7a6ead5 fix — the dead branch was re-introduced.
2020-02-04 4bb65d6 Pranith Kumar K (RH) cluster/afr: Fixes for halo (bz#1800583). Added halo_child_up[] guards to the live branch but did not touch the dead branch.

git merge-base --is-ancestor 7a6ead5c b6cc5261d5 returns FALSE — Wareing's fix was on a development branch (release-3.8-fb) never included in Vigor's upstream squash.

Intent vs. implementation

The author's intent was a safety net: before marking a high-latency child down, check whether doing so would drop the replica count below halo_min_replicas. The outer guard up_children > halo_min_replicas already prevents this — you can only enter the block when there are strictly more replicas than the minimum, so removing one cannot drop below minimum. If the outer guard had been >= instead of >, the inner check would have been reachable.

Proposed fix

Remove the unreachable branch, collapse the if/else to just the live branch. No behavior change.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions