Skip to content

Commit 6ad56de

Browse files
Klement Sekeradwallacelf
authored andcommitted
l2: don't let the fib scan clobber the learn count mid-learning
l2fib_scan retallies every learned MAC into a local learn_count and then unconditionally writes it back over l2learn_main.global_learn_count (and the per-bd counts). But the bucket walk it derives that tally from suspends every periodically, yielding the main thread to the data path. If MACs are learned during one of those yields, the tally is a stale undercount counted and writing it back drops global_learn_count below its true value. The data-path learn limit (l2_learn.c) then sees room again and learning overshoots the limit. This surfaced as an intermittent failure of test_l2bd_learnlimit01: want_l2_macs_events() flushes the fib and kicks a one-pass scan via l2fib_start_ager_scan(), whose walk straddles the test's immediately following learning burst and resets global_learn_count to a pre-burst value, so a second bridge domain learns one MAC past the global limit (lfs2 == 1 instead of 0). The periodic 100ms event scan can race a learning burst the same way in normal operation. This patch deliberately ignores multi-threaded scenarios and is only a naive workaround to make CI happy. Type: fix Change-Id: Ib147efe81fe9d5fb82cd81cdc213fc9e8d5c14c9 Signed-off-by: Klement Sekera <ksekera@netgate.com> Signed-off-by: Dave Wallace <dwallacelf@gmail.com>
1 parent 0c676da commit 6ad56de

1 file changed

Lines changed: 24 additions & 5 deletions

File tree

src/vnet/l2/l2_fib.c

Lines changed: 24 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1098,6 +1098,11 @@ l2fib_scan (vlib_main_t * vm, f64 start_time, u8 event_only)
10981098
reg = vl_api_client_index_to_registration (lm->client_index);
10991099
}
11001100

1101+
/* Snapshot the data-path's learn counter before the walk below, which suspends and yields the
1102+
* main thread to the data path. Used to detect concurrent learning - see the guarded writeback
1103+
* after the loop. */
1104+
u32 global_at_start = lm->global_learn_count;
1105+
11011106
for (i = 0; i < h->nbuckets; i++)
11021107
{
11031108
/* allow no more than 20us without a pause */
@@ -1246,12 +1251,26 @@ l2fib_scan (vlib_main_t * vm, f64 start_time, u8 event_only)
12461251
;
12471252
}
12481253

1249-
/* keep learn count consistent */
1250-
l2learn_main.global_learn_count = learn_count;
1251-
vec_foreach_index (bd_index, l2input_main.bd_configs)
1254+
/* Keep learn count consistent - but only if the data path did not learn while this scan was
1255+
* running.
1256+
*
1257+
* FIXME: this guard improves CI stability as the unit test is single-worker. It's not a true fix
1258+
* for multi-worker case.
1259+
*/
1260+
if (l2learn_main.global_learn_count == global_at_start)
1261+
{
1262+
l2learn_main.global_learn_count = learn_count;
1263+
vec_foreach_index (bd_index, l2input_main.bd_configs)
1264+
{
1265+
vec_elt (l2input_main.bd_configs, bd_index).learn_count =
1266+
vec_elt (bd_learn_counts, bd_index);
1267+
}
1268+
}
1269+
else
12521270
{
1253-
vec_elt (l2input_main.bd_configs, bd_index).learn_count =
1254-
vec_elt (bd_learn_counts, bd_index);
1271+
clib_warning ("Global learn count not updated due to datapath learning during l2fib_scan: "
1272+
"glc-at-start %d, glc-at-end %d",
1273+
global_at_start, l2learn_main.global_learn_count);
12551274
}
12561275

12571276
if (mp)

0 commit comments

Comments
 (0)