Fix dirty address list OOM with halted SMP CPUs by garybeihl · Pull Request #189 · renode/renode-infrastructure

garybeihl · 2026-03-19T17:49:01Z

Summary

- Fix unbounded memory growth when one CPU is halted (e.g., SMP boot with nosmp)
- Replace the shared list and index tracking with per-consumer HashSets
- Skip halted CPUs during dirty address broadcast and mark them for a full TLB flush on resume

When one CPU is halted, its dirty address index never advances, which prevents TryReduceBroadcastedDirtyAddresses from trimming the shared list. With a running CPU continuously dirtying pages, the list grows without bound (134M+ entries observed) and eventually causes an OOM.
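The failure mode can be reproduced in a few lines. The following is a minimal Python model, not the Renode C# code: `SharedDirtyList`, `read_index`, `try_reduce`, and `simulate` are hypothetical names, and the trimming rule (drop only what every CPU has already read) is an assumption based on the description above.

```python
# Hypothetical model of the old design: one shared list plus a read index per CPU.
class SharedDirtyList:
    def __init__(self, cpu_ids):
        self.addresses = []                            # shared list of dirty pages
        self.read_index = {cpu: 0 for cpu in cpu_ids}  # how far each CPU has read

    def append(self, pages):
        self.addresses.extend(pages)

    def fetch(self, cpu):
        # A running CPU reads everything past its index, then advances the index.
        new = self.addresses[self.read_index[cpu]:]
        self.read_index[cpu] = len(self.addresses)
        return new

    def try_reduce(self):
        # Only entries that every CPU has already read may be dropped.
        lowest = min(self.read_index.values())
        del self.addresses[:lowest]                    # shifts the remaining items
        for cpu in self.read_index:
            self.read_index[cpu] -= lowest
        return lowest


def simulate(rounds):
    shared = SharedDirtyList(["cpu0", "cpu1"])
    for page in range(rounds):
        shared.append([page])   # the running CPU dirties a page
        shared.fetch("cpu0")    # cpu0 keeps up; cpu1 is halted and never fetches
        shared.try_reduce()     # the minimum index stays 0, so nothing is trimmed
    return len(shared.addresses)
```

In this model the list length equals the number of rounds, because the halted CPU pins the minimum read index at zero.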

The new design adds pages directly to each same-architecture consumer's HashSet, skipping halted CPUs. On resume, the skipped CPU gets a full TLB flush via TlibInvalidateTranslationCache instead of replaying millions of stale entries.
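A rough Python model of this design is sketched below. It is not the code from the PR: `DirtyAddressBroadcaster` and its method names are hypothetical, and details such as excluding the source CPU from its own broadcast are assumptions made for illustration.

```python
# Hypothetical model of the new design: one set of pending pages per consumer.
class DirtyAddressBroadcaster:
    def __init__(self):
        self.pending = {}              # cpu id -> set of dirty pages not yet fetched
        self.arch = {}                 # cpu id -> architecture name
        self.halted = set()
        self.needs_full_flush = set()

    def register(self, cpu, arch):
        self.pending[cpu] = set()
        self.arch[cpu] = arch

    def set_halted(self, cpu, halted):
        if halted:
            self.halted.add(cpu)
        else:
            self.halted.discard(cpu)

    def append_dirty(self, source, pages):
        for cpu, arch in self.arch.items():
            # Assumption: the source CPU does not need its own broadcast.
            if cpu == source or arch != self.arch[source]:
                continue
            if cpu in self.halted:
                # Store nothing for a halted CPU; remember that it missed updates.
                self.needs_full_flush.add(cpu)
                self.pending[cpu].clear()
                continue
            self.pending[cpu].update(pages)   # a set removes duplicate pages

    def get_new_dirty(self, cpu):
        # None means "flush everything"; otherwise return and reset the set.
        if cpu in self.needs_full_flush:
            self.needs_full_flush.discard(cpu)
            self.pending[cpu].clear()
            return None
        pages, self.pending[cpu] = self.pending[cpu], set()
        return pages
```

Memory held for a halted CPU stays constant in this model, and a running consumer's set is bounded by the number of distinct pages dirtied since its last fetch.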

Discovered while booting Linux on an AST2600 (dual Cortex-A7) with nosmp maxcpus=1.

Test plan

- Boot Linux SMP kernel with nosmp maxcpus=1: no OOM after extended runtime
- Boot Linux SMP kernel normally: both CPUs share dirty addresses correctly
- Halt/resume a CPU: resumed CPU gets full TLB flush

When one CPU is halted (e.g., SMP boot with nosmp), the halted CPU never fetches dirty addresses. The shared per-architecture list grew unbounded (134M+ entries observed) because TryReduceBroadcastedDirtyAddresses could not advance past the halted CPU's unread index.

Replace the shared list + index tracking with per-consumer HashSets. AppendDirtyAddresses now adds pages directly to each same-architecture consumer's set, skipping halted CPUs and marking them for a full TLB flush on resume. GetNewDirtyAddressesForCore returns null when a full flush is needed, which TranslationCPU handles by calling TlibInvalidateTranslationCache.

This eliminates the unbounded memory growth and the O(n) RemoveRange operations that were also a performance bottleneck.

Signed-off-by: Gary Beihl <garybeihl@microsoft.com>
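The consumer side of the null convention described in the commit message can be illustrated with a small Python sketch. `ModelCpu` and `sync_dirty_addresses` are hypothetical stand-ins, not TranslationCPU or its API; the sketch only shows how "no result" (a full flush) is kept distinct from "an empty result" (nothing to do).

```python
from typing import Callable, Optional, Set


class ModelCpu:
    """Hypothetical stand-in for a CPU holding cached translations per page."""

    def __init__(self, cached_pages):
        self.cached_pages = set(cached_pages)
        self.full_flushes = 0

    def invalidate_translation_cache(self):
        # Models a full flush: every cached translation is dropped.
        self.cached_pages.clear()
        self.full_flushes += 1

    def invalidate_pages(self, pages):
        # Models the normal path: only the listed pages are dropped.
        self.cached_pages -= pages


def sync_dirty_addresses(cpu: ModelCpu,
                         fetch: Callable[[], Optional[Set[int]]]) -> None:
    pages = fetch()
    if pages is None:
        # None signals that updates were skipped, so a full flush is required.
        cpu.invalidate_translation_cache()
    else:
        # An empty set is a valid result and must not trigger a full flush.
        cpu.invalidate_pages(pages)
```

Testing with `is None` rather than truthiness matters here, since an empty set is falsy in Python but means something different from a skipped broadcast.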

CLAassistant · 2026-03-19T17:49:12Z

All committers have signed the CLA.

garybeihl mentioned this pull request Mar 19, 2026

ast2600: Add Aspeed AST2600 EVB platform support renode/renode#888

Open

garybeihl wants to merge 1 commit into renode:master from garybeihl:fix-dirty-address-oom
