Skip to content

Conversation

@mzient
Copy link
Contributor

@mzient mzient commented Jan 14, 2026

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

Reported in #6153
The auxiliary thread-local pool could grow indefinitely when there was imbalance between threads (one thread caused allocation of auxiliary blocks and the other returned them to the pool and they were never reused).

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

We don't have facilities for memory consumption testing. The repro from #6153 no longer increases memory consumption.

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Comment on lines +59 to +60
if (ptr)
res.deallocate(ptr, size);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this succeed once with plain host mem - in this case, even if the test fails, we should free the memory.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41737713]: BUILD STARTED

@greptile-apps
Copy link

greptile-apps bot commented Jan 14, 2026

Greptile Summary

Fixed unbounded memory growth in thread-local memory pool allocators by adding a threshold-based size limit. When the free list exceeds the threshold (default 1024 items), it trims down to half the threshold during deallocation.

Key Changes:

  • Added threshold template parameter and count_ member variable to track free list size
  • Modified deallocate() to trim the free list when it exceeds the threshold
  • Updated allocate() and purge() to maintain the count correctly
  • Ensured thread-local instances always have a threshold limit (defaulting to 1024 if not specified)
  • Improved OOM test to properly clean up if allocation unexpectedly succeeds

Impact:
This fixes the memory leak reported in issue #6153 where imbalanced allocation patterns between threads caused the auxiliary thread-local pool to grow indefinitely. The trimming strategy (reducing to threshold/2) provides hysteresis to avoid thrashing.

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk
  • The fix is well-designed and addresses a real memory leak issue. The implementation correctly maintains the count and uses hysteresis to avoid performance issues. The test improvement is defensive and good practice. Score is 4 rather than 5 due to lack of automated tests for the memory consumption fix itself, though the repro from Increasing rss in pipeline execution #6153 was manually verified.
  • No files require special attention

Important Files Changed

Filename Overview
include/dali/core/mm/detail/aux_alloc.h Added threshold-based pool size limiting to prevent unbounded memory growth in thread-local allocators
dali/core/mm/basic_resource_test.cc Improved OOM test to prevent potential memory leak if exception is not thrown

Sequence Diagram

sequenceDiagram
    participant Thread1 as Thread 1
    participant Thread2 as Thread 2
    participant Allocator as fixed_size_allocator<br/>(thread_local)
    participant FreeList as Free List

    Note over Thread1,FreeList: Before fix: unbounded growth

    Thread1->>Allocator: allocate()
    Allocator->>FreeList: pop from free_list
    Note over FreeList: count-- (new in PR)
    Allocator-->>Thread1: return block

    Thread2->>Allocator: deallocate(block)
    Allocator->>FreeList: push to free_list
    Note over FreeList: count++ (new in PR)
    
    Note over Allocator: If count > threshold
    
    Allocator->>Allocator: Trim to threshold/2
    loop while count > threshold/2
        Allocator->>FreeList: pop block
        Allocator->>Allocator: free(block)
        Note over FreeList: count--
    end
    
    Note over Thread1,FreeList: Prevents unbounded memory growth<br/>when threads have imbalanced alloc/dealloc
Loading

@mzient mzient added the important-fix Fixes an important issue in the software or development environment. label Jan 14, 2026
free_list_ = bptr;
count_++;
if (threshold && count_ > threshold) {
while (count_ > threshold / 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why threshold / 2?

@JanuszL JanuszL self-assigned this Jan 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

important-fix Fixes an important issue in the software or development environment.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants