Investigate ROCm memory leak on CI integration tests #1241

@scotts

Description

See the failing CI run: https://github.com/pytorch/kineto/actions/runs/21599182062/job/62240085046?pr=1240. Failure output:

________________________ TestProfilerCUDA.test_mem_leak ________________________
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/unittest/case.py", line 57, in testPartExecutor
    yield
  File "/opt/conda/lib/python3.11/unittest/case.py", line 623, in run
    self._callTestMethod(testMethod)
  File "/opt/conda/lib/python3.11/unittest/case.py", line 579, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/pytorch/torch/testing/_internal/common_utils.py", line 3364, in wrapper
    method(*args, **kwargs)
  File "/pytorch/test/profiler/test_profiler.py", line 131, in test_mem_leak
    self.assertTrue(
  File "/opt/conda/lib/python3.11/unittest/case.py", line 715, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true : memory usage is increasing, deque([2728726528, 2729267200, 2729807872, 2730397696, 2730938368], maxlen=5)

To execute this test, run the following from the base repo dir:
    python test/profiler/test_profiler.py TestProfilerCUDA.test_mem_leak

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
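For context on what the assertion message means: the test keeps a rolling window of memory samples in a `deque(maxlen=5)` (visible in the failure output) and fails when every sample in the window is larger than the one before it. The exact logic lives in `test/profiler/test_profiler.py`; the helper name and check below are an illustrative sketch, not the actual implementation.

```python
from collections import deque

def is_increasing(samples):
    """Return True if every sample is strictly larger than the previous one.
    A leak check like test_mem_leak can use this on a rolling window of
    device-memory readings to flag monotonic growth."""
    values = list(samples)
    return all(b > a for a, b in zip(values, values[1:]))

# The byte counts reported in the failure above: each of the five samples
# is larger than the one before, so the check fires.
failing = deque([2728726528, 2729267200, 2729807872, 2730397696, 2730938368],
                maxlen=5)
print(is_increasing(failing))  # True -> "memory usage is increasing"
```

Note that a strictly increasing window over only five samples is a fairly aggressive criterion: slow allocator warm-up or caching behavior on ROCm could trip it even without a true leak, which is one hypothesis worth ruling out during the investigation.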
