[iris] Investigate noisy memray 'Failed to compress input file' logs #4517

@Calvin-Xu


Describe the bug
Some Iris task logs periodically emit 'Failed to compress input file', often alongside a memray deactivation message, even while the task continues normally. The message is noisy and suggests a task-local failure when the actual failures are unrelated.

To Reproduce

  1. Inspect logs from a long-running qsplit replay child, e.g. under /calvinxu/dm-qsplit240-300m-6b-20260406-022411.
  2. Wait for a profiling/deactivation interval or scan the archived task logs.
  3. Observe repeated Failed to compress input file messages while many affected tasks keep running or later succeed.
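
The scan in steps 2 and 3 could be sketched roughly as below. This is a minimal sketch, not existing Iris tooling: it assumes the archived logs have already been loaded as `(task_id, log_line)` pairs, and the function names and the `"SUCCEEDED"` state string are hypothetical placeholders.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# The exact message from the task logs.
NOISE_PATTERN = re.compile(r"Failed to compress input file")


def count_noise_by_task(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count occurrences of the message per task.

    `records` is an iterable of (task_id, log_line) pairs; how those pairs
    are read from the archive is left out and is an assumption here.
    """
    counts: Counter = Counter()
    for task_id, line in records:
        if NOISE_PATTERN.search(line):
            counts[task_id] += 1
    return counts


def benign_tasks(counts: Counter, final_states: Dict[str, str]) -> List[str]:
    """Return tasks that logged the message but still succeeded.

    "SUCCEEDED" is a hypothetical state name used for illustration.
    """
    return sorted(t for t in counts if final_states.get(t) == "SUCCEEDED")
```

A non-empty result from `benign_tasks` would support the observation that the message appears on tasks that keep running or later succeed.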

Expected behavior
Task logs should not emit a recurring compression error unless profiling actually failed, in which case the task should surface it as a real error.

Additional context
Current evidence suggests this message is not the cause of task failures. The exact string does not appear in Marin/Iris source, including the DuckDB log-store path at lib/iris/src/iris/cluster/log_store/duckdb_store.py. Memray is referenced only in explicit profiling code paths such as lib/iris/src/iris/cluster/runtime/profile.py and lib/iris/src/iris/cluster/providers/k8s/tasks.py. This points to an upstream memray/runtime profiler issue or a platform-side profiling hook rather than a bug in normal task scheduling or log compression. We should confirm where memray is invoked in production task containers and suppress or downgrade this message if it is benign.
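
If the message turns out to be benign, one possible way to downgrade it is a standard-library `logging.Filter`. This is only a sketch under a loud assumption: it presumes the message reaches Python's `logging` module as a log record. If the profiler writes it straight to stderr from native code, this filter would never see it and a different approach would be needed. The class name `DowngradeMemrayNoise` is hypothetical.

```python
import logging

# The exact message from the task logs.
NOISY_SUBSTRING = "Failed to compress input file"


class DowngradeMemrayNoise(logging.Filter):
    """Relabel the known-noisy message as DEBUG instead of dropping it.

    Assumption: the message arrives as a Python log record.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        if NOISY_SUBSTRING in record.getMessage():
            record.levelno = logging.DEBUG
            record.levelname = logging.getLevelName(logging.DEBUG)
        # Always keep the record; handler level thresholds decide
        # whether a DEBUG record is actually emitted.
        return True
```

Attached to a logger with `logger.addFilter(DowngradeMemrayNoise())`, the relabeling happens before that logger's handlers apply their level thresholds, so handlers set to INFO or higher skip the message while other errors pass through unchanged. Logger-level filters only apply to records logged on that logger itself, so the filter has to sit on the logger (or handler) that actually sees the message.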


Labels: agent-generated (created by automation/agent), bug (something isn't working)
