Describe the bug
Some Iris task logs periodically emit "Failed to compress input file", often alongside a memray deactivation message, even when the task continues normally. The message is noisy and reads like a task-local failure even when the real failures are unrelated.
To Reproduce
- Inspect logs from a long-running qsplit replay child, e.g. under /calvinxu/dm-qsplit240-300m-6b-20260406-022411.
- Wait for a profiling/deactivation interval or scan the archived task logs.
- Observe repeated "Failed to compress input file" messages while many affected tasks keep running or later succeed.
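
To make the scan step repeatable, here is a minimal sketch that counts the message in a task log and checks how often a line mentioning memray appears nearby. The function name `summarize` and the `window` parameter are hypothetical, and the sketch assumes the archived logs are available as plain text lines; it does not use any Iris API.

```python
from typing import Iterable

# Exact message text as quoted in this report.
NEEDLE = "Failed to compress input file"


def summarize(lines: Iterable[str], window: int = 2) -> dict:
    """Count NEEDLE hits and how many have a memray mention within `window` lines.

    `window` is an illustrative choice, not a value taken from Iris or memray.
    """
    buf = list(lines)
    hits = [i for i, line in enumerate(buf) if NEEDLE in line]
    near_memray = sum(
        1
        for i in hits
        if any(
            "memray" in neighbor.lower()
            for neighbor in buf[max(0, i - window): i + window + 1]
        )
    )
    return {"hits": len(hits), "near_memray": near_memray}
```

Running this over each archived task log and comparing the counts between tasks that succeeded and tasks that failed would give a quick check on whether the message correlates with failures at all.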
Expected behavior
Task logs should not emit a recurring compression error unless profiling actually failed, in which case the task should surface it as a real error.
Additional context
Current evidence suggests this is not the cause of task failures. The exact string does not appear in Marin/Iris source, including the DuckDB log-store path at lib/iris/src/iris/cluster/log_store/duckdb_store.py. Memray is only referenced in explicit profiling code paths such as lib/iris/src/iris/cluster/runtime/profile.py and lib/iris/src/iris/cluster/providers/k8s/tasks.py. That makes this look more like an upstream memray/runtime profiler issue or platform-side profiling hook than a bug in normal task scheduling or log compression. We should confirm where memray is being invoked in production task containers and suppress or downgrade this message if it is benign.
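
The source search described above can be reproduced with a small script. This is a sketch only: `files_containing` is a hypothetical helper, and restricting the search to `.py` files is an assumption that will miss the string if it lives in a compiled extension or a non-Python dependency.

```python
from pathlib import Path


def files_containing(root, needle, suffixes=(".py",)):
    """Return sorted paths under `root` whose text contains `needle`.

    Only files with one of `suffixes` are read; undecodable bytes are ignored.
    """
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in text:
                matches.append(path)
    return matches
```

For example, `files_containing("lib/iris/src", "Failed to compress input file")` returning an empty list would be consistent with the claim that the string is not emitted by Iris itself, and pointing the same helper at the installed site-packages directory of a task container is one way to look for the real origin.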