[iris] K8s LogCollector silently skips log collection for nested zephyr pipeline pods #4414

**Describe the bug**

The K8s `LogCollector` does not collect logs for cache-copy (and likely cache-probe) task pods. The `ResourceCollector` tracks them correctly (`kubectl top` calls appear in controller logs), but no `kubectl logs` calls are made for the same pods. The `get-task-logs` RPC returns zero entries.

**To Reproduce**

1. Run a tokenize job with cache-copy on the K8s provider (e.g. `nemotron_data.py` on CoreWeave).
2. Wait for cache-copy coordinator and worker pods to reach Running state.
3. Query logs: `iris rpc controller get-task-logs --id <cache-copy-coord-job-id>`
4. Observe: empty response with `cursor: 0`, no log entries.
5. Verify the pod IS producing logs: `kubectl logs <pod> -c task --tail=5` shows output.
6. Verify `ResourceCollector` IS tracking the pod: controller process logs show `kubectl top <pod>` calls.
7. Verify `LogCollector` is NOT tracking the pod: no `kubectl logs <pod>` calls in controller process logs.
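Steps 6 and 7 can be checked mechanically rather than by eye. The following is a minimal sketch, not part of iris: it assumes the controller process logs contain the literal command text (`kubectl top <pod>` or `kubectl top pod <pod>`, and `kubectl logs <pod>`) with the pod name directly after the subcommand. The function name and the sample lines are hypothetical.

```python
import re

# Assumption: the pod name follows the subcommand directly, optionally
# preceded by the resource word "pod"/"pods". Flags placed before the
# pod name (e.g. "-n <namespace>") are not handled by this sketch.
_KUBECTL_CALL = re.compile(
    r"kubectl\s+(top|logs)\s+(?:pods?\s+)?([a-z0-9][a-z0-9.-]*)"
)


def pods_missing_log_calls(lines):
    """Return pods that have `kubectl top` calls but no `kubectl logs` calls."""
    seen = {"top": set(), "logs": set()}
    for line in lines:
        for command, pod in _KUBECTL_CALL.findall(line):
            seen[command].add(pod)
    return sorted(seen["top"] - seen["logs"])


# Hypothetical sample of controller log lines.
sample = [
    "DEBUG running: kubectl top pod cache-copy-coord-0",
    "DEBUG running: kubectl top pod tokenize-worker-3",
    "DEBUG running: kubectl logs tokenize-worker-3 -c task --since-time=...",
]
missing = pods_missing_log_calls(sample)
print(missing)  # ['cache-copy-coord-0']
```

Running this over the real controller logs should list exactly the pods that the `ResourceCollector` tracks and the `LogCollector` does not.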

**Expected behavior**

`LogCollector` should track all task pods that `_track_pod` is called for, including those from nested zephyr pipelines (cache-probe, cache-copy).

**Additional context**

The `_track_pod` method (`tasks.py:784-790`) calls both `log_collector.track()` and `resource_collector.track()`. The resource collector works, but the log collector silently drops these pods. The `log_store` is wired correctly at `controller.py:970`.

These pods have correct labels (`iris.managed=true`, `iris.runtime=iris-kubernetes`) and appear in the `_poll_pods` managed pod list. 125 other pods ARE being log-fetched, but zero cache-copy/cache-probe pods are.

Suspected cause: either a race condition or ordering issue in which nested child jobs' tasks are polled before the `LogCollector` is ready, or the `LogCollector`'s `_pods` dict silently rejecting duplicate keys when pods are re-tracked on subsequent sync cycles.
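To make the second hypothesis concrete, here is a toy model of how a "first registration wins" dict could permanently drop a pod. This is purely illustrative and is not the real `LogCollector`: the class, the `ready` flag, the `overwrite` switch, and the key and pod names are all hypothetical, and the actual `track()` signature may differ.

```python
class ToyLogCollector:
    """Hypothetical stand-in for illustration; NOT the iris LogCollector."""

    def __init__(self, overwrite):
        # overwrite=False models the suspected bug: later track() calls
        # for an already-known key are silently ignored.
        self._overwrite = overwrite
        self._pods = {}

    def track(self, key, pod_name, ready):
        if key in self._pods and not self._overwrite:
            return  # duplicate key: silently rejected
        # A pod registered before it is ready has nothing to fetch yet.
        self._pods[key] = pod_name if ready else None

    def pods_to_fetch(self):
        return [pod for pod in self._pods.values() if pod is not None]


def simulate(overwrite):
    collector = ToyLogCollector(overwrite=overwrite)
    # First sync cycle: the nested child task is seen before it is ready.
    collector.track("cache-copy-coord/0", "pod-a", ready=False)
    # Later sync cycle: the same task is re-tracked, now ready.
    collector.track("cache-copy-coord/0", "pod-a", ready=True)
    return collector.pods_to_fetch()


first_wins = simulate(overwrite=False)
upsert = simulate(overwrite=True)
print(first_wins, upsert)  # [] ['pod-a']
```

If the real collector behaves like the `overwrite=False` case, any pod first tracked in a not-yet-usable state would never have `kubectl logs` issued for it, which matches the symptom. Whether that is what actually happens needs to be confirmed against the `track()` implementation.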

