
[nightshift] 20260423 multi-cleanup#5122

Merged
rjpower merged 3 commits into main from nightshift/cleanup-20260423 on Apr 23, 2026

@claude-nightshift
Contributor

Seed 6ee10bc6
Dead code falls away —
silent forks meet at the trunk,
green light on the pull.

Nightly multi-agent cleanup. Three parallel scouts each produced a focused change; a fourth (zephyr) was unable to apply its findings in-sandbox and is excluded.

Combined scout summary

lib/marin/src/marin — delete dead marin.core.runtime

Removed lib/marin/src/marin/core/runtime.py. The module contained cached_or_construct_output and a TaskConfig dataclass; neither was imported anywhere in the repo (verified by grepping for marin.core.runtime, cached_or_construct_output, and TaskConfig). A prior cleanup (0e0a458) already removed its only other export (RayConfig), and the remaining symbols were never wired up. Deleting the file also drops its otherwise-unused module-level imports of rigging.filesystem.open_url and marin.utils.fsspec_exists.
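The reference check described above amounts to a fixed-string search across the repo. A minimal Python sketch, assuming that scanning `.py` files is sufficient (the scout's exact grep command is not shown, and `find_references` is a hypothetical helper):

```python
from pathlib import Path


def find_references(root: Path, symbols: list[str]) -> dict[str, list[Path]]:
    """Return, for each symbol, the .py files under root whose text contains it.

    Hypothetical helper: a plain substring match, equivalent to a fixed-string grep.
    """
    hits: dict[str, list[Path]] = {s: [] for s in symbols}
    for path in sorted(root.rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable files instead of aborting the whole scan
        for symbol in symbols:
            if symbol in text:
                hits[symbol].append(path)
    return hits
```

Calling it with `["marin.core.runtime", "cached_or_construct_output", "TaskConfig"]` and ignoring hits in `runtime.py` itself mirrors the check. Like a plain grep, it also matches comments and strings, and it would miss an import spelled `from marin.core import runtime`, so that spelling is worth adding to `symbols` as well.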

lib/iris/src/iris — drop unreachable fallback paths in WorkerDispatcher

Removed two unreachable branches in iris.client.worker_pool.WorkerDispatcher:

  • The duplicate ActorClient construction in _run(), guarded by 'if self._actor_client is None', could never fire: _discover_endpoint() already constructs the client before flipping status to IDLE, and that is the only path into IDLE.
  • The time.sleep() fallback in _discover_endpoint(), used when self._stop_event was unset, was unreachable because _run() always assigns self._stop_event before calling it.

Refactored _discover_endpoint() to accept stop_event as a parameter, dropping the redundant self._stop_event attribute and the dead code.
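The shape of this refactor can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `WorkerDispatcherSketch`, `ActorClientStub`, the `resolve` callback, and the polling loop are stand-ins, not iris's actual code. It shows the two invariants the change relies on: the client is built before status becomes IDLE, and waiting on the passed-in `stop_event` makes a `time.sleep()` fallback unnecessary.

```python
import threading
from enum import Enum, auto
from typing import Callable, Optional


class WorkerStatus(Enum):
    PENDING = auto()
    IDLE = auto()


class ActorClientStub:
    """Hypothetical stand-in for iris's ActorClient."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint


class WorkerDispatcherSketch:
    def __init__(self, resolve: Callable[[], Optional[str]], poll_interval: float = 0.1) -> None:
        self._resolve = resolve  # hypothetical: returns an endpoint, or None if not ready yet
        self._poll_interval = poll_interval
        self._actor_client: Optional[ActorClientStub] = None
        self.status = WorkerStatus.PENDING

    def _discover_endpoint(self, stop_event: threading.Event) -> bool:
        """Poll for an endpoint until one resolves or stop_event is set."""
        while not stop_event.is_set():
            endpoint = self._resolve()
            if endpoint is not None:
                # Build the client *before* flipping to IDLE, so IDLE implies a client.
                self._actor_client = ActorClientStub(endpoint)
                self.status = WorkerStatus.IDLE
                return True
            # Event.wait is an interruptible sleep, so no time.sleep() fallback is needed.
            stop_event.wait(self._poll_interval)
        return False

    def _run(self, stop_event: threading.Event) -> Optional[ActorClientStub]:
        if not self._discover_endpoint(stop_event):
            return None  # stopped before any endpoint appeared
        # No "if self._actor_client is None" rebuild: IDLE is only reached above.
        return self._actor_client
```

Because `stop_event` is a required argument rather than an attribute that might be unset, the "no event" branch cannot exist by construction.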

lib/levanter/src/levanter — fix value_at_step returning earliest value for all steps

Bug fix in levanter.schedule.value_at_step: the forward iteration returned the first schedule step whose start <= step. Because the first segment typically starts at 0, it matched every step >= 0, so value_at_step silently ignored all later schedule entries and returned, for example, the initial batch size at every step. Iterating in reverse so the latest matching segment wins restores the intended semantics (matching BatchSchedule.batch_size_at_step). Added regression tests covering scalar values, multi-segment schedules at boundaries and mid-segment, and the pre-first-segment error case.
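The difference between the two loops is easy to see side by side. A minimal sketch, assuming a schedule is either a scalar or a list of segments sorted by ascending `start`; `ScheduleStep`, `value_at_step_buggy`, and this `value_at_step` are illustrative stand-ins, not levanter's actual types or signatures:

```python
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class ScheduleStep(Generic[T]):
    start: int  # first step at which `value` applies (hypothetical field name)
    value: T


def value_at_step_buggy(schedule: Sequence[ScheduleStep[T]], step: int) -> T:
    # Forward scan: if schedule[0].start == 0, the first segment matches every step >= 0.
    for seg in schedule:
        if seg.start <= step:
            return seg.value
    raise ValueError(f"no schedule segment covers step {step}")


def value_at_step(schedule: Union[T, Sequence[ScheduleStep[T]]], step: int) -> T:
    if not isinstance(schedule, (list, tuple)):
        return schedule  # scalar: the same value at every step
    # Reverse scan: the latest segment that has already started wins.
    for seg in reversed(schedule):
        if seg.start <= step:
            return seg.value
    first = schedule[0].start if schedule else None
    raise ValueError(f"no schedule segment covers step {step} (first start: {first})")
```

With `[ScheduleStep(0, 32), ScheduleStep(100, 64), ScheduleStep(500, 128)]`, the buggy version returns 32 at step 600 while the fixed one returns 128. The reverse scan is linear in the number of segments; for long schedules, `bisect.bisect_right` over the start values minus one would find the same segment in O(log n).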

lib/zephyr/src/zephyr — no change (sandbox-blocked)

Scout identified small cleanups (tautological chunk_size > 0 guards in execution.py and shuffle.py, a function-local import of unique_temp_path, and a redundant existence check in writers.atomic_rename) but could not apply them: every edit to the worktree was rejected because the files were flagged as sensitive and no approver was available in the autonomous run. These changes are not included here and are flagged for manual follow-up.

Validation

  • ./infra/pre-commit.py --all-files --fix — OK (ruff, black, pyrefly, license headers, ...)
  • pytest lib/levanter/tests/test_scheduler.py — 17/17 pass (includes new regression tests)
  • pytest lib/iris/tests/client/test_worker_pool.py — 4/4 pass

🤖 Generated with Claude Code

Nightshift Scout and others added 3 commits April 23, 2026 11:13
cached_or_construct_output and TaskConfig in marin/core/runtime.py
have no callers anywhere in the repo (0e0a458 previously removed
the only other export, RayConfig). Drop the file rather than carry
a dependency on rigging.filesystem and fsspec_exists for nothing.

The WorkerDispatcher had two unreachable branches:

* _run() contained a second ActorClient construction guarded by
  'if self._actor_client is None'. This could never fire: the only
  path to WorkerStatus.IDLE is via _discover_endpoint(), which already
  builds the ActorClient before setting the status.

* _discover_endpoint() had a time.sleep() fallback used when
  self._stop_event was unset, but _run() always assigns it before
  calling _discover_endpoint(), so the fallback was dead.

Pass stop_event explicitly into _discover_endpoint() and drop the
redundant self._stop_event attribute and ActorClient branch.

… all steps

The forward loop returned the first schedule step whose start <= step,
which for any non-negative schedule always stopped at the first segment
(typically start=0). Iterate in reverse so the last matching segment
wins, and add regression tests covering scalar values, multi-segment
schedules, and the pre-first-segment error case.
@claude-nightshift (bot) added labels agent-generated (Created by automation/agent) and nightshift (Automated nightshift fixes) Apr 23, 2026
@claude-nightshift (bot) enabled auto-merge (squash) April 23, 2026 11:16
@claude-nightshift (bot) requested a review from rjpower April 23, 2026 11:16
@rjpower
Collaborator

rjpower commented Apr 23, 2026

FYI @dlwh the schedule looked like a real bug.

@rjpower rjpower disabled auto-merge April 23, 2026 15:43
@rjpower rjpower merged commit 154ec05 into main Apr 23, 2026
39 of 41 checks passed
@rjpower rjpower deleted the nightshift/cleanup-20260423 branch April 23, 2026 15:43