Merged
Conversation
cached_or_construct_output and TaskConfig in marin/core/runtime.py have no callers anywhere in the repo (0e0a458 previously removed the only other export, RayConfig). Drop the file rather than carry a dependency on rigging.filesystem and fsspec_exists for nothing.
The WorkerDispatcher had two unreachable branches: * _run() contained a second ActorClient construction guarded by 'if self._actor_client is None'. This could never fire: the only path to WorkerStatus.IDLE is via _discover_endpoint(), which already builds the ActorClient before setting the status. * _discover_endpoint() had a time.sleep() fallback used when self._stop_event was unset, but _run() always assigns it before calling _discover_endpoint(), so the fallback was dead. Pass stop_event explicitly into _discover_endpoint() and drop the redundant self._stop_event attribute and ActorClient branch.
… all steps The forward loop returned the first schedule step whose start <= step, which for any non-negative schedule always stopped at the first segment (typically start=0). Iterate in reverse so the last matching segment wins, and add regression tests covering scalar values, multi-segment schedules, and the pre-first-segment error case.
Collaborator
|
FYI @dlwh the schedule looked like a real bug. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Nightly multi-agent cleanup. Three parallel scouts each produced a focused change; a fourth (zephyr) was unable to apply its findings in-sandbox and is excluded.
Combined scout summary
lib/marin/src/marin— delete deadmarin.core.runtimeRemoved
lib/marin/src/marin/core/runtime.py. The module containedcached_or_construct_outputand aTaskConfigdataclass; neither was imported anywhere in the repo (verified viamarin.core.runtime,cached_or_construct_output, andTaskConfiggreps). A prior cleanup (0e0a458) already removed its only other export (RayConfig), and the remaining symbols were never wired up. Deletion also drops otherwise-unused imports ofrigging.filesystem.open_urlandmarin.utils.fsspec_existsat module load.lib/iris/src/iris— drop unreachable fallback paths inWorkerDispatcherRemoved two unreachable branches in
iris.client.worker_pool.WorkerDispatcher:ActorClientconstruction in_run()could never fire because_discover_endpoint()already constructs the client before flipping status toIDLE(the only path intoIDLE).time.sleep()fallback in_discover_endpoint()was unreachable because_run()always assignsself._stop_eventbefore calling it.Refactored
_discover_endpoint()to acceptstop_eventas a parameter, dropping the redundantself._stop_eventattribute and the dead code.lib/levanter/src/levanter— fixvalue_at_stepreturning earliest value for all stepsBug fix in
levanter.schedule.value_at_step: the forward iteration returned the first schedule step whosestart <= step, which for anystep >= 0always stopped at the first segment (typicallystart=0). That meantvalue_at_stepwould silently ignore all later schedule entries, returning (e.g.) the initial batch size for every step. Iterating in reverse so the latest matching segment wins restores the intended semantics (matchingBatchSchedule.batch_size_at_step). Added regression tests covering scalar values, multi-segment schedules at boundaries and mid-segment, and the pre-first-segment error case.lib/zephyr/src/zephyr— no change (sandbox-blocked)Scout identified small cleanups (tautological
chunk_size > 0guards inexecution.py/shuffle.py, a function-local import ofunique_temp_path, a redundant existence check inwriters.atomic_rename) but could not apply them — every edit against the worktree was rejected as a sensitive file with no approver in the autonomous run. Not included here; flagged for manual follow-up.Validation
./infra/pre-commit.py --all-files --fix— OK (ruff, black, pyrefly, license headers, ...)pytest lib/levanter/tests/test_scheduler.py— 17/17 pass (includes new regression tests)pytest lib/iris/tests/client/test_worker_pool.py— 4/4 pass🤖 Generated with Claude Code