Skip to content

[nightshift] 20260421 multi-cleanup#4998

Merged
claude-nightshift[bot] merged 4 commits intomainfrom
nightshift/cleanup-20260421
Apr 21, 2026
Merged

[nightshift] 20260421 multi-cleanup#4998
claude-nightshift[bot] merged 4 commits intomainfrom
nightshift/cleanup-20260421

Conversation

@claude-nightshift
Copy link
Copy Markdown
Contributor

seed 900815dc

Unreached branches fall —
Dead timers hush in silence,
Repo breathes lighter.

Nightshift parallel cleanup run. Three scouts (levanter, marin, zephyr) produced independent dead-code removals; the iris scout found only already-committed changes and is not included here.

Combined summary

lib/levanter/src/levanter

Removed dead timing instrumentation that accumulated values nobody read. In callbacks.eval_loss_loop, total_load_time and total_loss_time were computed every iteration but never returned or logged, forcing a manual while/iter_/next(None sentinel) loop purely to measure batches vs. loss time — switched to a straightforward for loop and dropped the dead variables. In LoadingTimeTrackerIterator, the this_load_time attribute is only ever written and the init-time perf_counter wrap around iter() measured nothing meaningful; inlined the remaining total_time update and deleted both. Unit tests in tests/test_metrics.py (including test_eval_loss_loop), tests/test_eval.py, and tests/test_logging.py all pass.

lib/marin/src/marin

Removed lib/marin/src/marin/transform/fasttext/transform.py, a 98-line orphan module. Its top-level functions (convert_fasttext_to_dolma_format, TransformFasttextToDolmaConfig, generate_id) are not imported anywhere in the repo (marin source, experiments, or tests). The docstring advertised a python transform.py ... CLI but no __main__/argparse block existed. The conversion logic also contained a redundant self-overwriting assignment to text introduced in Oct 2024 and never corrected. Deleting the file (and its now-empty directory) removes ~100 lines of confirmed dead code.

lib/zephyr/src/zephyr

Removed the dead requires_full_shard boolean field from the Map physical-op dataclass in plan.py. The field was computed identically to needs_shard_context (both set to any(isinstance(op, MapShardOp) for op in pending_fusible)) but was never read anywhere in the codebase or tests — only needs_shard_context is consumed in run_stage at plan.py:792 to decide whether to pass ShardInfo to the fused function. Git blame showed the field was a vestige from commit ac67ba7 where needs_shard_context was introduced and requires_full_shard was not cleaned up. Removed 5 occurrences (docstring attribute, field declaration, local computation, and the constructor argument in FusionState.flush_pending) while preserving semantics.

Verification

  • ./infra/pre-commit.py --all-files --fix — clean
  • lib/levanter/tests/test_metrics.py, test_eval.py, test_logging.py — 39 passed
  • lib/zephyr/tests/test_optimization.py, test_dataset.py, test_groupby.py — 107 passed

Nightshift Scout and others added 3 commits April 21, 2026 11:14
…s_loop and LoadingTimeTrackerIterator

The callbacks.eval_loss_loop accumulated total_load_time and
total_loss_time but never returned or logged them, so the manual
while/next batching was only serving dead variables. Switch to a plain
for loop.

LoadingTimeTrackerIterator.this_load_time is only ever written, never
read; the init-time timing around iter() was also measuring nothing
useful. Drop both.
transform/fasttext/transform.py was an orphan: no callers in marin,
experiments, or tests, and its "Usage: python transform.py ..." docstring
was a lie (no argparse/__main__). The function also contained a redundant
assignment to ``text`` (one line overwritten by the next) dating to
Oct 2024. Delete the whole file and its (now-empty) directory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@claude-nightshift claude-nightshift Bot added agent-generated Created by automation/agent nightshift Automated nightshift fixes labels Apr 21, 2026
@claude-nightshift claude-nightshift Bot enabled auto-merge (squash) April 21, 2026 11:22
@claude-nightshift claude-nightshift Bot requested a review from dlwh April 21, 2026 11:22
@rjpower
Copy link
Copy Markdown
Collaborator

rjpower commented Apr 21, 2026

@claude looks like the cleanup took something it should have, or forgot to update a test.

18.70s call tests/processing/classification/deduplication/test_fuzzy.py::test_fuzzy_dups_single_source_schema_and_pair
=========================== short test summary info ============================
FAILED tests/test_grug_variant_contracts.py::test_grug_base_run_emits_expected_metrics_with_json_tracker - AttributeError: 'LoadingTimeTrackerIterator' object has no attribute 'this_load_time'
====== 1 failed, 512 passed, 112 skipped, 7 warnings in 345.96s (0:05:45) ======

pls fix

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 21, 2026

Claude finished @rjpower's task in 3m 45s —— View job


Task list

  • Inspect failing test and find where this_load_time is read
  • Restore this_load_time on LoadingTimeTrackerIterator
  • Run failing test plus related metric/logging tests
  • Run ./infra/pre-commit.py --all-files --fix
  • Commit and push

What happened

The scout grepped lib/ and src/ for readers of this_load_time, found none, and dropped the attribute — but three experiment trainers still read it on every step:

  • experiments/grug/base/train.py:470
  • experiments/grug/moe/train.py:512
  • experiments/grug/modular_opt/train.py:470

Each one does:

levanter.tracker.log({"throughput/loading_time": iterator.this_load_time}, step=step)

And tests/test_grug_variant_contracts.py::test_grug_base_run_emits_expected_metrics_with_json_tracker asserts the throughput/loading_time key is present in the summary (line 253). So removing the attribute turned a silent "only-written field" claim into an AttributeError inside the grug training loop.

Fix

Restore both this_load_time and the per-step assignment in lib/levanter/src/levanter/utils/logging.py. Kept the cleanup of the init-time perf_counter wrapping iter() (that one really wasn't used anywhere). total_time stays consistent with this_load_time.

class LoadingTimeTrackerIterator(Iterator[T]):
    def __init__(self, items: Iterable[T]):
        self.total_time = 0.0
        self.this_load_time = 0.0
        self.items = iter(items)

    def __next__(self) -> T:
        start = time.perf_counter()
        item = next(self.items)
        self.this_load_time = time.perf_counter() - start
        self.total_time += self.this_load_time
        return item

Test & lint status

  • tests/test_grug_variant_contracts.py7 passed (incl. the previously failing test_grug_base_run_emits_expected_metrics_with_json_tracker)
  • lib/levanter/tests/{test_metrics,test_eval,test_logging}.py39 passed (same set the PR originally verified)
  • ./infra/pre-commit.py --all-files --fix → clean (ruff, black, pyrefly, license headers, AST, etc. all OK)

Fix pushed as 5f2b50d55 on branch nightshift/cleanup-20260421.
• branch: nightshift/cleanup-20260421

The nightshift cleanup removed this_load_time after a src-only grep, but
three experiment trainers (experiments/grug/{base,moe,modular_opt}/train.py)
log it on every step as `throughput/loading_time`. The grug-base contract
test was asserting this key is emitted, so dropping the attribute turned
into an AttributeError at train time rather than a silent regression.

Restore both the attribute and the per-step assignment; keep the total_time
expressed in terms of this_load_time so the two stay consistent.

Co-authored-by: Russell Power <rjpower@users.noreply.github.com>
@claude-nightshift claude-nightshift Bot merged commit e59a495 into main Apr 21, 2026
37 checks passed
@claude-nightshift claude-nightshift Bot deleted the nightshift/cleanup-20260421 branch April 21, 2026 21:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-generated Created by automation/agent nightshift Automated nightshift fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant