pyrefly: cluster C cleanup (control-flow annotations) by rjpower · Pull Request #4808 · marin-community/marin

rjpower · 2026-04-15T23:51:35Z

Summary

Cluster C (control-flow / data / annotation) of the pyrefly tightening
plan. Stacked on #4804 (PR-0). Clears 9 cluster C categories to zero
and re-enables 4 of them (the 5 others, plus invalid-annotation and
friends, were re-enabled in PR-0).

Baseline delta

1551 -> 1203 lines
129 -> 100 suppressed diagnostics

Categories re-enabled in `[tool.pyrefly.errors]`

`unbound-name`
`not-iterable`
`bad-index`
`bad-context-manager`

Category dispositions

Category	Before	After
unbound-name	41	0
bad-context-manager	14	0
bad-return	13	0
not-callable	7	0
unsupported-operation	4	0
bad-assignment	4	0
not-iterable	3	0
bad-index	3	0
invalid-inheritance	1	0

Fixes

`bad-context-manager`: collapsed `remove_tpu_lockfile_on_exit`
into a plain `@contextmanager` (no callers used the decorator form).
`bad-return` / `bad-index` / `unsupported-operation` real bugs
and annotation tightening (see commit 9da309b for the full list).
`unbound-name` real bug at
`datakit/download/huggingface.py:314`: the error message referenced
`hf_source_path` which is only bound in the else-branch.
~48 per-site `# pyrefly: ignore[...]` with rationale for
correlated-conditional false positives that the current pyrefly
can't track (rollout RNG gates, pallas return_argmax gates,
optim/kron flax/hax partitioning gates, haliax return_index
gates, etc.).

Annotations fixed vs suppressed

Annotations tightened (no suppression): 11 sites
Per-site `# pyrefly: ignore` added: ~48 sites

See `logs/pyrefly-tightening/pr1.md` for the full breakdown.

Test plan

`./infra/pre-commit.py --all-files` passes

🤖 Generated with Claude Code

The dual-purpose form (decorator | context manager) returned an untyped `FunctionType | _GeneratorContextManager`, which prevented pyrefly from recognizing the context-manager protocol at every call site (14 bad-context-manager diagnostics). No caller uses it as a decorator, so collapse it to a single `@contextmanager`-decorated generator with `Iterator[None]` return type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Real bugs: - datakit/download/huggingface.py: use `source_url` (always bound) instead of `hf_source_path` (else-branch only) in the escape-path error message. - rl/scripts/replay_completions.py: add missing return at the end of `replay_all_with_warmup`; the function was declared to return a dict but fell through after the final while-loop. - rl/weight_transfer/__init__.py: assert `JAXTransferServer` and `JAXTransferClient` are non-None before calling them (they can be None when the JAX transfer extension is unavailable). Annotation / type tightening: - rl/weight_transfer/arrow_flight.py: return `list[pa.RecordBatch]` (the concrete type) instead of `Sequence[...]`; propagate through `_weights_store` and `store_weights`. - cluster/config.py: cast `config["tpu_generation"]` to `str` so it indexes `GENERATION_CONFIGS` correctly. - web/convert.py: widen `convert_page_with_resiliparse` and `convert_page` return type to `dict[str, str | None]` (title can be None); handle None content in ar5iv transform. - transform/huggingface/dataset_to_eval.py: fix `format_prompt_response`'s `answer_idx` parameter type from `str` to `int` (all callers pass an int). - levanter/store/jagged_array.py: switch internal offset containers from `list[tuple]` to `list[list]` so the `offsets[zero_pos] = [0, ...]` rewrites actually type-check. - fray/v1/job/context.py: annotate `raw_to_wrapped` as `dict[Future, Future | GeneratorFuture]` to reflect both assignment paths. - haliax/_src/parsing.py: `parse_einsum` now returns `tuple[Sequence[Expression] | None, Expression]` (the fallback path sets `lhses = None`). Narrow at the sole caller via `raise_parse_error`. - levanter/checkpoint.py: `_tracemalloc_memory_state` returns `dict[str, str | None]` to reflect `_format_bytes_human_readable`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Pyrefly's control-flow analysis can't correlate two branches that share the same boolean predicate ("if flag: ... (later) if flag: <use X>"), nor narrow generics through runtime isinstance fences. These are not bugs, but they're spread across enough kernel, optim, and ops code that blanket-disabling the categories loses real signal. Per-site `# pyrefly: ignore[<code>]` with rationale at each call, grouped by file: unbound-name: - marin/rl/rollout_worker.py: 8 sites, gated on `use_jax_rng`. - marin/evaluation/save_logprobs.py: 4 sites, gated on `config.top_k is not None`. - marin/transform/huggingface/dataset_to_eval.py: 2 sites in a try/ except where `k` is always bound on the first iteration. - iris/cluster/bundle.py: `blob` bound via `break` path; attempt==2 else-branch raises. - levanter kernels pallas gpu/reference (7 sites), gated on `return_argmax`. - levanter optim kron (6 sites), gated on `not return_partition_specs_only`, `flax_partitioned`, `hax_partitioned`, `have_params_sharding`. - levanter optim soap (7 sites), gated on `merge_small_dims`, `partition_grads_into_blocks`, `mesh is not None`. - levanter store jagged_array (2 sites) gated on `self.shapes is not None`. - haliax ops (6 sites), gated on `return_index` / `return_inverse` / `return_counts`. - haliax einsum (1 site), letter bound in if branch; else raises. bad-return: - haliax state_dict, core, nn/scan: generic T vs concrete runtime type sites. - levanter compat/hf_checkpoints: dataclasses.replace loses type through `# type: ignore`. - zephyr dataset: sum's Literal[0] widens. not-iterable: - haliax nn/conv: replace_axis returns AxisSelection (iterable for our call). - levanter grad_accum: R is constrained to a tuple shape at all call sites. - levanter models/mixtral: scan return over-widened. bad-index: - levanter eval_harness: segment_ids narrowed by None-branch above. bad-assignment: - levanter callbacks/watch: list[str] narrowed to list[Target literals] at runtime. - levanter models/flash_attention: eliminate_axes returns broad AxisSelection (x2). - levanter models/loss: resolve_axis returns AxisSpec for a single name. not-callable: - haliax tree_util (x2): old_is_leaf non-None in else-branch. - levanter eval_harness: tokenizer is declared a property but returns a callable HF tokenizer; handle_stop_sequences / postprocess_generated_text are None only when lm_eval is unavailable. unsupported-operation: - iris cluster controller: COUNT(*) fetchone is non-None. invalid-inheritance: - levanter eval_harness: TemplateLM base lacks type stubs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Re-enable unbound-name, not-iterable, bad-index, bad-context-manager in `[tool.pyrefly.errors]` (removed the four `<cat> = false` lines). All remaining sites in these categories are fixed or carry a per-site ignore from the preceding commits. Baseline: 1551 -> 1203 lines (129 -> 100 suppressed diagnostics). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-15T23:51:49Z

Claude finished @rjpower's task in 4m 54s —— View job

Code review

Triage PR (open, not draft, not previously reviewed)
Identify relevant CLAUDE.md/AGENTS.md
Summarize PR changes
Run parallel review (2 compliance + 2 bug agents)
Validate flagged issues
Post review

Findings

1 high-signal issue (inline): lib/marin/src/marin/rl/scripts/replay_completions.py:219-223 — the new return dict in replay_all_with_warmup is missing the failed_batches, total_time, and results keys that the early-return branch at lines 190-197 includes and that main() reads at line 279. Adding the return surfaced the schema inconsistency without fully fixing it; running the script will now KeyError on result["failed_batches"]. See the inline comment for a suggested patch.

Validated clean

lib/marin/src/marin/utils.py — remove_tpu_lockfile_on_exit collapse: confirmed all callsites use with ...(), no decorator usage anywhere.
lib/marin/src/marin/datakit/download/huggingface.py:314 — source_url is in scope in both branches above the usage; correct fix.
lib/levanter/src/levanter/store/jagged_array.py — tuple→list change is consistent at both sites; the next-line index assignment requires a list.
lib/marin/src/marin/transform/ar5iv/transform_ar5iv.py — or "" shifts None→"" downstream but is a benign tightening.
Return-type widenings (convert.py, checkpoint.py, parsing.py) reflect existing runtime behavior; no caller regressions in the diff.
pyproject.toml re-enablement of 4 categories (unbound-name, not-iterable, bad-index, bad-context-manager) is consistent with .pyrefly-baseline.json (those categories absent from the baseline).
Per-site # pyrefly: ignore[...] comments include rationale per AGENTS.md guidance.
No CLAUDE.md / AGENTS.md violations found (imports at top, no compatibility shims, no swallowed exceptions, no *_utils.py).
· Branch: pyrefly/pr1-cluster-c

claude · 2026-04-15T23:56:34Z

+    return {
+        "total_requests": len(requests),
+        "total_batches": total_batches,
+        "completed_batches": batch_idx,
+    }


The new return dict is missing keys that main() reads, so replay_completions.py will now KeyError instead of returning None (which would have TypeError'd on subscript). Specifically main() reads result["failed_batches"] at line 279, and the early-return branch at lines 190-197 returns failed_batches, total_time, and results as well. Adding the return statement here surfaced (but did not finish fixing) the inconsistency.

Suggested change

return {

"total_requests": len(requests),

"total_batches": total_batches,

"completed_batches": batch_idx,

}

return {

"total_requests": len(requests),

"total_batches": total_batches,

"completed_batches": batch_idx,

"failed_batches": [],

"total_time": 0,

"results": [],

}

(The failed_batches/total_time/results values here are placeholders matching the early-return shape — replay_all_with_warmup doesn't currently track per-batch failures or wall-time, so a fuller fix would either (a) make send_batch_requests return a result and aggregate them here, or (b) drop those fields from the early-return and update main() to use .get("failed_batches", []). Either way works; the suggested patch is the minimum to stop main() from KeyError'ing.)

rjpower and others added 4 commits April 15, 2026 23:50

rjpower added the agent-generated Created by automation/agent label Apr 15, 2026

claude bot reviewed Apr 15, 2026

View reviewed changes

rjpower mentioned this pull request Apr 16, 2026

pyrefly: cluster D cleanup (bad-specialization + override) #4809

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pyrefly: cluster C cleanup (control-flow annotations)#4808

pyrefly: cluster C cleanup (control-flow annotations)#4808
rjpower wants to merge 4 commits intopyrefly/pr0-real-bugsfrom
pyrefly/pr1-cluster-c

rjpower commented Apr 15, 2026

Uh oh!

claude bot commented Apr 15, 2026 •

edited

Loading

Uh oh!

claude bot Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rjpower commented Apr 15, 2026

Summary

Baseline delta

Categories re-enabled in `[tool.pyrefly.errors]`

Category dispositions

Fixes

Annotations fixed vs suppressed

Test plan

Uh oh!

claude bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code review

Findings

Validated clean

Uh oh!

claude bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Apr 15, 2026 •

edited

Loading