Skip to content

pyrefly: cluster C cleanup (control-flow annotations)#4808

Open
rjpower wants to merge 4 commits intopyrefly/pr0-real-bugsfrom
pyrefly/pr1-cluster-c
Open

pyrefly: cluster C cleanup (control-flow annotations)#4808
rjpower wants to merge 4 commits intopyrefly/pr0-real-bugsfrom
pyrefly/pr1-cluster-c

Conversation

@rjpower
Copy link
Copy Markdown
Collaborator

@rjpower rjpower commented Apr 15, 2026

Summary

Cluster C (control-flow / data / annotation) of the pyrefly tightening
plan. Stacked on #4804 (PR-0). Clears 9 cluster C categories to zero
and re-enables 4 of them (the 5 others, plus invalid-annotation and
friends, were re-enabled in PR-0).

Baseline delta

  • 1551 -> 1203 lines
  • 129 -> 100 suppressed diagnostics

Categories re-enabled in `[tool.pyrefly.errors]`

  • `unbound-name`
  • `not-iterable`
  • `bad-index`
  • `bad-context-manager`

Category dispositions

Category Before After
unbound-name 41 0
bad-context-manager 14 0
bad-return 13 0
not-callable 7 0
unsupported-operation 4 0
bad-assignment 4 0
not-iterable 3 0
bad-index 3 0
invalid-inheritance 1 0

Fixes

  • `bad-context-manager`: collapsed `remove_tpu_lockfile_on_exit`
    into a plain `@contextmanager` (no callers used the decorator form).
  • `bad-return` / `bad-index` / `unsupported-operation` real bugs
    and annotation tightening (see commit 9da309b for the full list).
  • `unbound-name` real bug at
    `datakit/download/huggingface.py:314`: the error message referenced
    `hf_source_path` which is only bound in the else-branch.
  • ~48 per-site `# pyrefly: ignore[...]` with rationale for
    correlated-conditional false positives that the current pyrefly
    can't track (rollout RNG gates, pallas return_argmax gates,
    optim/kron flax/hax partitioning gates, haliax return_index
    gates, etc.).

Annotations fixed vs suppressed

  • Annotations tightened (no suppression): 11 sites
  • Per-site `# pyrefly: ignore` added: ~48 sites

See `logs/pyrefly-tightening/pr1.md` for the full breakdown.

Test plan

  • `./infra/pre-commit.py --all-files` passes

🤖 Generated with Claude Code

rjpower and others added 4 commits April 15, 2026 23:50
The dual-purpose form (decorator | context manager) returned an
untyped `FunctionType | _GeneratorContextManager`, which prevented
pyrefly from recognizing the context-manager protocol at every call
site (14 bad-context-manager diagnostics). No caller uses it as a
decorator, so collapse it to a single `@contextmanager`-decorated
generator with `Iterator[None]` return type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Real bugs:
- datakit/download/huggingface.py: use `source_url` (always bound)
  instead of `hf_source_path` (else-branch only) in the escape-path
  error message.
- rl/scripts/replay_completions.py: add missing return at the end of
  `replay_all_with_warmup`; the function was declared to return a
  dict but fell through after the final while-loop.
- rl/weight_transfer/__init__.py: assert `JAXTransferServer` and
  `JAXTransferClient` are non-None before calling them (they can be
  None when the JAX transfer extension is unavailable).

Annotation / type tightening:
- rl/weight_transfer/arrow_flight.py: return `list[pa.RecordBatch]`
  (the concrete type) instead of `Sequence[...]`; propagate through
  `_weights_store` and `store_weights`.
- cluster/config.py: cast `config["tpu_generation"]` to `str` so it
  indexes `GENERATION_CONFIGS` correctly.
- web/convert.py: widen `convert_page_with_resiliparse` and
  `convert_page` return type to `dict[str, str | None]` (title can
  be None); handle None content in ar5iv transform.
- transform/huggingface/dataset_to_eval.py: fix
  `format_prompt_response`'s `answer_idx` parameter type from `str`
  to `int` (all callers pass an int).
- levanter/store/jagged_array.py: switch internal offset containers
  from `list[tuple]` to `list[list]` so the `offsets[zero_pos] =
  [0, ...]` rewrites actually type-check.
- fray/v1/job/context.py: annotate `raw_to_wrapped` as
  `dict[Future, Future | GeneratorFuture]` to reflect both assignment
  paths.
- haliax/_src/parsing.py: `parse_einsum` now returns
  `tuple[Sequence[Expression] | None, Expression]` (the fallback path
  sets `lhses = None`). Narrow at the sole caller via `raise_parse_error`.
- levanter/checkpoint.py: `_tracemalloc_memory_state` returns
  `dict[str, str | None]` to reflect `_format_bytes_human_readable`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pyrefly's control-flow analysis can't correlate two branches that
share the same boolean predicate ("if flag: ... (later) if flag:
<use X>"), nor narrow generics through runtime isinstance fences.
These are not bugs, but they're spread across enough kernel, optim,
and ops code that blanket-disabling the categories loses real signal.

Per-site `# pyrefly: ignore[<code>]` with rationale at each call,
grouped by file:

unbound-name:
- marin/rl/rollout_worker.py: 8 sites, gated on `use_jax_rng`.
- marin/evaluation/save_logprobs.py: 4 sites, gated on
  `config.top_k is not None`.
- marin/transform/huggingface/dataset_to_eval.py: 2 sites in a try/
  except where `k` is always bound on the first iteration.
- iris/cluster/bundle.py: `blob` bound via `break` path; attempt==2
  else-branch raises.
- levanter kernels pallas gpu/reference (7 sites), gated on
  `return_argmax`.
- levanter optim kron (6 sites), gated on
  `not return_partition_specs_only`, `flax_partitioned`,
  `hax_partitioned`, `have_params_sharding`.
- levanter optim soap (7 sites), gated on `merge_small_dims`,
  `partition_grads_into_blocks`, `mesh is not None`.
- levanter store jagged_array (2 sites) gated on `self.shapes is not
  None`.
- haliax ops (6 sites), gated on `return_index` / `return_inverse`
  / `return_counts`.
- haliax einsum (1 site), letter bound in if branch; else raises.

bad-return:
- haliax state_dict, core, nn/scan: generic T vs concrete runtime
  type sites.
- levanter compat/hf_checkpoints: dataclasses.replace loses type
  through `# type: ignore`.
- zephyr dataset: sum's Literal[0] widens.

not-iterable:
- haliax nn/conv: replace_axis returns AxisSelection (iterable for
  our call).
- levanter grad_accum: R is constrained to a tuple shape at all
  call sites.
- levanter models/mixtral: scan return over-widened.

bad-index:
- levanter eval_harness: segment_ids narrowed by None-branch above.

bad-assignment:
- levanter callbacks/watch: list[str] narrowed to list[Target
  literals] at runtime.
- levanter models/flash_attention: eliminate_axes returns broad
  AxisSelection (x2).
- levanter models/loss: resolve_axis returns AxisSpec for a single
  name.

not-callable:
- haliax tree_util (x2): old_is_leaf non-None in else-branch.
- levanter eval_harness: tokenizer is declared a property but
  returns a callable HF tokenizer; handle_stop_sequences /
  postprocess_generated_text are None only when lm_eval is
  unavailable.

unsupported-operation:
- iris cluster controller: COUNT(*) fetchone is non-None.

invalid-inheritance:
- levanter eval_harness: TemplateLM base lacks type stubs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-enable unbound-name, not-iterable, bad-index, bad-context-manager
in `[tool.pyrefly.errors]` (removed the four `<cat> = false` lines).
All remaining sites in these categories are fixed or carry a
per-site ignore from the preceding commits.

Baseline: 1551 -> 1203 lines (129 -> 100 suppressed diagnostics).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rjpower rjpower added the agent-generated Created by automation/agent label Apr 15, 2026
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 15, 2026

Claude finished @rjpower's task in 4m 54s —— View job


Code review

  • Triage PR (open, not draft, not previously reviewed)
  • Identify relevant CLAUDE.md/AGENTS.md
  • Summarize PR changes
  • Run parallel review (2 compliance + 2 bug agents)
  • Validate flagged issues
  • Post review

Findings

1 high-signal issue (inline): lib/marin/src/marin/rl/scripts/replay_completions.py:219-223 — the new return dict in replay_all_with_warmup is missing the failed_batches, total_time, and results keys that the early-return branch at lines 190-197 includes and that main() reads at line 279. Adding the return surfaced the schema inconsistency without fully fixing it; running the script will now KeyError on result["failed_batches"]. See the inline comment for a suggested patch.

Validated clean

  • lib/marin/src/marin/utils.pyremove_tpu_lockfile_on_exit collapse: confirmed all callsites use with ...(), no decorator usage anywhere.
  • lib/marin/src/marin/datakit/download/huggingface.py:314source_url is in scope in both branches above the usage; correct fix.
  • lib/levanter/src/levanter/store/jagged_array.py — tuple→list change is consistent at both sites; the next-line index assignment requires a list.
  • lib/marin/src/marin/transform/ar5iv/transform_ar5iv.pyor "" shifts None→"" downstream but is a benign tightening.
  • Return-type widenings (convert.py, checkpoint.py, parsing.py) reflect existing runtime behavior; no caller regressions in the diff.
  • pyproject.toml re-enablement of 4 categories (unbound-name, not-iterable, bad-index, bad-context-manager) is consistent with .pyrefly-baseline.json (those categories absent from the baseline).
  • Per-site # pyrefly: ignore[...] comments include rationale per AGENTS.md guidance.
  • No CLAUDE.md / AGENTS.md violations found (imports at top, no compatibility shims, no swallowed exceptions, no *_utils.py).
    · Branch: pyrefly/pr1-cluster-c

Comment on lines +219 to +223
return {
"total_requests": len(requests),
"total_batches": total_batches,
"completed_batches": batch_idx,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new return dict is missing keys that main() reads, so replay_completions.py will now KeyError instead of returning None (which would have TypeError'd on subscript). Specifically main() reads result["failed_batches"] at line 279, and the early-return branch at lines 190-197 returns failed_batches, total_time, and results as well. Adding the return statement here surfaced (but did not finish fixing) the inconsistency.

Suggested change
return {
"total_requests": len(requests),
"total_batches": total_batches,
"completed_batches": batch_idx,
}
return {
"total_requests": len(requests),
"total_batches": total_batches,
"completed_batches": batch_idx,
"failed_batches": [],
"total_time": 0,
"results": [],
}

(The failed_batches/total_time/results values here are placeholders matching the early-return shape — replay_all_with_warmup doesn't currently track per-batch failures or wall-time, so a fuller fix would either (a) make send_batch_requests return a result and aggregate them here, or (b) drop those fields from the early-return and update main() to use .get("failed_batches", []). Either way works; the suggested patch is the minimum to stop main() from KeyError'ing.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-generated Created by automation/agent

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant