pyrefly: cluster C cleanup (control-flow annotations)#4808
pyrefly: cluster C cleanup (control-flow annotations)#4808rjpower wants to merge 4 commits intopyrefly/pr0-real-bugsfrom
Conversation
The dual-purpose form (decorator | context manager) returned an untyped `FunctionType | _GeneratorContextManager`, which prevented pyrefly from recognizing the context-manager protocol at every call site (14 bad-context-manager diagnostics). No caller uses it as a decorator, so collapse it to a single `@contextmanager`-decorated generator with `Iterator[None]` return type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Real bugs: - datakit/download/huggingface.py: use `source_url` (always bound) instead of `hf_source_path` (else-branch only) in the escape-path error message. - rl/scripts/replay_completions.py: add missing return at the end of `replay_all_with_warmup`; the function was declared to return a dict but fell through after the final while-loop. - rl/weight_transfer/__init__.py: assert `JAXTransferServer` and `JAXTransferClient` are non-None before calling them (they can be None when the JAX transfer extension is unavailable). Annotation / type tightening: - rl/weight_transfer/arrow_flight.py: return `list[pa.RecordBatch]` (the concrete type) instead of `Sequence[...]`; propagate through `_weights_store` and `store_weights`. - cluster/config.py: cast `config["tpu_generation"]` to `str` so it indexes `GENERATION_CONFIGS` correctly. - web/convert.py: widen `convert_page_with_resiliparse` and `convert_page` return type to `dict[str, str | None]` (title can be None); handle None content in ar5iv transform. - transform/huggingface/dataset_to_eval.py: fix `format_prompt_response`'s `answer_idx` parameter type from `str` to `int` (all callers pass an int). - levanter/store/jagged_array.py: switch internal offset containers from `list[tuple]` to `list[list]` so the `offsets[zero_pos] = [0, ...]` rewrites actually type-check. - fray/v1/job/context.py: annotate `raw_to_wrapped` as `dict[Future, Future | GeneratorFuture]` to reflect both assignment paths. - haliax/_src/parsing.py: `parse_einsum` now returns `tuple[Sequence[Expression] | None, Expression]` (the fallback path sets `lhses = None`). Narrow at the sole caller via `raise_parse_error`. - levanter/checkpoint.py: `_tracemalloc_memory_state` returns `dict[str, str | None]` to reflect `_format_bytes_human_readable`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pyrefly's control-flow analysis can't correlate two branches that
share the same boolean predicate ("if flag: ... (later) if flag:
<use X>"), nor narrow generics through runtime isinstance fences.
These are not bugs, but they're spread across enough kernel, optim,
and ops code that blanket-disabling the categories loses real signal.
Per-site `# pyrefly: ignore[<code>]` with rationale at each call,
grouped by file:
unbound-name:
- marin/rl/rollout_worker.py: 8 sites, gated on `use_jax_rng`.
- marin/evaluation/save_logprobs.py: 4 sites, gated on
`config.top_k is not None`.
- marin/transform/huggingface/dataset_to_eval.py: 2 sites in a try/
except where `k` is always bound on the first iteration.
- iris/cluster/bundle.py: `blob` bound via `break` path; attempt==2
else-branch raises.
- levanter kernels pallas gpu/reference (7 sites), gated on
`return_argmax`.
- levanter optim kron (6 sites), gated on
`not return_partition_specs_only`, `flax_partitioned`,
`hax_partitioned`, `have_params_sharding`.
- levanter optim soap (7 sites), gated on `merge_small_dims`,
`partition_grads_into_blocks`, `mesh is not None`.
- levanter store jagged_array (2 sites) gated on `self.shapes is not
None`.
- haliax ops (6 sites), gated on `return_index` / `return_inverse`
/ `return_counts`.
- haliax einsum (1 site), letter bound in if branch; else raises.
bad-return:
- haliax state_dict, core, nn/scan: generic T vs concrete runtime
type sites.
- levanter compat/hf_checkpoints: dataclasses.replace loses type
through `# type: ignore`.
- zephyr dataset: sum's Literal[0] widens.
not-iterable:
- haliax nn/conv: replace_axis returns AxisSelection (iterable for
our call).
- levanter grad_accum: R is constrained to a tuple shape at all
call sites.
- levanter models/mixtral: scan return over-widened.
bad-index:
- levanter eval_harness: segment_ids narrowed by None-branch above.
bad-assignment:
- levanter callbacks/watch: list[str] narrowed to list[Target
literals] at runtime.
- levanter models/flash_attention: eliminate_axes returns broad
AxisSelection (x2).
- levanter models/loss: resolve_axis returns AxisSpec for a single
name.
not-callable:
- haliax tree_util (x2): old_is_leaf non-None in else-branch.
- levanter eval_harness: tokenizer is declared a property but
returns a callable HF tokenizer; handle_stop_sequences /
postprocess_generated_text are None only when lm_eval is
unavailable.
unsupported-operation:
- iris cluster controller: COUNT(*) fetchone is non-None.
invalid-inheritance:
- levanter eval_harness: TemplateLM base lacks type stubs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-enable unbound-name, not-iterable, bad-index, bad-context-manager in `[tool.pyrefly.errors]` (removed the four `<cat> = false` lines). All remaining sites in these categories are fixed or carry a per-site ignore from the preceding commits. Baseline: 1551 -> 1203 lines (129 -> 100 suppressed diagnostics). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Claude finished @rjpower's task in 4m 54s —— View job Code review
Findings1 high-signal issue (inline): Validated clean
|
| return { | ||
| "total_requests": len(requests), | ||
| "total_batches": total_batches, | ||
| "completed_batches": batch_idx, | ||
| } |
There was a problem hiding this comment.
The new return dict is missing keys that main() reads, so replay_completions.py will now KeyError instead of returning None (which would have TypeError'd on subscript). Specifically main() reads result["failed_batches"] at line 279, and the early-return branch at lines 190-197 returns failed_batches, total_time, and results as well. Adding the return statement here surfaced (but did not finish fixing) the inconsistency.
| return { | |
| "total_requests": len(requests), | |
| "total_batches": total_batches, | |
| "completed_batches": batch_idx, | |
| } | |
| return { | |
| "total_requests": len(requests), | |
| "total_batches": total_batches, | |
| "completed_batches": batch_idx, | |
| "failed_batches": [], | |
| "total_time": 0, | |
| "results": [], | |
| } |
(The failed_batches/total_time/results values here are placeholders matching the early-return shape — replay_all_with_warmup doesn't currently track per-batch failures or wall-time, so a fuller fix would either (a) make send_batch_requests return a result and aggregate them here, or (b) drop those fields from the early-return and update main() to use .get("failed_batches", []). Either way works; the suggested patch is the minimum to stop main() from KeyError'ing.)
Summary
Cluster C (control-flow / data / annotation) of the pyrefly tightening
plan. Stacked on #4804 (PR-0). Clears 9 cluster C categories to zero
and re-enables 4 of them (the 5 others, plus
invalid-annotationandfriends, were re-enabled in PR-0).
Baseline delta
Categories re-enabled in `[tool.pyrefly.errors]`
Category dispositions
Fixes
into a plain `@contextmanager` (no callers used the decorator form).
and annotation tightening (see commit 9da309b for the full list).
`datakit/download/huggingface.py:314`: the error message referenced
`hf_source_path` which is only bound in the else-branch.
correlated-conditional false positives that the current pyrefly
can't track (rollout RNG gates, pallas return_argmax gates,
optim/kron flax/hax partitioning gates, haliax return_index
gates, etc.).
Annotations fixed vs suppressed
See `logs/pyrefly-tightening/pr1.md` for the full breakdown.
Test plan
🤖 Generated with Claude Code