Replies: 2 comments 5 replies
-
|
Thanks for opening this discussion. It might be interesting to share the skill before deciding if we want to add it to the repo. I'm curious how it differs from the existing contribution guidelines for adding new rules and if we should just update those instead. |
Beta Was this translation helpful? Give feedback.
-
|
I've done an experiment to gauge this. Setup
PlansNo-SKILL Plan
# AIR202 — Explicit `multiple_outputs` for `@task` returning a Mapping
## Context
Apache Airflow's `@task` decorator infers `multiple_outputs` from the return type annotation: if the return type is a `Mapping` subclass, return-value keys become individual XComs; otherwise the entire value is stored under a single XCom. This implicit coupling between *type annotations* and *runtime XCom semantics* surprises Dag authors and is fragile to future changes in the inference logic (see `airflow/task-sdk/src/airflow/sdk/bases/decorator.py:476-496`).
AIR202 flags `@task`-decorated functions that look like they return a mapping but do not explicitly pass `multiple_outputs=...`. The autofix adds `multiple_outputs=True/False` based on the same inference Airflow uses, preserving runtime behavior while making intent explicit.
The rule fires when **either** of these holds AND `multiple_outputs` is not already set on the decorator:
1. The function has a return type **annotation** whose name is in the Mapping family (see Detection below).
2. The function body contains a `return` statement whose value is a **dict literal** (`{"k": v, ...}`) or a **dict comprehension** (`{k: v for ...}`).
## Detection details
### Decorator targets
All `@task.*` forms (per user choice). Reuse `is_airflow_task()` from `crates/ruff_linter/src/rules/airflow/helpers.rs:307-328` — it already matches `@task`, `@task()`, and `@task.<variant>` for both `airflow.decorators.task` and `airflow.sdk.task`. The rule must also skip if any decorator already passes `multiple_outputs` as a keyword.
### Return-annotation match (Mapping family)
Resolve the annotation via `semantic.resolve_qualified_name`. If the annotation is a subscript (`dict[str, int]`), resolve the *value* (`dict`) part. Match any of:
- `["dict"]` (builtin)
- `["typing", "Dict"]`, `["typing_extensions", "Dict"]`
- `["typing", "Mapping"]`, `["typing_extensions", "Mapping"]`
- `["typing", "MutableMapping"]`, `["typing_extensions", "MutableMapping"]`
- `["collections", "abc", "Mapping"]`
- `["collections", "abc", "MutableMapping"]`
- `["collections", "OrderedDict"]`, `["typing", "OrderedDict"]`
### Return-value match
Walk the function body with `ReturnStatementVisitor` (used in `task_branch_as_short_circuit.rs`). A return statement matches when `ret.value` is `Expr::Dict(_)` or `Expr::DictComp(_)`. Do not descend into nested function/class scopes (the visitor already respects this).
### Autofix
Always available, **unsafe**. Determine the value Airflow *would* infer:
- If the function has a return annotation in the Mapping family → `multiple_outputs=True`.
- Otherwise (no annotation, or annotation isn't Mapping family) → `multiple_outputs=False`.
This mirrors `_infer_multiple_outputs` in `airflow/task-sdk/src/airflow/sdk/bases/decorator.py:476-496` and preserves runtime behavior across the fix.
Edit construction:
- `@task` (no call) → replace decorator expression with `@task(multiple_outputs=...)`. Use `Edit::range_replacement` over `decorator.expression.range()`.
- `@task()` (empty call) → insert `multiple_outputs=...` inside the parens. Use `Edit::insertion` at the position just after `(`.
- `@task(retries=3, ...)` → append `, multiple_outputs=...` before the closing `)`. Use `Edit::insertion` just before the call's `)` token (find via `call.arguments.range().end() - 1` or scan for `)`).
- Variant forms (`@task.virtualenv`, etc.) → same three cases, applied to the variant call.
Report range: the decorator that triggers the rule.
Diagnostic: `Fix::unsafe_edit(...)` with title "Add explicit `multiple_outputs=True`" or `"...=False"`.
## Files to create
- `crates/ruff_linter/src/rules/airflow/rules/task_explicit_multiple_outputs.rs` — the rule.
- `crates/ruff_linter/resources/test/fixtures/airflow/AIR202.py` — fixture covering:
- `@task` returning dict literal (no annotation) → `multiple_outputs=False`
- `@task` with `-> dict` annotation, body returns dict → `multiple_outputs=True`
- `@task` with `-> dict[str, int]` annotation → `multiple_outputs=True`
- `@task` with `-> Mapping[str, int]` (import from `typing`) → `multiple_outputs=True`
- `@task` with `-> collections.abc.Mapping` → `multiple_outputs=True`
- `@task()` form with empty parens
- `@task(retries=3)` with existing kwargs
- `@task.virtualenv(...)` variant
- Dict comprehension return
- **Negative**: `@task(multiple_outputs=True)` already specified → no diagnostic
- **Negative**: `@task` returning list/int/str with no Mapping annotation → no diagnostic
- **Negative**: function not decorated with `@task` → no diagnostic
- **Negative**: dict literal inside nested function inside the task → no diagnostic
- The snapshot file is auto-generated by `cargo nextest run` with `INSTA_UPDATE=always`.
## Files to modify
- `crates/ruff_linter/src/rules/airflow/rules/mod.rs` — add `mod task_explicit_multiple_outputs;` and `pub(crate) use task_explicit_multiple_outputs::*;`.
- `crates/ruff_linter/src/codes.rs` (around line 1139, after the AIR201 entry) — add `(Airflow, "202") => rules::airflow::rules::AirflowTaskMissingMultipleOutputs,`.
- `crates/ruff_linter/src/checkers/ast/analyze/statement.rs` (around line 342 where `AirflowTaskBranchAsShortCircuit` is dispatched) — add dispatch on `StmtFunctionDef`:
```rust
if checker.is_rule_enabled(Rule::AirflowTaskMissingMultipleOutputs) {
airflow::rules::task_explicit_multiple_outputs(checker, function_def);
}
```
- `crates/ruff_linter/src/rules/airflow/mod.rs` — add a `#[test_case(Rule::AirflowTaskMissingMultipleOutputs, Path::new("AIR202.py"))]` entry.
## Rule struct (sketch)
```rust
/// ## What it does
/// Checks for `@task`-decorated functions that appear to return a `Mapping`
/// (dict literal, dict comprehension, or an annotated return type in the
/// `Mapping` family) without explicitly specifying `multiple_outputs`.
///
/// ## Why is this bad?
/// Airflow infers `multiple_outputs` from the return type annotation: if the
/// annotation is a `Mapping` subclass, return-value keys are split into
/// individual XComs; otherwise the entire value is stored as one XCom. This
/// implicit coupling between type annotations and XCom semantics is
/// surprising and fragile. Passing `multiple_outputs` explicitly makes the
/// intent clear and decouples the behavior from the annotation.
///
/// ## Example
/// ```python
/// @task
/// def my_task() -> dict:
/// return {"x": 1}
/// ```
///
/// Use instead:
/// ```python
/// @task(multiple_outputs=True)
/// def my_task() -> dict:
/// return {"x": 1}
/// ```
///
/// ## Fix safety
/// The fix is always unsafe: it pins the value Airflow would have inferred
/// today, but if the function's behavior was relying on a different
/// inference (or future inference change) the resulting XCom shape may
/// change.
#[derive(ViolationMetadata)]
#[violation_metadata(preview_since = "0.15.13")] // bump to current preview version
pub(crate) struct AirflowTaskMissingMultipleOutputs {
inferred: bool,
}
```
`Violation` impl returns
`"`multiple_outputs` should be explicitly specified on `@task` returning a `Mapping`"`
and fix title `"Add `multiple_outputs={True|False}`"`.
## Reusable utilities
- `is_airflow_task` — `crates/ruff_linter/src/rules/airflow/helpers.rs:307`
- `ReturnStatementVisitor` — `ruff_python_ast::helpers::ReturnStatementVisitor` (used in `task_branch_as_short_circuit.rs:2`)
- `semantic.resolve_qualified_name` for annotation matching
- `Fix::unsafe_edit` / `Edit::range_replacement` / `Edit::insertion`
## Verification
```sh
# Build & focused test
CARGO_PROFILE_DEV_OPT_LEVEL=1 INSTA_FORCE_PASS=1 INSTA_UPDATE=always \
CARGO_PROFILE_DEV_DEBUG="line-tables-only" MDTEST_UPDATE_SNAPSHOTS=1 \
cargo nextest run -p ruff_linter -- airflow
# Review newly generated snapshot under
# crates/ruff_linter/src/rules/airflow/snapshots/ruff_linter__rules__airflow__tests__AIR202_AIR202.py.snap
# Hand-check on the airflow repo (will surface real-world cases)
cargo run --bin ruff -- check --select AIR202 --preview ../airflow
# Try the autofix on the fixture (a copy)
cp crates/ruff_linter/resources/test/fixtures/airflow/AIR202.py /tmp/AIR202_copy.py
cargo run --bin ruff -- check --select AIR202 --preview --fix --unsafe-fixes /tmp/AIR202_copy.py
diff crates/ruff_linter/resources/test/fixtures/airflow/AIR202.py /tmp/AIR202_copy.py
# Lints/format
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo dev generate-all # regenerates schema + rule docs
uvx prek run -a
```
## Open / non-goals
- Does not attempt to follow type aliases (`MyDict = dict[str, int]`) — only matches annotation names directly.
- Does not detect dict-returning expressions other than dict literals and dict comprehensions (e.g., `return dict(...)`, `return some_var`). These can produce false negatives; deliberate per "common in airflow" framing.
- Does not check `multiple_outputs` passed via `.override(...)` chain — only the decorator itself.
SKILL-assisted Plan
# AIR202: Explicit `multiple_outputs` on `@task`-decorated functions
## Context
Airflow's `@task` decorator infers `multiple_outputs` from the return type annotation at runtime: if the annotation resolves to a subclass of `collections.abc.Mapping`, the function's return value is split into multiple XComs (one per key); otherwise it is stored as a single XCom. The inference happens in `_infer_multiple_outputs` at `task-sdk/src/airflow/sdk/bases/decorator.py:477-496`:
```python
ttype = getattr(return_type, "__origin__", return_type)
return isinstance(ttype, type) and issubclass(ttype, Mapping)
```
This couples *typing* to *runtime XCom behavior* in a non-obvious way. Renaming a type hint, removing it, or switching `dict` → `tuple` silently changes the DAG's XCom layout. AIR202 flags `@task`-decorated functions whose XCom behavior is determined by inference rather than by an explicit `multiple_outputs=...` kwarg, so authors:
1. Make their intent explicit.
2. Insulate their DAGs from future inference-logic changes.
3. Become aware that `multiple_outputs` exists.
## Rule summary
**Code:** `AIR202` (next free in the 2xx range — AIR201 is the last one used).
**Name:** `AirflowTaskMultipleOutputsImplicit` (reads "allow airflow task multiple outputs implicit").
**Category:** General best-practice (AIR0xx/2xx), targets both Airflow 2 and 3 (`airflow.decorators.task` and `airflow.sdk.task`).
**Fix availability:** Always available, **unsafe**.
### Trigger
A `FunctionDef` is flagged when **all** of the following hold:
1. It is decorated with `@task` or one of its supported variants (see "Decorator matching" below) — but *not* `@task.sensor`, which hardcodes `multiple_outputs=False`.
2. The decorator does **not** already pass `multiple_outputs` (in either `@task(multiple_outputs=...)` form or any of the variant subscripts).
3. At least one of:
- **Annotation path:** The return type annotation resolves to a subclass of `collections.abc.Mapping` (e.g., `dict`, `dict[str, int]`, `typing.Dict`, `typing.Mapping`, `collections.abc.Mapping`, or a `TypedDict` subclass).
- **Body path:** Some return statement in the function body returns an inline dict literal (`return {...}`), a dict comprehension (`return {k: v for ...}`), or a `dict(...)` call.
### Autofix (unsafe)
Insert a `multiple_outputs=` keyword argument into the decorator, mirroring Airflow's runtime inference:
- If the annotation path fires (Mapping-subclass annotation) → add `multiple_outputs=True`.
- Otherwise (body path only) → add `multiple_outputs=False` (matches what Airflow would have inferred at runtime; preserves current behavior).
Two decorator forms need to be handled:
- `@task(...)` (already a `Call`): use `crate::fix::edits::add_argument` (`crates/ruff_linter/src/fix/edits.rs:274`).
- `@task` / `@task.branch` etc. (bare `Name` or `Attribute`, no call): insert `(multiple_outputs=True/False)` at the end of the decorator expression's range via `Edit::insertion`.
The fix is **always unsafe** because (a) the user may rely on inference behavior elsewhere, and (b) we cannot statically prove the dict-return pattern matches *all* return paths (e.g., conditional returns of non-dicts).
### Decorator matching
Use the existing `is_airflow_task_variant` helper in `crates/ruff_linter/src/rules/airflow/helpers.rs` to check variants. The rule should iterate the function's decorators and accept any decorator that matches one of:
- `airflow.decorators.task` / `airflow.sdk.task` (bare or called)
- `task.python`, `task.virtualenv`, `task.external_python`, `task.branch`, `task.branch_virtualenv`, `task.branch_external_python`, `task.short_circuit`, `task.docker`, `task.kubernetes`, `task.pyspark` (attribute access on `task`)
Explicitly exclude `task.sensor`.
If `is_airflow_task_variant` doesn't already cover the multi-variant case, extend `helpers.rs` with a small `is_airflow_task_supporting_multiple_outputs(decorator, semantic) -> Option<&Decorator>` helper that returns the matched decorator (so the fix knows where to insert). Place it near `is_airflow_task` for discoverability.
### Mapping-annotation detection
Helper logic (private to the rule file):
1. Unwrap subscripts: for `dict[str, int]` look at the `value` (`dict`).
2. Resolve qualified name via `checker.semantic().resolve_qualified_name(...)`.
3. Return `true` if segments match any of:
- `["", "dict"]` (builtin — check via `semantic().match_builtin_expr(expr, "dict")`).
- `["typing", "Dict"]`, `["typing", "Mapping"]`, `["typing", "MutableMapping"]`, `["typing", "OrderedDict"]`, `["typing", "DefaultDict"]`, `["typing", "Counter"]`, `["typing", "ChainMap"]`, `["typing", "TypedDict"]` and `typing_extensions` equivalents — use `semantic().match_typing_expr(...)`.
- `["collections", "abc", "Mapping" | "MutableMapping"]`.
- `["collections", "OrderedDict" | "defaultdict" | "Counter" | "ChainMap"]`.
4. For class names (e.g., a user-defined `class MyDict(TypedDict): ...`), best-effort detection only — if the annotation is a local class whose bases include a `TypedDict` parent reachable via the semantic model, treat it as Mapping. If detection is uncertain, *do not* flag (avoids false positives). The body-path check still catches the common case.
### Body-return detection
Use `ReturnStatementVisitor` from `ruff_python_ast::helpers` to collect all `return` statements (handles nested ifs). A function matches the body path if **any** return value is:
- `Expr::Dict(_)` — inline `{...}` literal.
- `Expr::DictComp(_)` — dict comprehension.
- `Expr::Call(c)` where `c.func` resolves to the builtin `dict` (`semantic().match_builtin_expr(&c.func, "dict")`).
## Files to create / modify
### New
- `crates/ruff_linter/src/rules/airflow/rules/task_multiple_outputs_implicit.rs` — rule implementation.
- `crates/ruff_linter/resources/test/fixtures/airflow/AIR202.py` — test fixture with positive/negative cases (see "Fixture cases" below).
### Modified
- `crates/ruff_linter/src/codes.rs` — register `(Airflow, "202") => rules::airflow::rules::AirflowTaskMultipleOutputsImplicit` under the airflow section.
- `crates/ruff_linter/src/rules/airflow/rules/mod.rs` — add `pub(crate) use task_multiple_outputs_implicit::*;` and `mod task_multiple_outputs_implicit;` (alphabetical).
- `crates/ruff_linter/src/rules/airflow/mod.rs` — add `#[test_case(Rule::AirflowTaskMultipleOutputsImplicit, Path::new("AIR202.py"))]`.
- `crates/ruff_linter/src/checkers/ast/analyze/statement.rs` — dispatch on `Stmt::FunctionDef(function_def)` near the existing `AirflowTaskBranchAsShortCircuit` dispatch:
```rust
if checker.is_rule_enabled(Rule::AirflowTaskMultipleOutputsImplicit) {
airflow::rules::task_multiple_outputs_implicit(checker, function_def);
}
```
- `crates/ruff_linter/src/rules/airflow/helpers.rs` — only if a new shared helper is needed for decorator matching (see "Decorator matching"); otherwise leave untouched.
### Reused utilities (no duplication)
- `crate::rules::airflow::helpers::is_airflow_task_variant` — variant detection.
- `crate::fix::edits::add_argument` — kwarg insertion for `@task(...)` calls (`crates/ruff_linter/src/fix/edits.rs:274`).
- `ruff_python_ast::helpers::{ReturnStatementVisitor, map_callable}` — body traversal and decorator unwrapping.
- `ruff_python_semantic::SemanticModel::{resolve_qualified_name, match_builtin_expr, match_typing_expr}` — type resolution.
## Diagnostic
- **Message (generic):** `` `@task`-decorated function relies on `multiple_outputs` inference ``.
- **Fix title (variant-specific):**
- `` Add `multiple_outputs=True` `` (annotation path)
- `` Add `multiple_outputs=False` `` (body-only path)
Use a `Kind` enum on the struct so the fix title can branch on the path that fired.
## Fixture cases (AIR202.py)
Positive (should flag):
- `@task` + `-> dict:` returning dict literal (annotation path; fix: True).
- `@task()` + `-> Mapping[str, int]:` (annotation path; fix: True).
- `@task` + no annotation + `return {"x": 1}` (body path; fix: False).
- `@task` + no annotation + dict comprehension return (body path; fix: False).
- `@task` + no annotation + `return dict(a=1)` (body path; fix: False).
- `@task.branch`, `@task.short_circuit`, `@task.virtualenv`, `@task.docker`, `@task.kubernetes` each with a dict literal return.
- Both `from airflow.decorators import task` and `from airflow.sdk import task` import forms.
- `class MyTD(TypedDict): ...` annotation (annotation path; fix: True).
Negative (should NOT flag):
- `@task(multiple_outputs=True)` and `@task(multiple_outputs=False)` — already explicit.
- `@task` returning a non-dict (e.g., `return 1`, `return [1,2]`, `return (1,2)`) with no Mapping annotation.
- `@task` with `-> list[int]:` annotation.
- `@task.sensor` returning a dict (sensor forbids `multiple_outputs`).
- Plain function (no `@task` decorator) returning a dict.
- `@task` returning `None` / no return statements.
## Verification
1. `cargo fmt -p ruff_linter`
2. Manual check:
```sh
cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/airflow/AIR202.py --no-cache --preview --select AIR202
cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/airflow/AIR202.py --no-cache --preview --select AIR202 --fix --unsafe-fixes --diff
```
3. Snapshot test:
```sh
RUFF_UPDATE_SCHEMA=1 cargo nextest run -p ruff_linter -- "airflow::tests"
cargo insta accept
RUFF_UPDATE_SCHEMA=1 cargo nextest run -p ruff_linter -- "airflow::tests"
```
4. `cargo dev generate-all` to regenerate docs/schema.
5. `cargo clippy --workspace --all-targets --all-features -- -D warnings`.
6. `uvx prek run -a`.
7. Validate against the local Airflow clone for false positives:
```sh
cargo run -p ruff -- check /home/iliya/repositories/airflow --no-cache --preview --select AIR202
```
Review hits; if a false-positive pattern surfaces, add it as a negative case in the fixture and tighten the rule.
ImplementationsThe below are the "raw" outputs for when the agent thinks it's "done".
ComparisonManualMy personal impression is that with the SKILL, the exploration of the Airflow codebase was more thorough, and the questions shown during the planning stage were more focused. Automated (Codex)Important Mango is the no-SKILL plan and Goblet is the SKILL-assisted one.
Comparison of Plans
Direct answerGoblet is the stronger plan overall. It is more faithful to the prompt, more complete on Ruff integration, better on test coverage, and more careful about preserving Airflow’s current runtime behavior in the autofix. I would use Goblet as the base implementation plan, with a few corrections. Overall scores
Key differences1. Trigger coverageGoblet matches the prompt more completely. It says the rule should fire for Mango omits Winner: Goblet. 2. Autofix behaviorBoth plans understand the core autofix rule correctly: add Goblet also recommends using Ruff’s existing Winner: Goblet. 3. Decorator matchingGoblet is more careful here. It explicitly includes core Mango says “all Winner: Goblet. 4. Mapping annotation detectionGoblet is much broader: it covers built-in However, Goblet may overreach on Winner: Goblet, with a caution on 5. Ruff integration detailsGoblet gives a cleaner and more internally consistent integration plan: rule file, fixture, Mango’s integration is serviceable, but the naming is less aligned with the rule’s purpose. Winner: Goblet. 6. Test coverageGoblet’s fixture plan is stronger. It includes annotation positives, body positives, Mango includes useful cases too, especially decorator-call forms like Winner: Goblet, though Mango has a good nested-function negative case that should be added to Goblet. Main issues in each planGoblet issues to fix
Mango issues to fix
RecommendationUse Goblet as the base plan, then incorporate these Mango improvements:
Bottom line: Goblet is more implementation-ready and more aligned with the requested AIR202 semantics. Mango is a decent simpler plan, but it leaves avoidable false negatives and has a more fragile autofix strategy. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
I recently worked on several AIR rules. While doing so, I developed a SKILL for creating additional rules of this family, incorporating feedback and best practices. My question is: would this be something worthwhile to add to the Ruff repo itself (under
.claude/skills/)?I imagine other developers have similar SKILLs they use - in which case it would make a lot of sense to consolidate them in a centralized location. This has the potential of making everyone's lives easier since rule additions/amendments would presumably start off at a higher baseline that's aligned with the repo's best practices.
Regarding the maintenance of such SKILLs, I'd add a note to the contribution guidelines or even PR template, along the lines of "If a dedicated SKILL was used to create this rule, please consider asking your LLM to incorporate any PR feedback into the SKILL, so that insights can be reuse by others".
Beta Was this translation helpful? Give feedback.
All reactions