Create convenience decorator as_step_fn for building StepSpecs#5157

Open
wmoss wants to merge 1 commit into main from wmoss/create-as_step_fn-decorator

@wmoss
Collaborator

@wmoss wmoss commented Apr 24, 2026

This allows for decorating a function that you want to use in a StepSpec and avoiding the lambda op: ... boilerplate.

Curious if people think this is useful enough to add.
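
To make the boilerplate concrete, here is a minimal sketch. `StepSpec` below is a hypothetical stand-in with only `name` and `fn` fields (not marin's real class), `dedup_plain` is an illustrative function, and the decorator body follows the wrapper quoted later in this thread rather than the PR's exact code.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable


@dataclass
class StepSpec:
    """Hypothetical stand-in; the real StepSpec has more fields."""
    name: str
    fn: Callable[[str], Any]  # assumed contract: the executor calls fn(output_path)


def as_step_fn(fn):
    """Sketch: omit output_path to get a callable that accepts it later."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if "output_path" not in kwargs:
            def inner(output_path):
                return fn(*args, output_path=output_path, **kwargs)
            return inner
        return fn(*args, **kwargs)
    return wrapper


def dedup_plain(*, input_paths: str, output_path: str) -> dict:
    # Illustrative body only; it just echoes its arguments.
    return {"success": True, "input": input_paths, "output": output_path}


dedup_decorated = as_step_fn(dedup_plain)

# Without the decorator: a lambda defers the call until output_path is known.
before = StepSpec(
    name="dedup",
    fn=lambda op: dedup_plain(input_paths="in/", output_path=op),
)

# With the decorator: omit output_path and pass the returned closure directly.
after = StepSpec(name="dedup", fn=dedup_decorated(input_paths="in/"))
```

Both specs end up holding a one-argument callable, so an executor that calls `spec.fn(output_path)` would treat them the same way.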

@claude
Contributor

claude Bot commented Apr 24, 2026

Claude finished @wmoss's task in 4m 55s


Code review

  • Check PR status and whether review is needed
  • Locate relevant CLAUDE.md / AGENTS.md files
  • Summarize PR changes
  • Run 4 parallel review agents (CLAUDE.md ×2 + bug ×2)
  • Validate flagged issues
  • Post review comments

Two high-signal issues found. Inline comments posted:

  1. as_step_fn docstring/behavior mismatch (step_spec.py:18-29) — the decorator always returns a closure, so the docstring's "called normally with output_path" branch does not exist. Also, if a caller passes output_path= to the wrapper, inner(op) later raises TypeError: got multiple values for keyword argument 'output_path' at execution time.

  2. @as_step_fn on dedup_exact_paragraph silently breaks existing callers (exact.py:50): tests/processing/classification/deduplication/test_exact.py:11 and tests/processing/classification/test_consolidate.py:29 call the function directly and use result["success"]; these will fail because the call now returns a closure rather than the result. tests/integration_test.py:90 and experiments/dedup/poc_nemotron.py:50,73 still use the fn=lambda op: dedup_exact_paragraph(..., output_path=op, ...) pattern, which becomes a silent no-op (or a deferred TypeError) under the new decorator.
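
Both failure modes can be reproduced with a small sketch. It assumes the first iteration was an always-curry wrapper as described in item 1; `as_step_fn_v1` and `dedup` are hypothetical names, not the PR's code.

```python
from functools import wraps


def as_step_fn_v1(fn):
    """Sketch of an always-curry decorator (assumed shape of the first iteration)."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        def inner(output_path):
            return fn(*args, output_path=output_path, **kwargs)
        return inner
    return wrapper


@as_step_fn_v1
def dedup(*, input_paths, output_path):
    return {"success": True, "output": output_path}


# Direct call: the caller gets a closure back, not the result dict,
# so result["success"] would fail.
result = dedup(input_paths="in/", output_path="out/")
is_closure = callable(result) and not isinstance(result, dict)

# Invoking that closure supplies output_path a second time.
try:
    result("elsewhere/")
    error = None
except TypeError as exc:
    error = str(exc)

# The old lambda pattern appears to succeed but never runs dedup:
# it returns the closure instead of the result.
step_fn = lambda op: dedup(input_paths="in/", output_path=op)
lambda_result = step_fn("out/")
```

Here `error` holds the "got multiple values for keyword argument" message, and `lambda_result` is another closure, which is why the lambda call sites would become silent no-ops.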

Recommendation on the PR question ("is this useful enough to add?"): yes, once the decorator is fixed to be a pure curry (reject output_path in kwargs) and all four remaining call sites are migrated. I'd also consider renaming to make the currying explicit.
· Branch: wmoss/create-as_step_fn-decorator

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 536aac8c88

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread lib/marin/src/marin/execution/step_spec.py Outdated
@wmoss wmoss force-pushed the wmoss/create-as_step_fn-decorator branch from 536aac8 to 45d6bc4 Compare April 24, 2026 20:42
Comment thread lib/marin/src/marin/execution/step_spec.py
@wmoss
Collaborator Author

wmoss commented Apr 24, 2026

@claude please review again, I fixed the bug that caused all your original concerns.

@claude
Contributor

claude Bot commented Apr 24, 2026

Claude finished @wmoss's task in 1m 44s


Re-review after fix

  • Pull latest and inspect current state of the PR
  • Check as_step_fn decorator fix in step_spec.py
  • Check dedup_exact_paragraph call sites for regressions
  • Validate with a quick reasoning pass
  • Post updated review

Verdict: fix looks correct; both original concerns are resolved.

What changed

step_spec.py:18-32 now branches on whether output_path is in kwargs:

    def wrapper(*args, **kwargs):
        if "output_path" not in kwargs:
            def inner(output_path):
                return fn(*args, output_path=output_path, **kwargs)
            return inner
        return fn(*args, **kwargs)

That matches the docstring — a normal call with output_path executes fn immediately; omitting it returns the curried closure. And because the currying branch is taken only when output_path is absent from kwargs, the TypeError: got multiple values for keyword argument 'output_path' failure mode from the previous iteration can't occur.

Call-site audit

Walked every caller of dedup_exact_paragraph:

| Site | Pattern | Branch taken |
| --- | --- | --- |
| test_exact.py:11 | dedup_exact_paragraph(..., output_path=..., ...) | direct; result["success"] works |
| test_consolidate.py:29 | direct kwarg call | direct; same |
| integration_test.py:90 | lambda output_path: dedup_exact_paragraph(..., output_path=output_path, ...) | direct (executor invokes the lambda) |
| poc_nemotron.py:50,73 | lambda op: dedup_exact_paragraph(..., output_path=op, ...) | direct; same |
| fineweb_10bt_exact.py:36 | fn=dedup_exact_paragraph(input_paths=..., max_parallelism=...) | curried; returns inner |

Since dedup_exact_paragraph declares output_path keyword-only (*, at line 52), there's no positional-vs-keyword ambiguity to worry about.
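
The keyword-only point can be checked with a small sketch. The decorator restates the wrapper quoted above; `loose` and `strict` are hypothetical functions used only for illustration. Because the wrapper inspects `kwargs` alone, a positional `output_path` goes unnoticed and the call is curried instead of executed.

```python
from functools import wraps


def as_step_fn(fn):
    """Sketch of the branching wrapper quoted in the review."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if "output_path" not in kwargs:
            def inner(output_path):
                return fn(*args, output_path=output_path, **kwargs)
            return inner
        return fn(*args, **kwargs)
    return wrapper


@as_step_fn
def loose(input_path, output_path):
    # output_path may be passed positionally, where the wrapper cannot see it.
    return f"{input_path} -> {output_path}"


@as_step_fn
def strict(input_path, *, output_path):
    # Keyword-only: callers have to spell out output_path=...
    return f"{input_path} -> {output_path}"


# Positional output_path is not in kwargs, so this curries instead of running.
curried_by_accident = loose("in/", "out/")

try:
    curried_by_accident("other/")
    loose_error = None
except TypeError as exc:
    loose_error = str(exc)

direct = strict("in/", output_path="out/")  # runs immediately
deferred = strict("in/")("out/")            # curried, then run
```

With the keyword-only signature, the only way to supply `output_path` is by name, which is exactly what the wrapper's `kwargs` check looks for.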

Minor nits (non-blocking)

  • The doc comment would be a smidge clearer if it named the return types explicitly — something like "returns the result if output_path is given, otherwise returns a Callable[[str], R]". Not worth another round-trip.
  • Because this is a load-bearing convenience now, a tiny unit test for the decorator itself (two cases: direct and curried) would pay for itself the next time someone tweaks it.
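
A two-case test along those lines might look like the sketch below. The decorator is restated in sketch form so the example is self-contained, and `write_marker` is a hypothetical function standing in for a real step function; the test names are illustrative.

```python
from functools import wraps


def as_step_fn(fn):
    """Sketch of the branching wrapper quoted in the review."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if "output_path" not in kwargs:
            def inner(output_path):
                return fn(*args, output_path=output_path, **kwargs)
            return inner
        return fn(*args, **kwargs)
    return wrapper


@as_step_fn
def write_marker(*, tag, output_path):
    # Hypothetical step function that just echoes its arguments.
    return {"tag": tag, "output_path": output_path}


def test_direct_call_runs_immediately():
    result = write_marker(tag="a", output_path="out/")
    assert result == {"tag": "a", "output_path": "out/"}


def test_omitting_output_path_returns_curried_callable():
    step_fn = write_marker(tag="a")
    assert callable(step_fn)
    assert step_fn("out/") == {"tag": "a", "output_path": "out/"}
```

The functions follow pytest's `test_` naming convention, so they should be collected without extra configuration if dropped into a test module.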

Happy with merging as-is otherwise.

@ravwojdyla
Contributor

@wmoss this is nice, but it introduces magic. I find magic is not good for humans, nor agents. Are we sure this kind of magic is worth it?

Collaborator Author

@wmoss wmoss left a comment

> @wmoss this is nice, but it introduces magic. I find magic is not good for humans, nor agents. Are we sure this kind of magic is worth it?

Yeah, this is basically the question I am asking by putting up the PR. Does the usefulness outweigh the magic? (I also just added a slightly less magical, but also less ergonomic version in a comment).

FWIW, decorators are a fairly normal form of Python magic. For example, click, which we use for CLIs, uses them extensively.

I can certainly see the arguments in favor of either choice though.

            return fn(*args, output_path=output_path, **kwargs)

        return inner
    return fn(*args, **kwargs)
Collaborator Author

This function could be simplified to,

    @wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        def inner(output_path: str) -> Any:
            return fn(*args, output_path=output_path, **kwargs)

        return inner

which makes it less weird / magical IMO, but does mean that you can no longer call this function like a normal function and so tests would need to change to look like,

    result = dedup_exact_paragraph(
        input_paths=fox_corpus["test_dir"],
        max_parallelism=4,
    )(output_path=fox_corpus["output_dir"])

Contributor

@ravwojdyla ravwojdyla Apr 24, 2026

yea, I'm not sure if I like the less weird/magical version. we probably should go either full magic or little-magic, IMHO I like little-magic, especially in the world where the agents write the code (and I'm just a reader). wdyt?

Collaborator Author

That's a fair point. I'm certainly reading more code as I'm trying to learn how things work than I will be in the future.
