feat: adds planning time validation of udf function signature #5470

universalmind303 · 2025-10-30T19:34:46Z

Changes Made

UDFs are our most expensive operation to run, so we don't want to waste resources running a function when we can know in advance that the types are wrong and it would likely fail at runtime.

This PR adds planning-time validation of rowwise Python UDF function signatures, consistent with our built-in expressions. If the dtype is wrong, we error during planning, preventing potentially costly runtime errors or confusing results.

Example:

@daft.func
def greeting(s: str) -> str:
  return f"hello {s}"

Previously, this would work with any datatype, which could produce confusing results because the s: str annotation was not actually enforced.

This is opt-in: we only validate when the function signature carries type hints. If a user does not provide typing, we don't do anything.

As such, the following would continue to work for all datatypes:

@daft.func
def greeting(s) -> str:
  return f"hello {s}"

Related Issues

Closes #5462

Checklist

Documented in API Docs (if applicable)
Documented in User Guide (if applicable)
If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
Documentation builds and is formatted properly

greptile-apps

Greptile Overview

Greptile Summary

Adds planning-time type validation for rowwise Python UDFs by extracting type hints from function signatures and validating them against actual expression types in Rust.

Key changes:

Extracts input parameter types from function signatures using inspect.signature() and get_type_hints()
Passes type information through the Python/Rust boundary to the planning phase
Validates expected vs actual types in PyScalarFn::to_field(), producing clear error messages like "Expects input to 'func_name' to be Float64, but received Int64"
Treats DataType::Python as a wildcard (used when no type hint provided or type is Any)
Applies to both @daft.func and @daft.cls decorated functions
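
To make the extraction step concrete, here is a minimal, standalone sketch of collecting per-parameter types with inspect.signature() and get_type_hints(). It is not the code from daft/udf/udf_v2.py: the helper name get_input_types and the WILDCARD string are hypothetical stand-ins (the real implementation maps hints to Daft DataType values and uses DataType::Python as the wildcard).

```python
import inspect
from typing import Any, get_type_hints

# Hypothetical stand-in for DataType::Python, the wildcard used when a
# parameter has no type hint or is annotated with Any.
WILDCARD = "Python"


def get_input_types(fn):
    """Map each parameter name to its annotated type, or WILDCARD."""
    hints = get_type_hints(fn)
    result = {}
    for name in inspect.signature(fn).parameters:
        # Parameters without an annotation are absent from `hints`.
        hint = hints.get(name, Any)
        result[name] = WILDCARD if hint is Any else hint
    return result


def greeting(s: str, punctuation="!") -> str:
    return f"hello {s}{punctuation}"


# `s` is annotated, `punctuation` is not, so only `s` would be checked.
print(get_input_types(greeting))
```

Iterating over the signature rather than over the hints keeps unannotated parameters in the result, which is what makes the check opt-in per parameter.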

Implementation:

Python side collects dtypes per parameter in _get_input_dtypes() and tracks them in __call__()
Rust side validates during logical plan construction in conditional compilation block (only when python feature enabled)
Proto schema extended to serialize/deserialize input types
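
The validation itself can be sketched in a few lines. The version below is a Python approximation of the check described for PyScalarFn::to_field(), not the Rust code from this PR: dtypes are plain strings, validate_inputs and WILDCARD are hypothetical names, and the wording of the length-mismatch error is an assumption. Only the type-mismatch message follows the format quoted above.

```python
# Hypothetical stand-in for DataType::Python (the wildcard type).
WILDCARD = "Python"


def validate_inputs(func_name, expected, actual):
    """Raise TypeError on the first non-wildcard mismatch."""
    if len(expected) != len(actual):
        # Assumed wording; the PR only states that lengths are validated.
        raise TypeError(
            f"'{func_name}' expects {len(expected)} inputs, got {len(actual)}"
        )
    for exp, act in zip(expected, actual):
        if exp == WILDCARD:
            continue  # no type hint (or Any): accept anything
        if exp != act:
            raise TypeError(
                f"Expects input to '{func_name}' to be {exp}, but received {act}"
            )


validate_inputs("scale", ["Float64"], ["Float64"])  # passes: exact match
validate_inputs("scale", [WILDCARD], ["Int64"])     # passes: wildcard
try:
    validate_inputs("scale", ["Float64"], ["Int64"])
except TypeError as err:
    print(err)
```

Because this runs while the plan is being built, a mismatch surfaces before any rows are processed.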

Test coverage: Comprehensive tests covering positional args, kwargs, keyword-only args, defaults, and both sync/async variants

Confidence Score: 4/5

PR is safe to merge with one minor edge case consideration around mixed Expression/non-Expression positional arguments
Solid implementation with comprehensive tests. The Rust validation logic is correct, proto serialization is proper, and test coverage is thorough. One potential edge case exists with mixed positional args (Expression + literal), but this appears to be an existing design constraint rather than a new bug introduced by this PR
daft/udf/udf_v2.py - review the iterator-based dtype matching for positional args (lines 235-240)

Important Files Changed

File Analysis

Filename	Score	Overview
daft/udf/udf_v2.py	3/5	Adds `_get_input_dtypes` method and tracks input types, but iterator-based positional arg matching could misalign types if mixing Expression and non-Expression positional args
src/daft-dsl/src/python_udf/mod.rs	5/5	Implements type validation at planning time, correctly validates expected vs actual types with proper Python type handling
tests/udf/test_row_wise_udf.py	5/5	Comprehensive tests for sync and async rowwise UDF type validation with various argument patterns
tests/udf/test_cls.py	5/5	Comprehensive tests for class-based UDF type validation with various argument patterns
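
For readers unfamiliar with the concern flagged on daft/udf/udf_v2.py, the sketch below illustrates the general failure mode of iterator-based matching. It is a hypothetical illustration only and does not reproduce the code under review: Expr, pair_by_iterator, and pair_by_position are made-up names, and dtypes are plain strings.

```python
class Expr:
    """Hypothetical stand-in for a Daft Expression argument."""

    def __init__(self, name):
        self.name = name


def pair_by_iterator(param_dtypes, args):
    """Pair each Expr arg with the next dtype from an iterator (fragile)."""
    dtypes = iter(param_dtypes.values())
    return [(a.name, next(dtypes)) for a in args if isinstance(a, Expr)]


def pair_by_position(param_dtypes, args):
    """Pair each Expr arg with the dtype of the parameter it binds to."""
    names = list(param_dtypes)
    return [
        (a.name, param_dtypes[names[i]])
        for i, a in enumerate(args)
        if isinstance(a, Expr)
    ]


# Signature f(n: int, x: float), called with a literal first and an
# expression second, e.g. f(3, df["value"]).
param_dtypes = {"n": "Int64", "x": "Float64"}
args = [3, Expr("value")]

print(pair_by_iterator(param_dtypes, args))  # pairs "value" with n's dtype
print(pair_by_position(param_dtypes, args))  # pairs "value" with x's dtype
```

Skipping the literal without advancing the iterator shifts every later dtype by one position, which is the misalignment the review describes.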

Sequence Diagram

sequenceDiagram
    participant User
    participant Func as Func.__call__()
    participant Python as Python (udf_v2.py)
    participant Rust as Rust (python_udf/mod.rs)
    participant Schema

    User->>Func: @daft.func decorated function(expr1, expr2)
    
    Note over Func: Initialization Phase
    Func->>Python: _get_input_dtypes(fn)
    Python->>Python: inspect.signature(fn)
    Python->>Python: get_type_hints(fn)
    Python-->>Func: input_dtypes dict {param: DataType}
    
    Note over Func: Call Phase
    User->>Func: func(df["col1"], df["col2"])
    Func->>Func: Extract Expression args
    Func->>Func: Build input_dtypes list from dict
    Func->>Rust: row_wise_udf(name, cls, method, ..., input_dtypes)
    
    Note over Rust: Planning Phase (to_field)
    Rust->>Rust: PyScalarFn::to_field(schema)
    Rust->>Schema: Get actual types from args
    Schema-->>Rust: actual_inputs: Vec<DataType>
    Rust->>Rust: Validate len(expected) == len(actual)
    loop For each (expected, actual) pair
        alt expected != Python type
            Rust->>Rust: Validate expected == actual
            alt Types mismatch
                Rust-->>User: TypeError: Expects input to 'func' to be X, but received Y
            end
        end
    end
    Rust-->>Func: Field with validated type
    Func-->>User: Expression (validated)

10 files reviewed, 1 comment

daft/udf/udf_v2.py

codecov · 2025-10-30T21:15:12Z

Codecov Report

❌ Patch coverage is 81.03448% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.66%. Comparing base (158a104) to head (56caa39).

Files with missing lines	Patch %	Lines
src/daft-dsl/src/python_udf/mod.rs	62.06%	11 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5470      +/-   ##
==========================================
- Coverage   71.66%   71.66%   -0.01%     
==========================================
  Files         998      998              
  Lines      127368   127423      +55     
==========================================
+ Hits        91279    91315      +36     
- Misses      36089    36108      +19

Files with missing lines	Coverage Δ
daft/udf/udf_v2.py	`96.23% <100.00%> (+0.42%)`	⬆️
src/daft-dsl/src/python.rs	`81.83% <100.00%> (+0.22%)`	⬆️
src/daft-dsl/src/python_udf/row_wise.rs	`92.01% <100.00%> (+0.11%)`	⬆️
src/daft-dsl/src/python_udf/mod.rs	`71.79% <62.06%> (-5.76%)`	⬇️

... and 6 files with indirect coverage changes

universalmind303 added 2 commits October 30, 2025 12:15

perf: move common datatypes before less common ones like jax/numpy/torch

f834a4e

feat: add planning time validation on rowwise function signatures

18e532e

universalmind303 requested a review from kevinzwang October 30, 2025 19:34

greptile-apps bot reviewed Oct 30, 2025

Base automatically changed from cory/dtype-infer-performance to main October 30, 2025 20:22

github-actions bot added the feat label Oct 30, 2025

Merge branch 'main' into cory/udf-func-sig

56caa39

universalmind303 and others added 2 commits November 3, 2025 10:35

Merge branch 'main' into cory/udf-func-sig

0f1c310

Merge branch 'main' into cory/udf-func-sig

e649a32
