Jay-ju (Contributor) commented Jan 8, 2026

Changes Made

This PR adds support for passing ray_options and overriding resource
requirements in Daft UDF v2 (@daft.func and @daft.cls).

Key changes:

  • Update row_wise_udf and batch_udf bindings to accept ray_options and cpus.
  • Add override_options and with_concurrency methods to Func class
    in daft/udf/udf_v2.py for dynamic configuration.
  • Propagate these options from Python to the Rust Logical Plan.
  • Update type stubs (.pyi) to match new Rust signatures.
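The `override_options` and `with_concurrency` methods described above return a new, reconfigured function object instead of changing the decorated one. The sketch below models that copy-on-override pattern with a standalone frozen dataclass. `FuncSketch` and its field names are hypothetical stand-ins, not Daft's actual `Func` class, and the shallow merge of `ray_options` is an assumption about how overrides combine.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FuncSketch:
    """Hypothetical stand-in for a configured UDF; not Daft's real Func class."""

    name: str
    cpus: Optional[float] = None
    gpus: Optional[float] = None
    memory_bytes: Optional[int] = None
    max_concurrency: Optional[int] = None
    ray_options: Dict[str, Any] = field(default_factory=dict)

    def override_options(
        self,
        *,
        num_cpus: Optional[float] = None,
        num_gpus: Optional[float] = None,
        memory_bytes: Optional[int] = None,
        max_concurrency: Optional[int] = None,
        ray_options: Optional[Dict[str, Any]] = None,
    ) -> "FuncSketch":
        # Collect only the settings the caller actually passed.
        updates: Dict[str, Any] = {}
        if num_cpus is not None:
            updates["cpus"] = num_cpus
        if num_gpus is not None:
            updates["gpus"] = num_gpus
        if memory_bytes is not None:
            updates["memory_bytes"] = memory_bytes
        if max_concurrency is not None:
            updates["max_concurrency"] = max_concurrency
        if ray_options is not None:
            # Assumed shallow merge: keys absent from the override are kept.
            updates["ray_options"] = {**self.ray_options, **ray_options}
        # replace() returns a new instance and leaves self unchanged.
        return replace(self, **updates)

    def with_concurrency(self, max_concurrency: int) -> "FuncSketch":
        # Convenience shortcut that only changes max_concurrency.
        return self.override_options(max_concurrency=max_concurrency)
```

For example, `FuncSketch("my_udf").override_options(num_cpus=2)` yields a copy with `cpus == 2` while the original still has `cpus is None`. Returning copies presumably lets one decorated UDF be reused with different resource profiles in the same program.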

Related Issues

github-actions bot added the feat label Jan 8, 2026
Jay-ju force-pushed the feature/udf-ray-options branch from a04abeb to a0b146d on January 8, 2026 13:39
greptile-apps bot (Contributor) left a comment

Greptile Overview

Greptile Summary

This PR adds ray_options support and resource override capabilities to UDF v2, along with a new Lance update_columns feature.

Key Changes:

UDF v2 Enhancements:

  • Added ray_options parameter to @daft.func, @daft.func.batch, and @daft.cls decorators to pass Ray-specific options (e.g., scheduling_strategy, custom resources)
  • Added cpus and memory_bytes fields to Func dataclass and Rust UDF structs (RowWisePyFn, BatchPyFn) to support explicit CPU and memory resource requests
  • Implemented override_options() method to allow runtime modification of UDF resource requirements (CPUs, GPUs, memory, concurrency, and Ray options)
  • Implemented with_concurrency() convenience method for overriding max_concurrency
  • Updated resource extraction logic in __call__ to prioritize num_cpus and memory from ray_options if present
  • Updated Rust UDFProperties extraction to use new resource fields
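The resource-extraction rule above (a `num_cpus` or `memory` key inside `ray_options` takes priority over the explicit `cpus` and `memory_bytes` fields) can be modeled as a small pure function. `resolve_resources` is a hypothetical helper for illustration, not code from the PR; it assumes, as in Ray, that `memory` is expressed in bytes.

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_resources(
    cpus: Optional[float],
    memory_bytes: Optional[int],
    ray_options: Optional[Mapping[str, Any]],
) -> Tuple[Optional[float], Optional[int]]:
    """Hypothetical helper: values found in ray_options win over explicit fields."""
    opts = ray_options or {}
    # Ray's "num_cpus" key overrides the explicit CPU request when present.
    if "num_cpus" in opts:
        cpus = opts["num_cpus"]
    # Ray's "memory" key (bytes) overrides memory_bytes when present.
    if "memory" in opts:
        memory_bytes = opts["memory"]
    return cpus, memory_bytes
```

Keys that are not resource requests, such as `scheduling_strategy`, pass through untouched and do not affect the resolved values.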

Lance Integration:

  • Added new update_columns API for row-level column updates in Lance datasets (distinct from merge_columns which adds new columns)
  • Implemented lance_update_column.py module with GroupFragmentUpdateUDF for fragment-level processing
  • Added comprehensive tests for Lance update functionality with schema evolution scenarios

Issues Found:

  • Redundant num_gpus check in the override_options method (lines 362-363); it is already handled at lines 346-347

Confidence Score: 4/5

  • This PR is safe to merge with one minor logic issue
  • The implementation is well structured, with thorough test coverage. One redundant check was found in the override_options method (num_gpus is checked twice); it does not cause incorrect behavior but is unnecessary code. The changes are backward compatible, keep the Python and Rust layers consistent, and include documentation and tests.
  • daft/udf/udf_v2.py - contains redundant num_gpus check in override_options method

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| `daft/udf/udf_v2.py` | 5/5 | Added ray_options, cpus, and memory_bytes parameters to the Func dataclass and override_options/with_concurrency methods for resource customization |
| `daft/udf/__init__.py` | 5/5 | Added ray_options parameter to @daft.func, @daft.func.batch, and @daft.cls decorators with proper documentation |
| `src/daft-dsl/src/python_udf/batch.rs` | 5/5 | Added cpus, memory_bytes, and ray_options fields to BatchPyFn struct to support resource overrides |
| `src/daft-dsl/src/python_udf/row_wise.rs` | 5/5 | Added cpus, memory_bytes, and ray_options fields to RowWisePyFn struct to support resource overrides |
| `src/daft-dsl/src/functions/python/mod.rs` | 5/5 | Updated UDFProperties extraction to use cpus, memory_bytes, and ray_options from UDF structs |
| `src/daft-dsl/src/python.rs` | 5/5 | Added cpus, memory_bytes, and ray_options parameters to row_wise_udf and batch_udf Python bindings |
| `daft/io/lance/lance_update_column.py` | 5/5 | New file implementing Lance column update functionality using the UDF framework with fragment-level processing |
| `daft/io/lance/_lance.py` | 5/5 | Added update_columns public API function for row-level column updates in Lance datasets |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User Code
    participant Decorator as @daft.func/@daft.cls
    participant Func as Func (Python)
    participant PyBinding as Python Bindings
    participant Rust as Rust UDF Structs
    participant Executor as Ray Executor

    User->>Decorator: @daft.func(ray_options={...})
    Decorator->>Func: Create Func with ray_options, cpus, memory_bytes

    User->>Func: my_udf(df["col"])
    Func->>Func: Extract cpus/memory from ray_options
    Note over Func: if ray_options contains<br/>'num_cpus' or 'memory',<br/>override cpus/memory_bytes

    Func->>PyBinding: row_wise_udf() or batch_udf()<br/>with cpus, memory_bytes, ray_options
    PyBinding->>Rust: Create RowWisePyFn/BatchPyFn<br/>with resource parameters

    Rust->>Rust: Store cpus, memory_bytes,<br/>ray_options in struct

    Note over User,Func: Resource Override Flow
    User->>Func: my_udf.override_options(<br/>num_cpus=2, ray_options={...})
    Func->>Func: Create new Func with<br/>updated resources
    Func->>User: Return new Func instance

    Note over Rust,Executor: Execution Phase
    Rust->>Rust: Extract UDFProperties<br/>from RowWisePyFn/BatchPyFn
    Rust->>Executor: Pass ray_options, cpus,<br/>memory_bytes to Ray
    Executor->>Executor: Allocate resources<br/>based on options
```

Comment on lines +362 to +347
```python
if num_gpus is not None:
    new_ray_options["num_gpus"] = num_gpus
```

Redundant check: num_gpus is already added to new_ray_options when it is not None (first checked at line 346).

Suggested change
```diff
-if num_gpus is not None:
-    new_ray_options["num_gpus"] = num_gpus
 if new_ray_options:
```
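With the duplicate removed, each resource key is written to `new_ray_options` exactly once before the final emptiness check. The full `override_options` body is not shown in the review, so the function below is a hypothetical reconstruction of that pattern: the name `build_ray_options` and the mapping of `memory_bytes` onto Ray's `memory` key are assumptions.

```python
from typing import Any, Dict, Optional


def build_ray_options(
    existing: Optional[Dict[str, Any]],
    num_cpus: Optional[float] = None,
    num_gpus: Optional[float] = None,
    memory_bytes: Optional[int] = None,
) -> Optional[Dict[str, Any]]:
    """Hypothetical sketch: merge resource overrides, writing each key once."""
    # Copy so the caller's dict is never mutated.
    new_ray_options: Dict[str, Any] = dict(existing or {})
    if num_cpus is not None:
        new_ray_options["num_cpus"] = num_cpus
    if num_gpus is not None:  # the single num_gpus check
        new_ray_options["num_gpus"] = num_gpus
    if memory_bytes is not None:
        # Assumption: memory_bytes maps onto Ray's "memory" option (bytes).
        new_ray_options["memory"] = memory_bytes
    if new_ray_options:
        return new_ray_options
    # Nothing to forward to Ray.
    return None
```

Because every key has exactly one guarded assignment, the order of the checks no longer matters and a later edit cannot silently reintroduce a conflicting second write.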

This PR adds support for passing `ray_options` and overriding resource
requirements in Daft UDF v2 (`@daft.func` and `@daft.cls`).

Key changes:
- Update `row_wise_udf` and `batch_udf` bindings to accept `ray_options`,
  `memory_bytes`, and `cpus`.
- Add `override_options` and `with_concurrency` methods to `Func` class
  in `daft/udf/udf_v2.py` for dynamic configuration.
- Propagate these options from Python to the Rust Logical Plan.
- Update type stubs (`.pyi`) to match new Rust signatures.
- Add integration tests verifying both `explain()` output and execution results.

Verified with `pre-commit` and new test cases.
Jay-ju force-pushed the feature/udf-ray-options branch from a0b146d to 2608a75 on January 8, 2026 14:09
Jay-ju (Contributor, Author) commented Jan 8, 2026

@kevinzwang Please take a look at the content of this PR. Thank you.

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 91.46341% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.69%. Comparing base (fb45faf) to head (2608a75).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `src/daft-dsl/src/python.rs` | 82.35% | 3 Missing ⚠️ |
| `daft/udf/udf_v2.py` | 92.85% | 2 Missing ⚠️ |
| `src/daft-dsl/src/functions/python/mod.rs` | 91.30% | 2 Missing ⚠️ |
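The patch percentages appear consistent with Codecov rounding down to the displayed precision rather than rounding to nearest (a file with 2 of 28 lines missing is 92.857% covered and would be shown as 92.85%). Assuming that, each missing-line count pins down how many coverable lines of the file the patch touched. `truncated_pct` and `infer_total` are illustrative helpers, not part of Codecov.

```python
def truncated_pct(hits: int, total: int, decimals: int) -> float:
    """hits/total as a percentage, rounded down to `decimals` places (exact integer math)."""
    scale = 10 ** decimals
    return (hits * 100 * scale // total) / scale


def infer_total(missing: int, shown_pct: float, decimals: int, limit: int = 10_000) -> int:
    """Smallest coverable-line count whose rounded-down coverage equals the shown value."""
    # Coverage (total - missing) / total grows with total, so the first match is unique.
    for total in range(missing + 1, limit):
        if truncated_pct(total - missing, total, decimals) == shown_pct:
            return total
    raise ValueError("no line count reproduces the displayed percentage")


# (missing lines, displayed patch %, decimals shown), copied from the report.
report = {
    "src/daft-dsl/src/python.rs": (3, 82.35, 2),
    "daft/udf/udf_v2.py": (2, 92.85, 2),
    "src/daft-dsl/src/functions/python/mod.rs": (2, 91.30, 2),
    "whole patch": (7, 91.46341, 5),
}

for name, (missing, pct, decimals) in report.items():
    print(f"{name}: {infer_total(missing, pct, decimals)} coverable lines")
```

Under that assumption the three files account for 17 + 28 + 23 = 68 of the 82 coverable patch lines (75 covered), so the remaining 14 fall in the files reported at 100.00% patch coverage.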
Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main    #5982      +/-   ##
==========================================
+ Coverage   72.63%   72.69%   +0.05%
==========================================
  Files         970      970
  Lines      126562   126597      +35
==========================================
+ Hits        91924    92025     +101
+ Misses      34638    34572      -66
```
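The Coverage Diff totals can be cross-checked by hand: Lines equals Hits plus Misses on each side, and the displayed +0.05% delta (rather than 72.69 - 72.63 = 0.06) is what results from taking the exact difference and rounding it down, which appears consistent with Codecov's default of rounding coverage down to two decimals. The figures are copied from the diff; `round_down` is an illustrative helper.

```python
import math
from fractions import Fraction

# Hits and misses copied from the Coverage Diff (main vs. PR head).
base_hits, base_misses = 91_924, 34_638
head_hits, head_misses = 92_025, 34_572


def round_down(x: Fraction, decimals: int = 2) -> float:
    """Round a non-negative exact value down to `decimals` places."""
    scale = 10 ** decimals
    return math.floor(x * scale) / scale


base_lines = base_hits + base_misses  # "Lines" row for main
head_lines = head_hits + head_misses  # "Lines" row for the PR head

# Exact coverage percentages as fractions, so no float error creeps in.
base_cov = Fraction(100 * base_hits, base_lines)
head_cov = Fraction(100 * head_hits, head_lines)

print(round_down(base_cov), round_down(head_cov), round_down(head_cov - base_cov))
```

The exact coverages are about 72.6316% and 72.6913%, so their difference is about 0.0597 percentage points; rounding each number down independently explains why the delta column disagrees with subtracting the two displayed values.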
| Files with missing lines | Coverage Δ |
|---|---|
| `daft/udf/__init__.py` | 97.36% <100.00%> (ø) |
| `src/daft-dsl/src/expr/mod.rs` | 74.47% <ø> (ø) |
| `src/daft-dsl/src/python_udf/batch.rs` | 87.64% <100.00%> (+0.45%) ⬆️ |
| `src/daft-dsl/src/python_udf/row_wise.rs` | 59.93% <100.00%> (+0.76%) ⬆️ |
| `daft/udf/udf_v2.py` | 94.17% <92.85%> (-0.30%) ⬇️ |
| `src/daft-dsl/src/functions/python/mod.rs` | 87.83% <91.30%> (+1.84%) ⬆️ |
| `src/daft-dsl/src/python.rs` | 78.15% <82.35%> (-0.27%) ⬇️ |

... and 10 files with indirect coverage changes


kevinzwang self-requested a review January 8, 2026 21:30
kevinzwang (Member) commented

Hi @Jay-ju, could you resolve the merge conflicts? Thanks
