Commit 5539d06

jmhsieh and claude committed
Document struct outputs for UDFs
Add a "Struct outputs" section to the UDFs guide showing how to return multiple related values from a single UDF using pa.struct as the data_type, illustrated with the image dimensions example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7a753b7 commit 5539d06

1 file changed

Lines changed: 24 additions & 0 deletions

docs/geneva/udfs/udfs.mdx

@@ -59,6 +59,30 @@ def recordbatch_filename_len(batch: pa.RecordBatch) -> pa.Array:
 
 > **Note**: Batch UDFS require you to specify `data_type` in the ``@udf`` decorator for batched UDFs which defines `pyarrow.DataType` of the returned `pyarrow.Array`.
 
+### Struct outputs
+
+A UDF can return multiple related values as a single `struct` column by setting `data_type` to a `pa.struct(...)` and returning a tuple (matched by field order) or a `dict` keyed by field name.
+
+```python
+import io
+import pyarrow as pa
+from geneva import udf
+
+@udf(
+    data_type=pa.struct(
+        [pa.field("width", pa.int32()), pa.field("height", pa.int32())]
+    ),
+)
+def dimensions(image: bytes) -> tuple[int, int]:
+    """Extract image dimensions (width, height)."""
+    from PIL import Image
+
+    img = Image.open(io.BytesIO(image))
+    return img.size
+```
+
+Downstream UDFs can then read individual fields via dot notation in `input_columns` (see below).
+
 ### Struct fields and list inputs
 
 You can pass nested `struct` fields directly into a UDF by specifying `input_columns` with dot notation. For list-typed inputs, Geneva can pass a NumPy array when the argument is annotated as `np.ndarray` (use `np.ndarray | None` for nullable lists).
