Description
Is your feature request related to a problem or challenge?
It seems the design of Arrow extension types is nearing consensus and will arrive soon
- Add `ExtensionType` trait and `CanonicalExtensionType` enum arrow-rs#5822
- Related to Extension Types / User Defined Types #12644
The extension type information is encoded in an Arrow `Field` (which has both a `DataType` and the metadata information).
In this world, I think supporting a user function for a user defined type (e.g. a geometry type) would look like:
- Creating a user defined function and declaring in the signature that it takes `DataType::Binary`
- Implementing the `return_type_from_args` function, which would then try to get the user defined type information from the Binary column and verify it was correct
However, since the `ReturnTypeInfo` only provides `DataType`, the `Field` information will not be present, and thus UDF writers will not be able to access extension type information.
datafusion/datafusion/expr/src/udf.rs, line 359 in 274e535
Describe the solution you'd like
Since we have not released `return_type_from_args` yet (it will be released in DataFusion 45), I would like to try and change the API before release to support user defined types.
Describe alternatives you've considered
Specifically, I would like to pass in `Field` instead of `DataType` in `ReturnTypeArgs`.
So instead of
    pub struct ReturnTypeArgs<'a> {
        /// The data types of the arguments to the function
        pub arg_types: &'a [DataType],
        /// ...
        pub scalar_arguments: &'a [Option<&'a ScalarValue>],
        /// Can argument `i` (ever) be null?
        pub nullables: &'a [bool],
    }
I think it would be better to be
    pub struct ReturnTypeArgs<'a> {
        /// The schema fields of the arguments. Fields include DataType, nullability and other information.
        pub arg_fields: &'a [Field],
        /// ...
        pub scalar_arguments: &'a [Option<&'a ScalarValue>],
    }
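To illustrate what a `Field`-based `ReturnTypeArgs` would make possible, here is a rough, self-contained sketch of a UDF return-type check that reads extension type information from field metadata. It is not the DataFusion API: `DataType`, `Field` and `ReturnTypeArgs` below are simplified stand-ins defined locally so the example compiles on its own, and the `area_return_type` function and the `example.geometry` extension name are hypothetical. The only thing taken from the Arrow format itself is the `ARROW:extension:name` metadata key.

```rust
use std::collections::HashMap;

/// Stand-in for arrow's `DataType` (only the variants this sketch needs).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Binary,
    Float64,
}

/// Stand-in for arrow's `Field`: name, storage type, nullability and metadata.
#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

/// Metadata key the Arrow format uses to carry an extension type's name.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Simplified version of the proposed struct (`scalar_arguments` omitted).
pub struct ReturnTypeArgs<'a> {
    pub arg_fields: &'a [Field],
}

/// Hypothetical return-type check for an `area(geometry)` UDF.
/// Assumes a made-up extension type named "example.geometry" stored as Binary.
pub fn area_return_type(args: &ReturnTypeArgs) -> Result<DataType, String> {
    // Exactly one argument is expected.
    let [field] = args.arg_fields else {
        return Err(format!("expected 1 argument, got {}", args.arg_fields.len()));
    };
    // The storage type alone cannot tell geometry apart from plain Binary...
    if field.data_type != DataType::Binary {
        return Err(format!("expected Binary storage, got {:?}", field.data_type));
    }
    // ...but the field metadata can.
    match field.metadata.get(EXTENSION_NAME_KEY).map(String::as_str) {
        Some("example.geometry") => Ok(DataType::Float64),
        Some(other) => Err(format!("unsupported extension type {other}")),
        None => Err("argument is plain Binary, not an extension type".to_string()),
    }
}
```

With only `arg_types: &[DataType]`, the last `match` could not be written at all, since a geometry column and an ordinary `Binary` column would look identical to the function.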
Additional context
This was inspired by a comment from @milenkovicm on the DataFusion sync up call yesterday