Description
What happened?
When a column contains only nulls, I can't create a DuckDB table or a PyArrow table from it.
Here are some reproducible examples:
import ibis
con = ibis.duckdb.connect()
data = [{"col1": 1, "col2": None}, {"col1": 4, "col2": None}]
t = ibis.memtable(data)
con.create_table("test", t)
Result: ParserException: Parser Error: syntax error at or near "NULL"
t.execute() does work in the above example.
import ibis
con = ibis.duckdb.connect()
data = [{"col1": 1, "col2": None}, {"col1": 4, "col2": None}]
ibis.memtable(data).to_pyarrow()
Result: ArrowNotImplementedError: Unsupported cast from int32 to null using function cast_null
import ibis
import pyarrow as pa
con = ibis.duckdb.connect()
data = [{"col1": 1, "col2": None}, {"col1": 4, "col2": None}]
array = pa.array(data)
pa_table = pa.Table.from_struct_array(array)
con.create_table("test", pa_table)
Result: ParserException: Parser Error: syntax error at or near "NULL"
I am guessing the problem is that PyArrow supports columns having a datatype of null, which most databases probably don't?
DuckDB apparently converts the null datatype into int32:
import duckdb
duckdb.from_arrow(pa_table)
┌───────┬───────┐
│ col1 │ col2 │
│ int64 │ int32 │
├───────┼───────┤
│ 1 │ NULL │
│ 4 │ NULL │
└───────┴───────┘
I have no idea what the best way to handle this is. Maybe ibis could raise an exception asking the user to specify a schema when a column contains only nulls?
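As a rough illustration of that idea, the check could look something like the pure-Python sketch below. The helper names find_all_null_columns and require_schema_for_null_columns are hypothetical and not part of ibis, and the schema is assumed to be a plain dict mapping column names to types.

```python
def find_all_null_columns(rows):
    """Return names of columns whose value is None (or missing) in every row."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)  # keep first-seen column order
    return [name for name in columns if all(row.get(name) is None for row in rows)]


def require_schema_for_null_columns(rows, schema=None):
    """Raise if a column is entirely null and no explicit type was given for it."""
    typed = set(schema or {})
    untyped = [c for c in find_all_null_columns(rows) if c not in typed]
    if untyped:
        raise ValueError(
            f"Cannot infer a type for all-null column(s) {untyped}; "
            "please specify a schema."
        )


data = [{"col1": 1, "col2": None}, {"col1": 4, "col2": None}]

# Passes, because the caller states a type for the all-null column.
require_schema_for_null_columns(data, schema={"col2": "string"})
```

Calling require_schema_for_null_columns(data) without a schema raises a ValueError that names col2, which would have pointed straight at the cause instead of a parser error.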
A job I run daily suddenly started failing with the first error. Given the error message, it took some experimenting to figure out that this issue was the actual cause: a column in the source data (from some API) that usually has strings and nulls now had only nulls.
What version of ibis are you using?
9.2.0
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response