Description
What happened?
When converting an Ibis schema containing a timezone-less Timestamp column (i.e. dt.Timestamp
with timezone
set to None
) via a Databricks backend, the column is mapped to the sqlglot TIMESTAMP
(defined in _from_ibis_Timestamp
in the SqlglotType
class). However, in Databricks, TIMESTAMP
corresponds to the timestamp type with timezone information (https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-type). The correct type to convert to should be TIMESTAMP_NTZ
(https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-ntz-type).
Example:
>>> conn = ibis.databricks.connect(...)
>>> schema = ibis.schema([('ts_col', 'timestamp')])
>>> schema['ts_col']
Timestamp(timezone=None, scale=None, nullable=True)
>>> conn.create_table('ts_col_table', schema=schema)
DatabaseTable: catalog_test.default.ts_col_table
ts_col timestamp('UTC')
>>> read_schema = conn.table('ts_col_table').schema()
>>> read_schema
ibis.Schema {
ts_col timestamp('UTC')
}
As a workaround, I was able to monkey patch this behavior locally by implementing a custom DatabricksTypeMapper with a def _from_ibis_Timestamp
method that returns sge.DataType(this=sge.DataType.Type.TIMESTAMPNTZ)
when the input timestamp's timezone
is None
and wiring this TypeMapper into the DatabricksCompiler:
class DatabricksTypeMapper(PySparkType):
@classmethod
def _from_ibis_Timestamp(cls, dtype: dt.Timestamp) -> sge.DataType:
code = sge.DataType.Type.TIMESTAMPNTZ if dtype.timezone is None else sge.DataType.Type.TIMESTAMP
if dtype.scale is not None:
scale = sge.DataTypeParam(this=sge.Literal.number(dtype.scale))
return sge.DataType(this=code, expressions=[scale])
else:
return sge.DataType(this=code)
DatabricksCompiler.type_mapper = DatabricksTypeMapper
ibis.backends.sql.compilers.databricks.compiler = DatabricksCompiler()
ibis.backends.sql.datatypes.TYPE_MAPPERS[ibis.backends.sql.compilers.databricks.compiler.dialect] = DatabricksTypeMapper
What version of ibis are you using?
10.3.1
What backend(s) are you using, if any?
Databricks
Relevant log output
Code of Conduct
- I agree to follow this project's Code of Conduct
Metadata
Metadata
Assignees
Labels
Type
Projects
Status