Open
Description
https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-ntz-type
TIMESTAMP_NTZ is a data type that can be used for partitioning, so according to the docstring of the converter function it should be supported, but the elif branch for it is missing. To be precise, I think the current timestamp mapping should really be the timestamp_ntz one, because the pd.Timestamp it constructs is never passed any timezone information.
# converter.py
def to_converter(schema_type) -> Callable[[str], Any]:
    """
    For types that support partitioning, a lambda to parse data into the
    corresponding type is returned. For data types that cannot be partitioned
    on, we return None. The caller is expected to check if the value is None before using.
    :param schema_type: str or json representing a data type
    :return: converter function or None
    """
    if schema_type == "boolean":
        return lambda x: None if (x is None or x == "") else (x is True or x == "true")
    elif schema_type == "byte":
        return lambda x: np.nan if (x is None or x == "") else np.int8(x)
    elif schema_type == "short":
        return lambda x: np.nan if (x is None or x == "") else np.int16(x)
    elif schema_type == "integer":
        return lambda x: np.nan if (x is None or x == "") else np.int32(x)
    elif schema_type == "long":
        return lambda x: np.nan if (x is None or x == "") else np.int64(x)
    elif schema_type == "float":
        return lambda x: np.nan if (x is None or x == "") else np.float32(x)
    elif schema_type == "double":
        return lambda x: np.nan if (x is None or x == "") else np.float64(x)
    elif isinstance(schema_type, str) and schema_type.startswith("decimal"):
        return lambda x: None if (x is None or x == "") else Decimal(x)
    elif schema_type == "string":
        return lambda x: None if (x is None or x == "") else str(x)
    elif schema_type == "date":
        return lambda x: None if (x is None or x == "") else pd.Timestamp(x).date()
    elif schema_type == "timestamp":
        return lambda x: pd.NaT if (x is None or x == "") else pd.Timestamp(x)
    elif schema_type == "binary":
        return None  # partition on binary column not supported
    elif isinstance(schema_type, dict) and schema_type["type"] in ("array", "struct", "map"):
        return None  # partition on complex column not supported
    raise ValueError(f"Could not parse datatype: {schema_type}")

How to reproduce: Try reading a table with a TIMESTAMP_NTZ column using the following interface:
df = delta_sharing.load_as_pandas(table_url, convert_in_batches=True, use_delta_format=False)
Adding another elif branch for timestamp_ntz solves the issue. I can create a PR if you like.
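For reference, here is a rough sketch of what I have in mind. It is a cut-down stand-in for to_converter that only covers the two timestamp types; the function name to_converter_with_ntz is made up for this example, and I am assuming the schema type string arrives as "timestamp_ntz". The actual patch would simply be one more elif in the existing function.

```python
from typing import Any, Callable

import pandas as pd


def to_converter_with_ntz(schema_type: str) -> Callable[[str], Any]:
    """Hypothetical, reduced version of to_converter (timestamp types only)."""
    if schema_type == "timestamp":
        return lambda x: pd.NaT if (x is None or x == "") else pd.Timestamp(x)
    elif schema_type == "timestamp_ntz":
        # Assumed type name. Same parsing as "timestamp": pd.Timestamp is
        # built without a tz argument, so a string with no UTC offset
        # yields a timezone-naive value.
        return lambda x: pd.NaT if (x is None or x == "") else pd.Timestamp(x)
    raise ValueError(f"Could not parse datatype: {schema_type}")


convert = to_converter_with_ntz("timestamp_ntz")
parsed = convert("2024-01-15 10:30:00")  # partition value without an offset
empty = convert("")                      # empty partition value maps to NaT
```

With this branch in place, a partition value such as "2024-01-15 10:30:00" parses to a timezone-naive pd.Timestamp and an empty value becomes pd.NaT, which matches how the existing timestamp branch behaves.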