Description
What is the current behavior?
This feature request is born out of an issue I've had with write_pandas()
from pandas _tools
: https://github.com/snowflakedb/snowflake-connector-python/blob/main/src/snowflake/connector/pandas_tools.py#L250
I've tried to upload a wide dataframe into an existing table using write_pandas
and was struggling with various type-related problems, most prominently this one:
snowflake.connector.errors.ProgrammingError: 002023 (22000): SQL compilation error:
Expression type does not match column data type, expecting BINARY(20) but got VARIANT for column BOM_SYS_HD
I tried every method under the sun to force and convert the data types of these columns of this dataframe to exactly what the target table expects, but nothing worked. Note that the data in the dataframe is correct and correctly typed. Essentially, I read from a snowflake table into a dataframe, perform some calculations and write it back.
When I was close to throwing in the towel, I took a look at the code of write_pandas
and noticed that when auto_create_table=True
is set, a schema inference step is performed. And lo and behold, that made it work. The data arrived correctly in the already existing target table.
What is the desired behavior?
Given that auto_create_table
generates a CREATE ... IF NOT EXISTS
statement, it’s relatively safe to use even if you already have an existing table. Nonetheless, I’d appreciate a more explicit (and even safer) way to achieve what I need: performing only the schema inference step.
This could be implemented with a new parameter for write_pandas
, e.g., infer_schema
, defaulting to False
. I’ve implemented this in a local copy of this package, and it works well for me. I can submit a pull request if you’re interested.
How would this improve snowflake-connector-python
?
Might solve more issues such as mine with a relatively low-friction parameter.
References and other background
No response