Skip to content

SNOW-2019088: Extend write_pandas by a parameter for schema inference #2246

Open
@Argon-

Description

@Argon-

What is the current behavior?

This feature request is born out of an issue I've had with write_pandas() from pandas _tools: https://github.com/snowflakedb/snowflake-connector-python/blob/main/src/snowflake/connector/pandas_tools.py#L250

I've tried to upload a wide dataframe into an existing table using write_pandas and was struggling with various type-related problems, most prominently this one:

snowflake.connector.errors.ProgrammingError: 002023 (22000): SQL compilation error:
Expression type does not match column data type, expecting BINARY(20) but got VARIANT for column BOM_SYS_HD

I tried every method under the sun to force and convert the data types of these columns of this dataframe to exactly what the target table expects, but nothing worked. Note that the data in the dataframe is correct and correctly typed. Essentially, I read from a snowflake table into a dataframe, perform some calculations and write it back.
When I was close to throwing in the towel, I took a look at the code of write_pandas and noticed that when auto_create_table=True is set, a schema inference step is performed. And lo and behold, that made it work. The data arrived correctly in the already existing target table.

What is the desired behavior?

Given that auto_create_table generates a CREATE ... IF NOT EXISTS statement, it’s relatively safe to use even if you already have an existing table. Nonetheless, I’d appreciate a more explicit (and even safer) way to achieve what I need: performing only the schema inference step.
This could be implemented with a new parameter for write_pandas, e.g., infer_schema, defaulting to False. I’ve implemented this in a local copy of this package, and it works well for me. I can submit a pull request if you’re interested.

How would this improve snowflake-connector-python?

Might solve more issues such as mine with a relatively low-friction parameter.

References and other background

No response

Metadata

Metadata

Labels

featurestatus-triage_doneInitial triage done, will be further handled by the driver team

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions