Skip to content

SNOW-721174: Support COPY INTO <location> with storage integration and gcs bucket #368

Open
@dahlbaek

Description

@dahlbaek

What is the current behavior?

The COPY INTO <location> statement support gcs buckets, but there is no GCSBucket ClauseElement in https://github.com/snowflakedb/snowflake-sqlalchemy/blob/a84fa77ae4387ed60db036db61dc7d8f82849a98/src/snowflake/sqlalchemy/custom_commands.py. Similarly, the statement supports using a STORAGE_INTEGRATION option, but that option is not supported by CopyInto

class CopyInto(UpdateBase):
"""Copy Into Command base class, for documentation see:
https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html"""
__visit_name__ = "copy_into"
_bind = None
def __init__(self, from_, into, formatter=None):
self.from_ = from_
self.into = into
self.formatter = formatter
self.copy_options = {}
def __repr__(self):
"""
repr for debugging / logging purposes only. For compilation logic, see
the corresponding visitor in base.py
"""
return f"COPY INTO {self.into} FROM {repr(self.from_)} {repr(self.formatter)} ({self.copy_options})"
def bind(self):
return None
def force(self, force):
if not isinstance(force, bool):
raise TypeError("Parameter force should be a boolean value")
self.copy_options.update({"FORCE": translate_bool(force)})
return self
def single(self, single_file):
if not isinstance(single_file, bool):
raise TypeError("Parameter single_file should be a boolean value")
self.copy_options.update({"SINGLE": translate_bool(single_file)})
return self
def maxfilesize(self, max_size):
if not isinstance(max_size, int):
raise TypeError("Parameter max_size should be an integer value")
self.copy_options.update({"MAX_FILE_SIZE": max_size})
return self
def files(self, file_names):
self.copy_options.update({"FILES": FilesOption(file_names)})
return self
def pattern(self, pattern):
self.copy_options.update({"PATTERN": pattern})
return self

What is the desired behavior?

The CopyInto class supports Copy Into <location> statements of the type

COPY INTO '<gcs_bucket>'
FROM (<select>)
FILE_FORMAT = ( TYPE = PARQUET )
STORAGE_INTEGRATION = <gcs_storage_integration>
HEADER = TRUE;

How would this improve snowflake-connector-python?

This would make it possible to use the CopyInto class on order to execute such variants of the COPY INTO <location> statement.

References, Other Background

We're a small company that make use of the snowflake-sqlalchemy library. If someone will give us pointers as to what we should add in terms of classes and tests, we can allocate time to work on a pr.

Metadata

Metadata

Assignees

No one assigned

    Labels

    featurestatus-triage_doneInitial triage done, will be further handled by the driver team

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions