feat(snowflake): support VECTOR type via Array(length=N)#12005
Open
daveauerbach wants to merge 1 commit into
Snowflake's `VECTOR(<element>, <dimension>)` type stores fixed-length
numeric vectors used by AI/ML workloads. The Snowflake Python connector
already deserializes VECTOR columns to native pyarrow
`fixed_size_list<element>[length]`, but ibis had no VECTOR -> ibis
dtype mapping, so the column came back typed as `utf8` (via the
`Unknown -> string` fallback) and `SnowflakePyArrowData.convert_column`
crashed downstream with:
    pyarrow.lib.ArrowNotImplementedError: Unsupported cast from
    fixed_size_list<element: float not null>[N] to utf8 using function
    cast_string
This adds three pieces:
* `SnowflakeType._from_sqlglot_VECTOR` parses Snowflake VECTOR into
`dt.Array(<element>, length=<dimension>)`. Per the Snowflake docs
(https://docs.snowflake.com/en/sql-reference/data-types-vector), the
element types are documented as 32-bit (INT -> 32-bit signed int,
FLOAT -> 32-bit single-precision float). sqlglot normalizes
VECTOR-element FLOAT to `Type.DOUBLE` in its DataType AST, so we
narrow to `Float32`/`Int32` rather than falling back to the default
scalar mappings (which would give Float64/Int64). VECTOR elements
are also not nullable per Snowflake's storage model, hence
`nullable=False` on the inner dtype.
* `SnowflakePyArrowData.convert_column` / `convert_scalar` now pass
fixed-length array columns through unchanged. Without this guard,
the JSON-extension wrapping path tries to cast the
`fixed_size_list<...>` data to `utf8` (the storage type of
`PYARROW_JSON_TYPE`) and raises `ArrowNotImplementedError`.
* `SnowflakeType._from_ibis_Array` round-trips fixed-length numeric
Arrays back to `VECTOR(<element>, <dimension>)`. Other fixed-length
element types (e.g. strings, structs) and variable-length arrays
continue to compile to `ARRAY` (the existing JSON-backed
variant-array).
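The mapping rules above can be sketched without ibis or sqlglot. This is only an illustration, not the code in this PR: `ArrayType`, `parse_vector`, and `to_snowflake` are hypothetical names, the sketch parses a type string with a regex where the real change works on the sqlglot DataType AST, and it assumes only `float32`/`int32` elements compile back to VECTOR.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for an ibis array dtype; not the real ibis API.
@dataclass(frozen=True)
class ArrayType:
    element: str                    # e.g. "float32", "int32", "string"
    length: Optional[int] = None    # None means variable-length
    element_nullable: bool = True


# Snowflake VECTOR element names mapped to 32-bit element types.
_VECTOR_ELEMENTS = {"FLOAT": "float32", "INT": "int32"}
_ELEMENT_TO_SNOWFLAKE = {v: k for k, v in _VECTOR_ELEMENTS.items()}

_VECTOR_RE = re.compile(
    r"^\s*VECTOR\s*\(\s*(\w+)\s*,\s*(\d+)\s*\)\s*$", re.IGNORECASE
)


def parse_vector(type_string: str) -> ArrayType:
    """Map 'VECTOR(<element>, <dimension>)' to a fixed-length array dtype."""
    match = _VECTOR_RE.match(type_string)
    if match is None:
        raise ValueError(f"not a VECTOR type: {type_string!r}")
    element = match.group(1).upper()
    dimension = int(match.group(2))
    if element not in _VECTOR_ELEMENTS:
        raise ValueError(f"unsupported VECTOR element type: {element}")
    # Narrow to 32-bit and mark the elements non-nullable.
    return ArrayType(
        _VECTOR_ELEMENTS[element], length=dimension, element_nullable=False
    )


def to_snowflake(dtype: ArrayType) -> str:
    """Compile fixed-length numeric arrays to VECTOR, everything else to ARRAY."""
    if dtype.length is not None and dtype.element in _ELEMENT_TO_SNOWFLAKE:
        return f"VECTOR({_ELEMENT_TO_SNOWFLAKE[dtype.element]}, {dtype.length})"
    return "ARRAY"
```

In this sketch `to_snowflake(parse_vector("VECTOR(INT, 8)"))` gives back `VECTOR(INT, 8)`, while a fixed-length string array or a variable-length float array falls through to `ARRAY`.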
Test plan:
* `ibis/backends/snowflake/tests/test_datatypes.py::test_parse` adds
three parametrized VECTOR cases (`VECTOR(FLOAT, 4)`,
`VECTOR(FLOAT, 512)`, `VECTOR(INT, 8)`).
* `test_extract_type_from_table_query` adds two parametrized VECTOR
cases that round-trip through a live Snowflake `CREATE TEMP TABLE`.
* New `test_vector_column_pyarrow_passthrough_for_fixed_size_arrays`
exercises the converter directly with a synthetic pyarrow
`fixed_size_list<float32>[4]` column -- no Snowflake connection
needed -- and asserts pass-through plus value preservation.
Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Snowflake's `VECTOR(<element>, <dimension>)` type stores fixed-length numeric vectors used by AI/ML workloads. The Snowflake Python connector already deserializes VECTOR columns to native pyarrow `fixed_size_list<element>[length]`, but ibis had no VECTOR -> ibis dtype mapping, so the column came back typed as `utf8` (via the `Unknown -> string` fallback) and `SnowflakePyArrowData.convert_column` crashed downstream with:

    pyarrow.lib.ArrowNotImplementedError: Unsupported cast from
    fixed_size_list<element: float not null>[N] to utf8 using function
    cast_string

This PR adds three pieces:

* `SnowflakeType._from_sqlglot_VECTOR` parses Snowflake VECTOR into `dt.Array(<element>, length=<dimension>)`. Per the Snowflake docs, element types are 32-bit (`INT` -> 32-bit signed integer, `FLOAT` -> 32-bit single-precision floating-point). sqlglot normalizes the VECTOR-element FLOAT to `Type.DOUBLE` in its DataType AST, so we narrow to `Float32`/`Int32` rather than falling back to the default scalar mappings (which would give Float64/Int64). VECTOR elements are also not nullable per Snowflake's storage model, hence `nullable=False` on the inner dtype.
* `SnowflakePyArrowData.convert_column` / `convert_scalar` now pass fixed-length array columns through unchanged. Without this guard, the JSON-extension wrapping path tries to cast the `fixed_size_list<...>` data to `utf8` (the storage type of `PYARROW_JSON_TYPE`) and raises `ArrowNotImplementedError`.
* `SnowflakeType._from_ibis_Array` round-trips fixed-length numeric Arrays back to `VECTOR(<element>, <dimension>)`. Other fixed-length element types (e.g. strings, structs) and variable-length arrays continue to compile to `ARRAY` (the existing JSON-backed variant-array).

Test plan

* `ibis/backends/snowflake/tests/test_datatypes.py::test_parse` adds three parametrized VECTOR cases (`VECTOR(FLOAT, 4)`, `VECTOR(FLOAT, 512)`, `VECTOR(INT, 8)`).
* `test_extract_type_from_table_query` adds two parametrized VECTOR cases that round-trip through a live Snowflake `CREATE TEMP TABLE`.
* `test_vector_column_pyarrow_passthrough_for_fixed_size_arrays` exercises the converter directly with a synthetic pyarrow `fixed_size_list<float32>[4]` column -- no Snowflake connection needed -- and asserts both pass-through identity and value preservation.

Locally:

(The `test_extract_type_from_table_query` parametrized cases are gated on a live Snowflake connection and will run in CI.)

Notes

* `visit_HexDigest` for `SHA2`/`MD5`).
* `VECTOR(FLOAT, 512)` column (an embedding store) through `ibis.snowflake.to_polars()`.

Made with Cursor