
feat(snowflake): support VECTOR type via Array(length=N) #12005

Open
daveauerbach wants to merge 1 commit into ibis-project:main from daveauerbach:feat/snowflake-vector

daveauerbach commented May 14, 2026

Contributor

Summary

Snowflake's VECTOR(<element>, <dimension>) type stores fixed-length numeric vectors used by AI/ML workloads. The Snowflake Python connector already deserializes VECTOR columns to native pyarrow fixed_size_list<element>[length], but ibis had no VECTOR -> ibis dtype mapping, so the column came back typed as utf8 (via the Unknown -> string fallback) and SnowflakePyArrowData.convert_column crashed downstream with:

pyarrow.lib.ArrowNotImplementedError: Unsupported cast from
fixed_size_list<element: float not null>[N] to utf8 using function cast_string

This PR adds three pieces:

  1. SnowflakeType._from_sqlglot_VECTOR parses Snowflake VECTOR into dt.Array(<element>, length=<dimension>). Per the Snowflake docs, element types are 32-bit (INT -> 32-bit signed integer, FLOAT -> 32-bit single-precision floating-point). sqlglot normalizes the VECTOR-element FLOAT to Type.DOUBLE in its DataType AST, so we narrow to Float32/Int32 rather than falling back to the default scalar mappings (which would give Float64/Int64). VECTOR elements are also not nullable per Snowflake's storage model, hence nullable=False on the inner dtype.

  2. SnowflakePyArrowData.convert_column / convert_scalar now pass fixed-length array columns through unchanged. Without this guard, the JSON-extension wrapping path tries to cast the fixed_size_list<...> data to utf8 (the storage type of PYARROW_JSON_TYPE) and raises ArrowNotImplementedError.

  3. SnowflakeType._from_ibis_Array round-trips fixed-length numeric Arrays back to VECTOR(<element>, <dimension>). Other fixed-length element types (e.g. strings, structs) and variable-length arrays continue to compile to ARRAY (the existing JSON-backed variant-array).
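
The mapping described in items 1 and 3 can be sketched without ibis or sqlglot. This is a minimal illustration, not the code in this PR: `ArrayType`, `parse_vector`, and `to_snowflake` are hypothetical stand-ins for `dt.Array`, `SnowflakeType._from_sqlglot_VECTOR`, and `SnowflakeType._from_ibis_Array`, and the type string is parsed with a regex rather than through a sqlglot AST.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for an array dtype; this is NOT ibis's dt.Array.
@dataclass(frozen=True)
class ArrayType:
    element: str                   # e.g. "float32", "int32", "string"
    length: Optional[int] = None   # None means variable-length
    element_nullable: bool = True


# VECTOR elements are 32-bit, so FLOAT/INT narrow to float32/int32
# instead of the default 64-bit scalar mappings.
_VECTOR_ELEMENTS = {"FLOAT": "float32", "INT": "int32"}
_VECTOR_RE = re.compile(
    r"^\s*VECTOR\(\s*(\w+)\s*,\s*(\d+)\s*\)\s*$", re.IGNORECASE
)


def parse_vector(type_string: str) -> ArrayType:
    """Parse 'VECTOR(<element>, <dimension>)' into a fixed-length array type."""
    match = _VECTOR_RE.match(type_string)
    if match is None:
        raise ValueError(f"not a VECTOR type: {type_string!r}")
    element = match.group(1).upper()
    if element not in _VECTOR_ELEMENTS:
        raise ValueError(f"unsupported VECTOR element type: {element}")
    return ArrayType(
        element=_VECTOR_ELEMENTS[element],
        length=int(match.group(2)),
        element_nullable=False,  # VECTOR elements are not nullable
    )


def to_snowflake(dtype: ArrayType) -> str:
    """Compile fixed-length numeric arrays to VECTOR, everything else to ARRAY."""
    sql_names = {v: k for k, v in _VECTOR_ELEMENTS.items()}
    if dtype.length is not None and dtype.element in sql_names:
        return f"VECTOR({sql_names[dtype.element]}, {dtype.length})"
    return "ARRAY"
```

Under these assumptions, `to_snowflake(parse_vector("VECTOR(FLOAT, 512)"))` gives back `VECTOR(FLOAT, 512)`, while a fixed-length array of strings or any variable-length array still compiles to `ARRAY`.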

Test plan

  • ibis/backends/snowflake/tests/test_datatypes.py::test_parse adds three parametrized VECTOR cases (VECTOR(FLOAT, 4), VECTOR(FLOAT, 512), VECTOR(INT, 8)).
  • test_extract_type_from_table_query adds two parametrized VECTOR cases that round-trip through a live Snowflake CREATE TEMP TABLE.
  • New test_vector_column_pyarrow_passthrough_for_fixed_size_arrays exercises the converter directly with a synthetic pyarrow fixed_size_list<float32>[4] column -- no Snowflake connection needed -- and asserts both pass-through identity and value preservation.
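
The shape of the pass-through check in the last test can be illustrated with plain Python lists in place of pyarrow columns. This is a sketch under stated assumptions, not the test in this PR: `ArrayType` and this `convert_column` are hypothetical stand-ins, and `json.loads` merely stands in for the JSON-backed path that variable-length arrays take.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for an array dtype; this is NOT ibis's dt.Array.
@dataclass(frozen=True)
class ArrayType:
    element: str
    length: Optional[int] = None  # None means variable-length


def convert_column(column, dtype):
    """Stand-in converter: fixed-length arrays pass through unchanged."""
    if isinstance(dtype, ArrayType) and dtype.length is not None:
        # Guard: the data is already a native fixed-size list, so skip
        # the JSON path entirely and return the same object.
        return column
    # Variable-length arrays are modelled here as JSON strings.
    return [json.loads(v) if v is not None else None for v in column]


# Synthetic fixed-size column, analogous to fixed_size_list<float32>[4].
vectors = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
result = convert_column(vectors, ArrayType("float32", length=4))

assert result is vectors                    # pass-through identity
assert result[1] == [4.0, 5.0, 6.0, 7.0]    # value preservation
```

The identity assertion is the stricter of the two: it fails if the converter copies or re-encodes the column even when the values happen to survive.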

Locally:

$ pytest ibis/backends/snowflake/tests/test_datatypes.py -v -k "test_parse or test_vector_column_pyarrow"
======================== 18 passed in 0.05s =========================

(The test_extract_type_from_table_query parametrized cases are gated on a live Snowflake connection and will run in CI.)

Notes

  • Follows the same pattern as my previous Snowflake compiler PR from a few months ago, #11940 (which added visit_HexDigest for SHA2/MD5).
  • Discovered while exercising a real PROD Snowflake table with a VECTOR(FLOAT, 512) column (an embedding store) through ibis.snowflake.to_polars().

Made with Cursor


Snowflake's `VECTOR(<element>, <dimension>)` type stores fixed-length
numeric vectors used by AI/ML workloads. The Snowflake Python connector
already deserializes VECTOR columns to native pyarrow
`fixed_size_list<element>[length]`, but ibis had no VECTOR -> ibis
dtype mapping, so the column came back typed as `utf8` (via the
`Unknown -> string` fallback) and `SnowflakePyArrowData.convert_column`
crashed downstream with:

    pyarrow.lib.ArrowNotImplementedError: Unsupported cast from
    fixed_size_list<element: float not null>[N] to utf8 using function
    cast_string

This adds three pieces:

* `SnowflakeType._from_sqlglot_VECTOR` parses Snowflake VECTOR into
  `dt.Array(<element>, length=<dimension>)`. Per the Snowflake docs
  (https://docs.snowflake.com/en/sql-reference/data-types-vector), the
  element types are documented as 32-bit (INT -> 32-bit signed int,
  FLOAT -> 32-bit single-precision float). sqlglot normalizes
  VECTOR-element FLOAT to `Type.DOUBLE` in its DataType AST, so we
  narrow to `Float32`/`Int32` rather than falling back to the default
  scalar mappings (which would give Float64/Int64). VECTOR elements
  are also not nullable per Snowflake's storage model, hence
  `nullable=False` on the inner dtype.

* `SnowflakePyArrowData.convert_column` / `convert_scalar` now pass
  fixed-length array columns through unchanged. Without this guard,
  the JSON-extension wrapping path tries to cast the
  `fixed_size_list<...>` data to `utf8` (the storage type of
  `PYARROW_JSON_TYPE`) and raises `ArrowNotImplementedError`.

* `SnowflakeType._from_ibis_Array` round-trips fixed-length numeric
  Arrays back to `VECTOR(<element>, <dimension>)`. Other fixed-length
  element types (e.g. strings, structs) and variable-length arrays
  continue to compile to `ARRAY` (the existing JSON-backed
  variant-array).

Test plan:

* `ibis/backends/snowflake/tests/test_datatypes.py::test_parse` adds
  three parametrized VECTOR cases (`VECTOR(FLOAT, 4)`,
  `VECTOR(FLOAT, 512)`, `VECTOR(INT, 8)`).
* `test_extract_type_from_table_query` adds two parametrized VECTOR
  cases that round-trip through a live Snowflake `CREATE TEMP TABLE`.
* New `test_vector_column_pyarrow_passthrough_for_fixed_size_arrays`
  exercises the converter directly with a synthetic pyarrow
  `fixed_size_list<float32>[4]` column -- no Snowflake connection
  needed -- and asserts pass-through plus value preservation.

Co-authored-by: Cursor <cursoragent@cursor.com>
github-actions bot added the tests (Issues or PRs related to tests), sql (Backends that generate SQL), and snowflake (The Snowflake backend) labels May 14, 2026