feat(dtype): support compiling dtypes to sql #11100

NickCrews · 2025-04-07T16:54:23Z

EDIT: This Oracle fix was merged in #11124
This PR also includes a fix in the Oracle dtype compiler. The new tests cover it; if we change those tests, we should make sure this code path is still exercised. Also, I'm not sure why we chose a default string length of 4000 for Oracle but 2 million for Exasol and MAX for MSSQL (see the snapshots). Perhaps we should unify these in a follow-up?
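The three defaults look like each engine's maximum VARCHAR length rather than arbitrary picks. Oracle caps VARCHAR2 at 4000 bytes under the default MAX_STRING_SIZE=STANDARD, Exasol caps VARCHAR at 2,000,000 characters, and SQL Server spells its large-value type VARCHAR(MAX). If that reading is right, unifying them means deciding once that an unbounded ibis string maps to "the backend's maximum". Below is a minimal table-driven sketch of that rule. The names DEFAULT_STRING_LENGTH, TYPE_NAME and default_string_type are hypothetical, not ibis code, and the rendered type names are illustrative.

```python
# Hypothetical sketch (not ibis code): render an unbounded string dtype using
# each engine's maximum VARCHAR length, per the defaults discussed in this PR.
DEFAULT_STRING_LENGTH = {
    "oracle": 4000,  # VARCHAR2 limit when MAX_STRING_SIZE=STANDARD (bytes)
    "exasol": 2_000_000,  # Exasol's maximum VARCHAR length (characters)
    "mssql": "MAX",  # SQL Server's large-value VARCHAR(MAX)
}

# Illustrative type names; the real ibis type mappers may spell these differently.
TYPE_NAME = {"oracle": "VARCHAR2"}


def default_string_type(dialect: str) -> str:
    """Return the SQL spelling of an unbounded string for `dialect`."""
    name = TYPE_NAME.get(dialect, "VARCHAR")
    length = DEFAULT_STRING_LENGTH.get(dialect)
    if length is None:
        # Engines with a native unbounded string type need no length.
        return name
    return f"{name}({length})"
```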

cpcloud

I don't really like the implementation of this because it has a special case for each backend, for nearly identical code.

Also, to steel-man your implementation: why shouldn't we special-case Schema here too? I'm against that for the same reason I'm against this implementation.

As an alternative, how about a method on DataType, called to_sqlglot, similar to how Schema.to_sqlglot works?

Then we can have a very small amount of code in ibis.to_sql that checks for Schema or DataType instances and then calls to_sqlglot, rather than handling the compilation in a very special-casey way in each compiler.
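A minimal sketch of the shape described here, using dependency-free stand-ins: DataType and Schema each own a to_sqlglot(dialect) method, and to_sql is the only function that checks for them. Everything below (the toy DataType/Schema classes, TYPE_NAMES, compile_expr) is hypothetical; the real methods would return sqlglot expressions rather than strings.

```python
from dataclasses import dataclass

# Hypothetical per-dialect spellings for two toy dtypes.
TYPE_NAMES = {
    "int64": {"postgres": "BIGINT", "bigquery": "INT64"},
    "string": {"postgres": "TEXT", "bigquery": "STRING"},
}


@dataclass(frozen=True)
class DataType:
    name: str

    def to_sqlglot(self, dialect: str) -> str:
        # A real version would return a sqlglot DataType expression; a plain
        # string keeps this sketch dependency-free.
        return TYPE_NAMES[self.name][dialect]


@dataclass
class Schema:
    fields: dict  # column name -> DataType

    def to_sqlglot(self, dialect: str) -> list:
        # One "column type" entry per field, reusing DataType.to_sqlglot.
        return [f"{col} {dtype.to_sqlglot(dialect)}" for col, dtype in self.fields.items()]


def compile_expr(expr, dialect: str) -> str:
    # Placeholder for the existing per-backend expression compiler.
    raise NotImplementedError


def to_sql(obj, dialect: str) -> str:
    # The only special-casing lives here, not in each backend's compiler.
    if isinstance(obj, DataType):
        return obj.to_sqlglot(dialect)
    if isinstance(obj, Schema):
        return ",\n".join(obj.to_sqlglot(dialect))
    return compile_expr(obj, dialect)
```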

Also, if there's a bug in the Oracle compilation unrelated to this PR, please submit a separate PR for that fix so it shows up in the release notes.

NickCrews · 2025-04-15T14:19:27Z

> As an alternative, how about a method on DataType, called to_sqlglot, similar to how Schema.to_sqlglot works?

> Then we can have a very small amount of code in ibis.to_sql that checks for Schema or DataType instances and then calls to_sqlglot, rather than handling the compilation in a very special-casey way in each compiler.

This sounds better. Good idea, I will do that.

> Also, if there's a bug in the Oracle compilation unrelated to this PR, please submit a separate PR for that fix so it shows up in the release notes.

Will do!

NickCrews · 2025-04-15T22:57:57Z

Well, it still is sorta ugly, but maybe less ugly than before.

Now that I've added a DataType.to_sqlglot() method, it begs for a DataType.from_sqlglot(sqlglot_type: sge.DataType) method, for symmetry with to_arrow/from_arrow, to_pandas/from_pandas, etc. The tricky thing is that I think the sqlglot -> ibis conversion depends on the sqlglot dialect. So it needs more context than the arrow/pandas/etc. methods, which would require an extra argument and a different signature. I think this is a smell that this is the wrong abstraction. I'm tempted to think the ibis<->sqlglot datatype conversion should only happen at the backend-specific level of SqlGlotTypeMapper, and we shouldn't push it down to the backend-agnostic level of DataType.
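The dialect dependence of the reverse direction shows up even for common type names. FLOAT is single precision (4 bytes) in MySQL but means double precision in PostgreSQL, so a from_sqlglot that only receives the type node cannot pick the right ibis dtype. The sketch below makes the extra dialect argument explicit with a hypothetical lookup table; FROM_SQL and dtype_from_sql are illustrative names, not ibis APIs.

```python
# Hypothetical reverse mapping, keyed by dialect: SQL type name -> dtype name.
FROM_SQL = {
    "mysql": {"FLOAT": "float32", "DOUBLE": "float64"},
    "postgres": {"FLOAT": "float64", "REAL": "float32"},
}


def dtype_from_sql(type_name: str, dialect: str) -> str:
    """Resolve a SQL type name; the same name can mean different dtypes per dialect."""
    try:
        return FROM_SQL[dialect][type_name.upper()]
    except KeyError:
        raise ValueError(f"unknown type {type_name!r} for {dialect!r}") from None
```

Because the result depends on both arguments, a signature like from_sqlglot(type_node) alone would not be enough, which is consistent with keeping this logic in the per-backend type mappers.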

NickCrews · 2025-04-17T15:43:57Z

We already have a similar test that checks con.execute(ibis.literal("a string").typeof()) at

ibis/ibis/backends/tests/test_string.py

Lines 78 to 117 in d55a5ee

            param(
                'STRI"NG',
                {
                    "bigquery": "STRING",
                    "clickhouse": "String",
                    "snowflake": "VARCHAR",
                    "sqlite": "text",
                    "trino": "varchar(7)",
                    "athena": "varchar(7)",
                    "duckdb": "VARCHAR",
                    "impala": "STRING",
                    "postgres": "text",
                    "risingwave": "text",
                    "flink": "CHAR(7) NOT NULL",
                    "databricks": "string",
                },
                id="string-quote2",
                marks=[
                    pytest.mark.notimpl(
                        ["oracle"],
                        raises=OracleDatabaseError,
                        reason="ORA-25716",
                    ),
                    pytest.mark.notimpl(
                        ["risingwave"],
                        raises=PsycoPg2InternalError,
                        reason='sql parser error: Expected end of statement, found: "NG\'" at line:1, column:31 Near "SELECT \'STRI"NG\' AS "\'STRI""',
                    ),
                ],
            ),
        ],
    )
    def test_string_literal(con, backend, text_value, expected_types):
        expr = ibis.literal(text_value)
        result = con.execute(expr)
        assert result == text_value
        with contextlib.suppress(com.OperationNotDefinedError):
            backend_name = backend.name()
            assert con.execute(expr.typeof()) == expected_types[backend_name]

This is a bit different, though. I'm curious why that test, parameterized with the con fixture, covers fewer backends than the ones I test at https://github.com/ibis-project/ibis/pull/11100/files#diff-7741b9bb82410d26e83f56434bf97f21cb1181d3faadf85e41b32933ffd74ffeR226-R254. I would hope the existing test would cover ALL of these backends.
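One plausible reason the existing test lists fewer backends: in `assert con.execute(expr.typeof()) == expected_types[backend_name]`, Python evaluates the left operand first. A backend whose typeof raises OperationNotDefinedError never reaches the dict lookup, and contextlib.suppress swallows the error, so only backends that actually implement typeof need an entry. The stdlib sketch below reproduces that control flow; fake_typeof and its supported set are made up for illustration, and OperationNotDefinedError here is a local stand-in for the ibis exception.

```python
import contextlib


class OperationNotDefinedError(Exception):
    """Local stand-in for ibis's OperationNotDefinedError."""


def fake_typeof(backend_name: str) -> str:
    # Hypothetical: pretend only these backends implement typeof().
    supported = {"duckdb": "VARCHAR", "postgres": "text"}
    if backend_name not in supported:
        raise OperationNotDefinedError(backend_name)
    return supported[backend_name]


def check(backend_name: str, expected_types: dict) -> bool:
    """Mirror the test's structure; return True if the comparison actually ran."""
    with contextlib.suppress(OperationNotDefinedError):
        # The left operand runs before the dict lookup, so a backend without
        # typeof() never evaluates expected_types[backend_name].
        assert fake_typeof(backend_name) == expected_types[backend_name]
        return True
    return False
```

Under this reading, a backend that does implement typeof but is missing from the dict would fail with KeyError rather than being silently skipped.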

NickCrews · 2025-04-17T15:57:22Z

This also makes me wonder if we want to expose a top-level ibis.to_sqlglot(). Then ibis.to_sql() would wrap it and call .sql(pretty=pretty) on the result. I don't think there is currently a public API for users to get a sqlglot expression from an ibis expression, datatype, or schema, is there?
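A minimal sketch of the proposed layering, with a toy Select node standing in for a sqlglot expression: to_sqlglot returns the structured object, and to_sql is a one-line wrapper that renders it. Select, to_sqlglot and the dict-based "expression" are hypothetical; in sqlglot itself the rendering step would be Expression.sql(dialect=..., pretty=...).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Select:
    """Hypothetical stand-in for a sqlglot Select expression."""

    columns: tuple
    table: str

    def sql(self, pretty: bool = False) -> str:
        if pretty:
            cols = ",\n  ".join(self.columns)
            return f"SELECT\n  {cols}\nFROM {self.table}"
        return f"SELECT {', '.join(self.columns)} FROM {self.table}"


def to_sqlglot(expr: dict) -> Select:
    # Hypothetical: compile an expression into a structured node that
    # callers can inspect or transform before rendering.
    return Select(columns=tuple(expr["columns"]), table=expr["table"])


def to_sql(expr: dict, pretty: bool = False) -> str:
    # to_sql becomes a thin wrapper: build the node, then render it.
    return to_sqlglot(expr).sql(pretty=pretty)
```

The design benefit is that users who need to post-process the query can stop at to_sqlglot, while everyone else keeps the familiar string-returning to_sql.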

NickCrews · 2025-04-17T19:15:02Z

Thinking about this more, I could simplify the Backend.to_sqlglot() logic by simply not supporting dt.DataType and sch.Schema. Then the only place we would need the isinstance(x, (dt.DataType, sch.Schema)) check is in api.to_sqlglot(). The downside is that users couldn't do my_backend.to_sqlglot(my_datatype). They could only do ibis.to_sql(my_datatype, my_backend.compiler.dialect), which is more complicated and only gives them a SQL string, not a sqlglot expression. I think the extra functionality is worth the extra complexity, but wanted to put it out there as a possibility.
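One way to keep both capabilities while confining the isinstance check to a single function, sketched as an option rather than anything settled in this thread: Backend.to_sqlglot forwards to the API-level function with its own dialect. The toy DataType, TYPE_NAMES, api_to_sqlglot and Backend below are hypothetical stand-ins, and plain strings replace sqlglot expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """Hypothetical stand-in for an ibis DataType."""

    name: str


# Hypothetical (dtype, dialect) -> SQL spelling table.
TYPE_NAMES = {("int64", "duckdb"): "BIGINT", ("int64", "bigquery"): "INT64"}


def api_to_sqlglot(obj, dialect: str) -> str:
    # The single place that special-cases datatypes.
    if isinstance(obj, DataType):
        return TYPE_NAMES[(obj.name, dialect)]
    raise TypeError(f"cannot compile {type(obj).__name__} in this sketch")


class Backend:
    def __init__(self, dialect: str) -> None:
        self.dialect = dialect

    def to_sqlglot(self, obj) -> str:
        # No isinstance checks here: just supply this backend's dialect.
        return api_to_sqlglot(obj, self.dialect)
```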

github-actions bot added the tests, bigquery, and sql labels Apr 7, 2025

NickCrews mentioned this pull request Apr 7, 2025

feat: compile datatypes to native types #11073 (Open)

NickCrews force-pushed the compile-dtype branch from 30a2ca9 to 16abc2c April 7, 2025 19:54

cpcloud requested changes Apr 14, 2025

NickCrews mentioned this pull request Apr 15, 2025

fix(oracle): return a sqlglot.DataType, not str, when compiling string dtype #11124 (Merged)

NickCrews force-pushed the compile-dtype branch from 16abc2c to 69bb598 April 15, 2025 21:22

github-actions bot added the datatypes label Apr 15, 2025

NickCrews force-pushed the compile-dtype branch from 69bb598 to fd78257 April 15, 2025 22:52

NickCrews force-pushed the compile-dtype branch 2 times, most recently from a06c48c to c780113 April 17, 2025 15:30

NickCrews force-pushed the compile-dtype branch 2 times, most recently from 790c400 to 6ca984d April 17, 2025 15:49

NickCrews force-pushed the compile-dtype branch from 6ca984d to 2e61454 April 17, 2025 16:01

NickCrews added 3 commits April 18, 2025 07:38

feat(dtype): support compiling dtypes to sql

cc3106a

fix: fixup doctests

a69f15a

chore: fixup risingwave recursion error

2d29680

NickCrews force-pushed the compile-dtype branch from 20d77bd to 2d29680 April 18, 2025 14:46
