feat(dtype): support compiling dtypes to sql #11100

Open
wants to merge 3 commits into
base: main
from

Conversation

NickCrews
Contributor

@NickCrews NickCrews commented Apr 7, 2025

Fixes #11073

EDIT: This Oracle fix was merged in #11124
Also includes a fix in the Oracle dtype compiler, which the new tests cover. If we change our tests, we should ensure that this code path is still tested. Also, I'm not sure why we chose a default string length of 4000 for Oracle, but 2 million for Exasol and MAX for MSSQL. See the snapshots. Perhaps we should unify this in a follow-up?

@github-actions github-actions bot added tests Issues or PRs related to tests bigquery The BigQuery backend sql Backends that generate SQL labels Apr 7, 2025
Member

@cpcloud cpcloud left a comment

I don't really like the implementation of this because it has a special case for each backend, for nearly identical code.

Also, as a steel man of your implementation, why shouldn't we special-case Schema here too? I am against that for the same reason I am against this implementation.

As an alternative, how about a method on DataType, called to_sqlglot, similar to how Schema.to_sqlglot works?

Then we can have a very small amount of code in ibis.to_sql that checks for Schema or DataType instances and then calls to_sqlglot, rather than handling the compilation in a very special-casey way in each compiler.
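
A minimal sketch of that dispatch, for illustration only. `SqlNode`, `DataType`, and `Schema` here are hypothetical stand-ins, not the real ibis or sqlglot classes, and the rendering is deliberately trivial; the point is only that `to_sql` needs a single `isinstance` check rather than per-backend special cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlNode:
    """Hypothetical stand-in for a sqlglot expression."""

    text: str

    def sql(self, dialect: str = "duckdb", pretty: bool = False) -> str:
        # A real sqlglot expression renders differently per dialect; this stub does not.
        return self.text


@dataclass(frozen=True)
class DataType:
    """Hypothetical stand-in for an ibis datatype."""

    name: str

    def to_sqlglot(self, dialect: str) -> SqlNode:
        return SqlNode(self.name.upper())


@dataclass(frozen=True)
class Schema:
    """Hypothetical stand-in for an ibis schema: column name -> DataType."""

    fields: dict

    def to_sqlglot(self, dialect: str) -> SqlNode:
        cols = ", ".join(
            f"{name} {dtype.to_sqlglot(dialect).sql(dialect)}"
            for name, dtype in self.fields.items()
        )
        return SqlNode(cols)


def to_sql(obj, dialect: str = "duckdb", pretty: bool = False) -> str:
    # The only special-casing lives here, not in each backend compiler.
    if isinstance(obj, (DataType, Schema)):
        return obj.to_sqlglot(dialect).sql(dialect, pretty=pretty)
    raise NotImplementedError("expression compilation is out of scope for this sketch")
```

With these stand-ins, `to_sql(DataType("int64"))` and `to_sql(Schema({"a": DataType("string")}))` both go through the same two-line branch.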

Also, if there's a bug in the Oracle compilation unrelated to this PR, please submit a separate PR for that fix so it shows up in the release notes.

@NickCrews
Contributor Author

> As an alternative, how about a method on DataType, called to_sqlglot, similar to how Schema.to_sqlglot works?
>
> Then we can have a very small amount of code in ibis.to_sql that checks for Schema or DataType instances and then calls to_sqlglot, rather than handling the compilation in a very special-casey way in each compiler.

This sounds better, good idea, I will do that

> Also, if there's a bug in the Oracle compilation unrelated to this PR, please submit a separate PR for that fix so it shows up in the release notes.

Will do!

@github-actions github-actions bot added the datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) label Apr 15, 2025
@NickCrews
Contributor Author

Well, it still is sorta ugly, but maybe less ugly than before.

Once I added a DataType.to_sqlglot() method, it also begged for a DataType.from_sqlglot(sqlglot_type: sge.DataType) method, for symmetry with to_arrow/from_arrow, to_pandas/from_pandas, etc. The tricky thing is, I think the sqlglot -> ibis conversion depends on the sqlglot dialect. So it needs more context than the arrow/pandas/etc. methods, and would require an extra argument and a different signature. I think this is a smell that this is the wrong abstraction. I am tempted to think that the ibis<->sqlglot datatype conversion should only happen at the backend-specific level of SqlGlotTypeMapper, and that we shouldn't push it down to the backend-agnostic level of DataType.
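
To make the signature concern concrete, here is a small sketch. The lookup table and the function name `from_native` are hypothetical and the entries are illustrative assumptions, not ibis's actual mappings; the point is that the same native type name can mean different things per dialect, so the conversion cannot be a one-argument function the way from_arrow or from_pandas can.

```python
# Illustrative per-dialect lookup (assumed for this sketch, not taken from ibis).
FROM_NATIVE = {
    "postgres": {"FLOAT": "float64"},
    "mysql": {"FLOAT": "float32"},
}


def from_native(native_type: str, dialect: str) -> str:
    """Map a native SQL type name to an ibis-style type name.

    The extra ``dialect`` argument is what breaks the symmetry with
    one-argument converters such as from_arrow / from_pandas.
    """
    return FROM_NATIVE[dialect][native_type.upper()]


# Same input type name, different result depending on the dialect.
pg_result = from_native("float", "postgres")
mysql_result = from_native("float", "mysql")
```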

@NickCrews NickCrews force-pushed the compile-dtype branch 2 times, most recently from a06c48c to c780113 Compare April 17, 2025 15:30
@NickCrews
Contributor Author

We already have a similar test that checks con.execute(ibis.literal("a string").typeof()) at

        param(
            'STRI"NG',
            {
                "bigquery": "STRING",
                "clickhouse": "String",
                "snowflake": "VARCHAR",
                "sqlite": "text",
                "trino": "varchar(7)",
                "athena": "varchar(7)",
                "duckdb": "VARCHAR",
                "impala": "STRING",
                "postgres": "text",
                "risingwave": "text",
                "flink": "CHAR(7) NOT NULL",
                "databricks": "string",
            },
            id="string-quote2",
            marks=[
                pytest.mark.notimpl(
                    ["oracle"],
                    raises=OracleDatabaseError,
                    reason="ORA-25716",
                ),
                pytest.mark.notimpl(
                    ["risingwave"],
                    raises=PsycoPg2InternalError,
                    reason='sql parser error: Expected end of statement, found: "NG\'" at line:1, column:31 Near "SELECT \'STRI"NG\' AS "\'STRI""',
                ),
            ],
        ),
    ],
)
def test_string_literal(con, backend, text_value, expected_types):
    expr = ibis.literal(text_value)
    result = con.execute(expr)
    assert result == text_value
    with contextlib.suppress(com.OperationNotDefinedError):
        backend_name = backend.name()
        assert con.execute(expr.typeof()) == expected_types[backend_name]

This is a bit different, though. I'm curious why that test, parametrized with the con fixture, covers fewer backends than the ones I test at https://github.com/ibis-project/ibis/pull/11100/files#diff-7741b9bb82410d26e83f56434bf97f21cb1181d3faadf85e41b32933ffd74ffeR226-R254. I would hope that the existing test would cover ALL of these backends.

@NickCrews NickCrews force-pushed the compile-dtype branch 2 times, most recently from 790c400 to 6ca984d Compare April 17, 2025 15:49
@NickCrews
Contributor Author

This also makes me wonder if we want to expose a toplevel ibis.to_sqlglot(). Then ibis.to_sql() would wrap this, and call .sql(pretty=pretty) on the result. I don't think there is currently a public API for users to get a sqlglot expression from an ibis expression, datatype, or schema, is there?
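
A rough sketch of the layering I have in mind. `Node` is a hypothetical stand-in for a sqlglot expression, and both function bodies are placeholders rather than the real implementation; it only shows `to_sql` as a thin wrapper that renders whatever `to_sqlglot` returns.

```python
class Node:
    """Hypothetical stand-in for a sqlglot expression built from SQL clauses."""

    def __init__(self, clauses):
        self.clauses = list(clauses)

    def sql(self, pretty: bool = False) -> str:
        # Stub rendering: one clause per line when pretty, otherwise a single line.
        separator = "\n" if pretty else " "
        return separator.join(self.clauses)


def to_sqlglot(clauses) -> Node:
    # Placeholder: the real function would compile an expression, datatype, or schema.
    return Node(clauses)


def to_sql(clauses, pretty: bool = True) -> str:
    # to_sql only wraps to_sqlglot and renders the result.
    return to_sqlglot(clauses).sql(pretty=pretty)
```

Users who want the tree call `to_sqlglot`; users who want a string call `to_sql`, and the two can never drift apart because one is defined in terms of the other.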

@NickCrews
Contributor Author

NickCrews commented Apr 17, 2025

Thinking about this more, I could simplify the Backend.to_sqlglot() logic by simply not supporting dt.DataType and sch.Schema. Then the only place we would need the isinstance(x, (dt.DataType, sch.Schema)) check is in api.to_sqlglot(). The downside is that users couldn't do my_backend.to_sqlglot(my_datatype). They could only do ibis.to_sql(my_datatype, my_backend.compiler.dialect), which is more complicated and only gives them a SQL string, not a sqlglot expression. I think the increased functionality is worth the increased complexity, but wanted to put it out there as a possibility.

Labels
bigquery The BigQuery backend datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) sql Backends that generate SQL tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: compile datatypes to native types
2 participants