Skip to content

Detect wether query just filters rows or is more complex with sqlglot #2557

Open
@sh-rp

Description

@sh-rp

TLDR

For deciding which hints to preserve during our lineage / column schema detection, we need to know if a given sql query is just filtering a subset of rows with all columns from a given table or doing something more advanced (subset of columns, joins, aggregates..). For the simple filtering we can keep all hints, for the more advanced queries we need to remove primary_key and unique and other hints, since we don't really know how those columns should be used.

The main task

We need a small sqlglot based util that can analyze a given query and decide wether this is a simple query or a complex query.

Examples for a table with three columns col1, col2, col3

select * from table -> simple, only one table is accessed and all columns are there
select col1, col2 from table -> complex as not all columns are there
select t1.col1 as blah, t1.col2 as blubb, t1.col3 from table as t -> simple, only one table, all columns are there
select * from t1 join t2 on t1.id =t2.t1_id -> complex, it's a join
select * from t1 where t1.id > 5 offset 5 limit 10 -> simple, only one table, all columns are there
select *, static_val = "123" from table -> debatable, but I would say simple, only one table is accessed and all columns are there.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions