Skip to content

Conversation

@PeterKeDer
Copy link
Collaborator

@PeterKeDer PeterKeDer commented Jun 17, 2025

Use a SQL query builder to construct datafusion queries so there is less chance of misuse through SQL injections.

This adds a new dependency on PyPika which is used to construct the SQL queries. This is the only popular SQL builder I found that generates plain string SQL (instead of using parameterization).

This changes the behavior of some filters. Namely, filtering an int column by string will no longer work, e.g.

table(filters=[Filter('id', '=', '123')])

(it would have worked in the previous version because '123' is directly interpolated into the query string)

@PeterKeDer PeterKeDer force-pushed the peter/sql-builder branch 2 times, most recently from f056e4e to e662f24 Compare June 17, 2025 21:09
@PeterKeDer PeterKeDer marked this pull request as ready for review June 17, 2025 21:09
"pyarrow==17.0.0",
"ipython==8.5.0",
"typing_extensions==4.13.2",
"pypika>=0.48.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge dependency, sure we want to use it here? SQL injection is something we should check for when processing untrusted user content. I would expect internal code to be trusted code, after all, if the programmer is malicious, they can also submit code to by pass all these safety checks in the same PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't reviewed this PR yet but previously the attack vector was type confusion.

If a value was passed in that wasn't of a type that's supported it would get stringified and could then be susceptible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, that's a better reason for doing more validation here, but can this be checked with type annotation instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants