feat(clickhouse): allow dynamic schema #32610
base: master
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Fix Detected |
---|---|---|
 | Unvalidated schema name in database connection | |
Files scanned
File Path | Reviewed |
---|---|
superset/db_engine_specs/clickhouse.py | ✅ |
    schema: str | None = None,
) -> tuple[URL, dict[str, Any]]:
    if schema:
        uri = uri.set(database=parse.quote(schema, safe=""))
Unvalidated schema name in database connection 
What is the issue?
The schema parameter is directly used in a database connection URI with only URL encoding but no input validation.
Why this matters
Without proper validation, malicious schema names could potentially be used for SQL injection or path traversal attacks depending on how ClickHouse handles database names.
Suggested change ∙ Feature Preview
Add input validation before using the schema parameter:
import re

def is_valid_schema_name(schema: str) -> bool:
    # Accept only letters, digits, underscores, and hyphens.
    return bool(re.match('^[a-zA-Z0-9_-]+$', schema))

@classmethod
def adjust_engine_params(cls, uri: URL, connect_args: dict[str, Any],
    catalog: str | None = None, schema: str | None = None,
) -> tuple[URL, dict[str, Any]]:
    if schema:
        # Reject names outside the whitelist before touching the URI.
        if not is_valid_schema_name(schema):
            raise ValueError('Invalid schema name')
        uri = uri.set(database=parse.quote(schema, safe=''))
    return uri, connect_args
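One detail worth checking in the suggested validator: in Python, `$` also matches just before a trailing newline, so the `re.match('^[a-zA-Z0-9_-]+$', ...)` check accepts a name such as `'db\n'`. The sketch below compares that pattern with `re.fullmatch`. The helper names `is_valid_schema_name_match` and `is_valid_schema_name_fullmatch` are hypothetical and not part of Superset.

```python
import re

# Character whitelist taken from the suggested change.
_PATTERN = r"[a-zA-Z0-9_-]+"


def is_valid_schema_name_match(schema: str) -> bool:
    # Same check as the suggestion: explicit anchors plus re.match.
    return bool(re.match("^" + _PATTERN + "$", schema))


def is_valid_schema_name_fullmatch(schema: str) -> bool:
    # fullmatch requires the whole string to match, trailing newline included.
    return bool(re.fullmatch(_PATTERN, schema))


for name in ["analytics", "my-db_1", "db\n", "a;b", ""]:
    print(
        repr(name),
        is_valid_schema_name_match(name),
        is_valid_schema_name_fullmatch(name),
    )
```

The two helpers should differ only on the name ending in a newline. Whether this whitelist is too strict for real ClickHouse database names is a separate question for the PR.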
Valid concern, but other engine_specs such as Hive do not validate the schema value either. Should we follow the existing norm, or add the validation for ClickHouse only?
It's true that other engine_specs like Hive might not verify their schema values, but I believe it would still be good practice to include the validation for ClickHouse in this case. It could help prevent potential SQL injection or path traversal attacks in the future. Even better, we could potentially update all engine_specs to include this validation. I agree with your concern though - should we implement this only for ClickHouse or consider it for all engine_specs?
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #32610       +/-   ##
===========================================
+ Coverage   60.48%   83.46%   +22.97%
===========================================
  Files        1931      548     -1383
  Lines       76236    39366    -36870
  Branches     8568        0     -8568
===========================================
- Hits        46114    32855    -13259
+ Misses      28017     6511    -21506
+ Partials     2105        0     -2105
Flags with carried forward coverage won't be shown.
Superset uses Git pre-commit hooks courtesy of pre-commit. To install, run the following:
Alternatively, pre-commit can be run manually:
That should fix up the pre-commit job on CI :)
53b777b to 72d8e6e
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Fix Detected |
---|---|---|
 | Over-aggressive URL encoding of schema names | |
Files scanned
File Path | Reviewed |
---|---|
superset/db_engine_specs/clickhouse.py | ✅ |
if schema:
    uri = uri.set(database=parse.quote(schema, safe=""))
Over-aggressive URL encoding of schema names 
What is the issue?
The schema is being URL encoded without preserving any safe characters, which could lead to compatibility issues with certain schema names containing valid URL characters.
Why this matters
Encoding all characters in the schema name could make certain valid schema names unusable and prevent connections to databases with schema names containing standard URL-safe characters.
Suggested change ∙ Feature Preview
Modify the code to preserve standard URL-safe characters:
if schema:
    uri = uri.set(database=parse.quote(schema, safe="/_-."))
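To see what the `safe` argument actually changes, the snippet below runs `urllib.parse.quote` on a few sample schema names (the names are made up for illustration). Per the Python documentation, letters, digits, and the characters `_.-~` are never quoted, so `safe=""` already leaves `_`, `-`, and `.` untouched. Of the characters in the suggested `safe="/_-."`, only `/` behaves differently.

```python
from urllib.parse import quote

# Hypothetical schema names, chosen to exercise each character class.
samples = ["my_db-1.0", "team/analytics", "my db"]

for name in samples:
    strict = quote(name, safe="")        # current PR behaviour
    relaxed = quote(name, safe="/_-.")   # suggested alternative
    print(f"{name!r}: safe='' -> {strict!r}, safe='/_-.' -> {relaxed!r}")
```

An unescaped `/` in the database segment would read as a path separator in the URI, which is presumably why the PR quotes it.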
SUMMARY
Add an adjust_engine_params method to the ClickHouse db_engine_specs so that ClickHouse uses the selected schema for unqualified table names in queries.
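As a rough illustration of the effect, the sketch below swaps the database segment of a connection URI using only the standard library. It stands in for SQLAlchemy's `URL.set(database=...)` and is not the PR's code. The function name `with_schema`, the `clickhousedb://` URI, and the credentials are assumptions made for the example.

```python
from __future__ import annotations

from urllib.parse import quote, urlsplit


def with_schema(uri: str, schema: str | None = None) -> str:
    """Return uri with its database (path) replaced by the quoted schema."""
    if not schema:
        # No schema selected: keep the database configured on the connection.
        return uri
    parts = urlsplit(uri)
    # Quote every reserved character, mirroring parse.quote(schema, safe="").
    return parts._replace(path="/" + quote(schema, safe="")).geturl()


# Hypothetical connection URI for illustration only.
base = "clickhousedb://user:secret@localhost:8123/default"
print(with_schema(base))
print(with_schema(base, "analytics"))
print(with_schema(base, "team/analytics"))
```

With the database set on the connection this way, an unqualified table name in a query would be looked up in the selected schema rather than in the connection's default database.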
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION