feat(clickhouse): allow dynamic schema #32610

Open: codenamelxl wants to merge 6 commits into master from allow-dynamic-schema-clickhouse

Conversation

codenamelxl
Contributor

SUMMARY

Add an adjust_engine_params method to the ClickHouse db_engine_spec so that ClickHouse uses the selected schema when resolving unqualified table names in a query.
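
To illustrate the idea, here is a minimal stdlib-only sketch of swapping the database segment of a connection URI for the selected schema. The helper name `with_database` and the example URI are assumptions made for illustration; the PR itself works on a SQLAlchemy URL object via `uri.set(database=...)` rather than on strings.

```python
from __future__ import annotations

from urllib.parse import quote, urlsplit, urlunsplit


def with_database(uri: str, schema: str | None) -> str:
    """Return `uri` with its database (path) segment replaced by `schema`.

    Hypothetical helper: a string-based stand-in for URL.set(database=...).
    """
    if not schema:
        return uri  # no schema selected: keep the configured default database
    parts = urlsplit(uri)
    # Percent-encode the schema so characters such as "/" cannot change the URI structure.
    return urlunsplit(parts._replace(path="/" + quote(schema, safe="")))


# Example URI, assumed for illustration only.
base = "clickhousedb://user:secret@localhost:8123/default"
print(with_database(base, "analytics"))
print(with_database(base, None))
```

With the database segment set this way, a query such as `SELECT * FROM events` would be expected to resolve `events` against the chosen schema instead of the connection's default database.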

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@dosubot dosubot bot added the data:connect:clickhouse Related to Clickhouse label Mar 12, 2025

@korbit-ai korbit-ai bot left a comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category: Security
Issue: Unvalidated schema name in database connection
Files scanned: superset/db_engine_specs/clickhouse.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Need a new review? Comment /korbit-review on this PR and I'll review your latest changes.

Korbit Guide: Usage and Customization

Interacting with Korbit

  • You can manually ask Korbit to review your PR using the /korbit-review command in a comment at the root of your PR.
  • You can ask Korbit to generate a new PR description using the /korbit-generate-pr-description command in any comment on your PR.
  • Too many Korbit comments? I can resolve all my comment threads if you use the /korbit-resolve command in any comment on your PR.
  • On any given comment that Korbit raises on your pull request, you can have a discussion with Korbit by replying to the comment.
  • Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.

Customizing Korbit

  • Check out our docs on how you can make Korbit work best for you and your team.
  • Customize Korbit for your organization through the Korbit Console.

Current Korbit Configuration

General Settings
  • Review Schedule: Automatic excluding drafts
  • Max Issue Count: 10
  • Automatic PR Descriptions

Issue Categories
  • Documentation
  • Logging
  • Error Handling
  • Readability
  • Design
  • Performance
  • Security
  • Functionality

    schema: str | None = None,
) -> tuple[URL, dict[str, Any]]:
    if schema:
        uri = uri.set(database=parse.quote(schema, safe=""))

Unvalidated schema name in database connection (category: Security)

What is the issue?

The schema parameter is placed directly into the database connection URI after URL encoding, with no input validation.

Why this matters

Without proper validation, malicious schema names could potentially be used for SQL injection or path traversal attacks depending on how ClickHouse handles database names.

Suggested change

Add input validation before using the schema parameter:

import re

def is_valid_schema_name(schema: str) -> bool:
    # fullmatch rejects a trailing newline, which re.match with '$' would accept
    return re.fullmatch(r"[a-zA-Z0-9_-]+", schema) is not None

@classmethod
def adjust_engine_params(
    cls,
    uri: URL,
    connect_args: dict[str, Any],
    catalog: str | None = None,
    schema: str | None = None,
) -> tuple[URL, dict[str, Any]]:
    if schema:
        # Reject anything outside the allow-list before it reaches the URI
        if not is_valid_schema_name(schema):
            raise ValueError("Invalid schema name")
        uri = uri.set(database=parse.quote(schema, safe=""))
    return uri, connect_args
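
One caveat worth checking in any allow-list regex of this kind: with `re.match`, a pattern ending in `$` also accepts a string that ends in a newline, whereas `re.fullmatch` does not. The sketch below compares the two using the same character class as the suggestion above. The function names are hypothetical, and the character class itself is an assumption about which schema names should be allowed, not a statement of what ClickHouse permits.

```python
import re

ALLOWED = r"[a-zA-Z0-9_-]+"  # same character class as the suggestion above


def is_valid_with_match(schema: str) -> bool:
    # "$" also matches just before a trailing newline
    return re.match("^" + ALLOWED + "$", schema) is not None


def is_valid_with_fullmatch(schema: str) -> bool:
    # fullmatch requires the whole string to match the pattern
    return re.fullmatch(ALLOWED, schema) is not None


for candidate in ["analytics", "analytics\n", "prod; DROP", ""]:
    print(repr(candidate), is_valid_with_match(candidate), is_valid_with_fullmatch(candidate))
```

Only the second candidate is treated differently: the `re.match` variant accepts it and the `re.fullmatch` variant rejects it.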

Contributor Author

@codenamelxl codenamelxl Mar 12, 2025

Valid concern, but other engine specs such as Hive do not validate the schema value either. Should we follow that norm, or add validation for ClickHouse only?

It's true that other engine specs such as Hive may not validate their schema values, but it would still be good practice to add validation for ClickHouse here. It could help prevent potential SQL injection or path traversal attacks in the future. Better still, all engine specs could be updated to include this validation. Should we implement this only for ClickHouse, or consider it for all engine specs?

codecov bot commented Mar 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.46%. Comparing base (76d897e) to head (72d8e6e).
Report is 1575 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #32610       +/-   ##
===========================================
+ Coverage   60.48%   83.46%   +22.97%     
===========================================
  Files        1931      548     -1383     
  Lines       76236    39366    -36870     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    32855    -13259     
+ Misses      28017     6511    -21506     
+ Partials     2105        0     -2105     
Flag        Coverage Δ
hive        48.47% <57.14%> (-0.70%) ⬇️
javascript  ?
mysql       75.73% <57.14%> (?)
postgres    75.80% <57.14%> (?)
presto      52.96% <57.14%> (-0.84%) ⬇️
python      83.46% <100.00%> (+19.97%) ⬆️
sqlite      75.32% <57.14%> (?)
unit        61.38% <100.00%> (+3.75%) ⬆️

Flags with carried forward coverage won't be shown.

@rusackas
Member

Superset uses Git pre-commit hooks courtesy of pre-commit. To install, run the following:

pip3 install -r requirements/development.txt
pre-commit install

A series of checks will now run when you make a git commit.

Alternatively, you can run pre-commit manually:

pre-commit run --all-files

That should fix up the pre-commit job on CI :)

@codenamelxl codenamelxl marked this pull request as draft March 12, 2025 07:24
@pull-request-size pull-request-size bot added size/M and removed size/S labels Mar 12, 2025
@codenamelxl codenamelxl force-pushed the allow-dynamic-schema-clickhouse branch from 53b777b to 72d8e6e Compare March 12, 2025 07:45
@codenamelxl codenamelxl marked this pull request as ready for review March 12, 2025 13:47
@korbit-ai korbit-ai bot left a comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category: Functionality
Issue: Over-aggressive URL encoding of schema names
Files scanned: superset/db_engine_specs/clickhouse.py

Comment on lines +429 to +430

if schema:
    uri = uri.set(database=parse.quote(schema, safe=""))

Over-aggressive URL encoding of schema names (category: Functionality)

What is the issue?

The schema is being URL encoded without preserving any safe characters, which could lead to compatibility issues with certain schema names containing valid URL characters.

Why this matters

Encoding all characters in the schema name could make certain valid schema names unusable and prevent connections to databases with schema names containing standard URL-safe characters.

Suggested change

Modify the code to preserve standard URL-safe characters:

if schema:
    uri = uri.set(database=parse.quote(schema, safe="/_-."))
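
For reference when weighing this suggestion: Python's `urllib.parse.quote` never percent-encodes letters, digits, or the characters `_.-~`, whatever `safe` contains. In practice, `safe=""` and `safe="/_-."` therefore differ only in how they treat `/`. The short check below illustrates this; the sample schema names are made up.

```python
from urllib.parse import quote

# Letters, digits and "_.-~" are never percent-encoded by quote();
# the `safe` argument only adds further exemptions.
for name in ["my_db-1.x", "team/sales", "my db"]:
    print(name, "->", quote(name, safe=""), "|", quote(name, safe="/_-."))
```

Whether a literal `/` should pass through unencoded into the database segment of the URI is a separate question, since it would change the path structure.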

2 participants