feat(clickhouse): allow dynamic schema #32610

Open · wants to merge 6 commits into base: master · Changes from 1 commit
15 changes: 15 additions & 0 deletions superset/db_engine_specs/clickhouse.py
@@ -20,6 +20,7 @@
import re
from datetime import datetime
from typing import Any, cast, TYPE_CHECKING
from urllib import parse

from flask import current_app
from flask_babel import gettext as __
@@ -267,6 +268,8 @@ class ClickHouseConnectEngineSpec(BasicParametersMixin, ClickHouseEngineSpec):
    parameters_schema = ClickHouseParametersSchema()
    encryption_parameters = {"secure": "true"}

    supports_dynamic_schema = False

    @classmethod
    def get_dbapi_exception_mapping(cls) -> dict[type[Exception], type[Exception]]:
        return {}
@@ -414,3 +417,15 @@ def _mutate_label(label: str) -> str:
        :return: Conditionally mutated label
        """
        return f"{label}_{md5_sha_from_str(label)[:6]}"

    @classmethod
    def adjust_engine_params(
        cls,
        uri: URL,
        connect_args: dict[str, Any],
        catalog: str | None = None,
        schema: str | None = None,
    ) -> tuple[URL, dict[str, Any]]:
        if schema:
            uri = uri.set(database=parse.quote(schema, safe=""))

Unvalidated schema name in database connection (category: Security)

What is the issue?

The schema parameter is directly used in a database connection URI with only URL encoding but no input validation.

Why this matters

Without proper validation, malicious schema names could potentially be used for SQL injection or path traversal attacks depending on how ClickHouse handles database names.

Suggested change ∙ Feature Preview

Add input validation before using the schema parameter:

def is_valid_schema_name(schema: str) -> bool:
    return bool(re.match('^[a-zA-Z0-9_-]+$', schema))

@classmethod
def adjust_engine_params(cls, uri: URL, connect_args: dict[str, Any], 
    catalog: str | None = None, schema: str | None = None,
) -> tuple[URL, dict[str, Any]]:
    if schema:
        if not is_valid_schema_name(schema):
            raise ValueError('Invalid schema name')
        uri = uri.set(database=parse.quote(schema, safe=''))
    return uri, connect_args
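
A variant of the suggested check, as a minimal sketch. It uses `re.fullmatch` instead of `re.match` with `$`, because `$` also matches just before a trailing newline, so a value such as `"db\n"` would pass the pattern above. The helper name `quote_validated_schema` and the allowed character set are assumptions for illustration and are not part of the PR; ClickHouse may accept other characters in database names.

```python
import re
from urllib import parse

# Assumed allow-list: ASCII letters, digits, underscore, hyphen.
_SCHEMA_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_schema_name(schema: str) -> bool:
    """Return True only if the entire string matches the allow-list."""
    return _SCHEMA_RE.fullmatch(schema) is not None


def quote_validated_schema(schema: str) -> str:
    """Hypothetical helper: validate first, then percent-encode for a URI."""
    if not is_valid_schema_name(schema):
        raise ValueError(f"Invalid schema name: {schema!r}")
    return parse.quote(schema, safe="")
```

Inside `adjust_engine_params`, the result of such a helper could be passed to `uri.set(database=...)` in place of the bare `parse.quote` call.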

@codenamelxl (Contributor, Author), Mar 12, 2025

Valid concern, but other engine specs such as Hive do not validate the schema value either. Should we follow that norm, or add the validation for ClickHouse only?

It's true that other engine_specs like Hive might not verify their schema values, but I believe it would still be good practice to include the validation for ClickHouse in this case. It could help prevent potential SQL injection or path traversal attacks in the future. Even better, we could potentially update all engine_specs to include this validation. I agree with your concern though - should we implement this only for ClickHouse or consider it for all engine_specs?

Comment on lines +429 to +430

Over-aggressive URL encoding of schema names (category: Functionality)

What is the issue?

The schema is being URL encoded without preserving any safe characters, which could lead to compatibility issues with certain schema names containing valid URL characters.

Why this matters

Encoding all characters in the schema name could make certain valid schema names unusable and prevent connections to databases with schema names containing standard URL-safe characters.

Suggested change ∙ Feature Preview

Modify the code to preserve standard URL-safe characters:

if schema:
    uri = uri.set(database=parse.quote(schema, safe="/_-."))
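
To make the effect of the `safe` argument concrete, here is a small sketch using only the standard library. `urllib.parse.quote` never encodes letters, digits, or the characters `_.-~`, so names built from those characters pass through unchanged even with `safe=""`. The practical difference with `safe="/_-."` is that `/` is left unencoded. The sample names are made up for illustration.

```python
from urllib import parse

# Made-up schema names covering the interesting cases.
names = ["my_db-1.0", "a/b", "sales data"]

for name in names:
    strict = parse.quote(name, safe="")       # as in the PR
    relaxed = parse.quote(name, safe="/_-.")  # as in the suggestion
    print(f"{name!r}: safe='' -> {strict!r}, safe='/_-.' -> {relaxed!r}")
```

Because the database name occupies the path portion of a connection URI, leaving `/` unencoded may change how the URI is interpreted; whether that matters here depends on how the URL object and the driver handle the value.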

        return uri, connect_args