
Clickhouse - Default Partition should be monthly (toYYYYMM) rather than daily #5079

Open
@redsquare

Description


The advice for ClickHouse is to keep each partition between 30 GB and 150 GB. I am sure most people using RudderStack -> ClickHouse do not have that volume per day, so the default should be monthly. Keeping the number of parts on disk low should be the preferred option.

partitionByClause = fmt.Sprintf(`PARTITION BY toDate(%s)`, partitionField)
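A minimal sketch of the proposed one-line change, assuming the surrounding rudder-server code stays as quoted above. `monthlyPartitionByClause` and `dailyPartitionByClause` are hypothetical helpers for illustration, not existing rudder-server functions. `toYYYYMM` evaluates to the year and month as a number (e.g. 202501), so each calendar month becomes one partition instead of roughly 30 daily ones.

```go
package main

import "fmt"

// monthlyPartitionByClause is a hypothetical helper: it builds a ClickHouse
// PARTITION BY clause that groups rows by calendar month instead of by day.
func monthlyPartitionByClause(partitionField string) string {
	// toYYYYMM(received_at) yields e.g. 202501, i.e. one partition per month.
	return fmt.Sprintf(`PARTITION BY toYYYYMM(%s)`, partitionField)
}

// dailyPartitionByClause mirrors the current behaviour quoted in the issue:
// toDate(received_at) yields one partition per day.
func dailyPartitionByClause(partitionField string) string {
	return fmt.Sprintf(`PARTITION BY toDate(%s)`, partitionField)
}
```

Over a year of data the monthly key produces 12 partitions rather than about 365. Because ClickHouse only merges parts within the same partition, fewer partitions should also mean fewer parts on disk.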

Activity

contributor-support commented on Sep 10, 2024

Thanks for opening this issue! We'll get back to you shortly. If it is a bug, please make sure to add steps to reproduce the issue.

changed the title from "Default Partition should be monthly (toYYYYMM) rather than daily" to "Clickhouse - Default Partition should be monthly (toYYYYMM) rather than daily" on Sep 10, 2024
ericdodds commented on Sep 24, 2024

@redsquare we are going to slot this into an upcoming sprint. I'll reach out to you for more info as we get closer to starting the work.

redsquare (Author) commented on Nov 6, 2024

@ericdodds, any update on this? :)

elliotdickison commented on Jan 6, 2025

This would be super helpful to us, as it's not possible to change the partitioning of a table after creation, and fixing the problem after the fact is quite tricky.

redsquare (Author) commented on Jan 6, 2025

@elliotdickison agreed. @ericdodds, any update on this, please? :)

gitcommitshow (Collaborator) commented on Jan 9, 2025

Not shipped yet. I am following up with the team to prioritise this.

elliotdickison commented on Jan 9, 2025

We've come up with an SOP to work around this: we get the CREATE SQL for a RudderStack table, modify it with the partitioning we want, create the new table, copy the data over to it, drop the old table, and rename the new table to the old table's name. We've automated most of this with a script; we just have to remember to run it whenever we add a new event and RudderStack creates a new table.
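The copy-and-swap steps above could be scripted roughly as follows. This is a minimal sketch, not the commenter's actual script: `migrationSteps`, `rewritePartition`, and the `_repartitioned` suffix are hypothetical names, and it assumes the input is the output of `SHOW CREATE TABLE` for a single-node, non-replicated table whose DDL names the table exactly as passed in. It only generates the SQL; running it, and pausing loads so no rows arrive during the copy, is left to the caller.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dailyPartitionRe matches a daily partition clause such as
// "PARTITION BY toDate(received_at)" and captures the column expression.
var dailyPartitionRe = regexp.MustCompile(`PARTITION BY toDate\(([^)]+)\)`)

// rewritePartition swaps the daily partition key for a monthly one.
func rewritePartition(ddl string) (string, error) {
	if !dailyPartitionRe.MatchString(ddl) {
		return "", fmt.Errorf("no PARTITION BY toDate(...) clause found")
	}
	return dailyPartitionRe.ReplaceAllString(ddl, "PARTITION BY toYYYYMM($1)"), nil
}

// migrationSteps returns, in order, the statements for the workaround:
// create a repartitioned copy, copy the rows, drop the original, rename.
func migrationSteps(table, createSQL string) ([]string, error) {
	tmp := table + "_repartitioned" // hypothetical temporary table name
	ddl, err := rewritePartition(createSQL)
	if err != nil {
		return nil, err
	}
	// Point the DDL at the temporary table (assumes "CREATE TABLE <table>" appears once).
	prefix := "CREATE TABLE " + table
	if !strings.Contains(ddl, prefix) {
		return nil, fmt.Errorf("DDL does not start with %q", prefix)
	}
	ddl = strings.Replace(ddl, prefix, "CREATE TABLE "+tmp, 1)
	return []string{
		ddl,
		fmt.Sprintf("INSERT INTO %s SELECT * FROM %s", tmp, table),
		fmt.Sprintf("DROP TABLE %s", table),
		fmt.Sprintf("RENAME TABLE %s TO %s", tmp, table),
	}, nil
}
```

The regular expression only touches the `PARTITION BY` clause, so other uses of `toDate` in the DDL (for example in column defaults) are left alone.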

Given that good partitioning depends on the use case, I think a config option to set the default partition strategy (hourly, daily, monthly, quarterly, yearly) might be helpful. If you have to pick a single default, though, I'd guess monthly is better than daily.
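One way such a config option might look, as a sketch only: rudder-server does not currently expose this setting, and `PartitionStrategy` and `partitionByClause` here are hypothetical. Each strategy maps to a real ClickHouse date function, and an unset value falls back to monthly, as suggested above.

```go
package main

import "fmt"

// PartitionStrategy is a hypothetical config value for the default partitioning.
type PartitionStrategy string

const (
	Hourly    PartitionStrategy = "hourly"
	Daily     PartitionStrategy = "daily"
	Monthly   PartitionStrategy = "monthly"
	Quarterly PartitionStrategy = "quarterly"
	Yearly    PartitionStrategy = "yearly"
)

// partitionExprs maps each strategy to a ClickHouse partition-key expression.
var partitionExprs = map[PartitionStrategy]string{
	Hourly:    "toStartOfHour(%s)",
	Daily:     "toDate(%s)", // current rudder-server behaviour
	Monthly:   "toYYYYMM(%s)",
	Quarterly: "toStartOfQuarter(%s)",
	Yearly:    "toYear(%s)",
}

// partitionByClause builds the PARTITION BY clause for a strategy,
// defaulting to monthly when the option is left empty.
func partitionByClause(strategy PartitionStrategy, field string) (string, error) {
	if strategy == "" {
		strategy = Monthly
	}
	expr, ok := partitionExprs[strategy]
	if !ok {
		return "", fmt.Errorf("unknown partition strategy %q", strategy)
	}
	return "PARTITION BY " + fmt.Sprintf(expr, field), nil
}
```

Rejecting unknown values, rather than silently falling back, means a typo in the config surfaces before a table is created with a partition key that can't be changed later.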

redsquare (Author) commented on Jan 9, 2025

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
Participants: @redsquare, @elliotdickison, @ericdodds, @gitcommitshow

Repository: rudderlabs/rudder-server