Optimize join ordering #1065

Open
@hendrikmakait

Description

Problem

Currently, we execute joins in the order the user specifies them. If that order is poorly chosen, it can cause a significant performance penalty because intermediate results grow unnecessarily large.

Solution

We should automatically optimize the join ordering. Ideally, we would have cardinality estimates for this from Parquet files or a metadata store, but we should also try to optimize join ordering when no meaningful statistics are available. Possible approaches here include optimization based on partition counts or equivalence sets (https://blobs.duckdb.org/papers/tom-ebergen-msc-thesis-join-order-optimization-with-almost-no-statistics.pdf).
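To make the partition-count idea concrete, here is a minimal sketch of a greedy smallest-first ordering. It is not the equivalence-set approach from the linked thesis and not an existing API in this project. The `Table` class, `size_estimate`, `greedy_join_order`, and the `rows_per_partition` default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    """Hypothetical description of a join input."""
    name: str
    npartitions: int
    rows: Optional[int] = None  # None when no statistics are available


def size_estimate(table: Table, rows_per_partition: int = 1_000_000) -> int:
    """Use the row count if known, otherwise guess from the partition count.

    The rows_per_partition default is an arbitrary assumption.
    """
    if table.rows is not None:
        return table.rows
    return table.npartitions * rows_per_partition


def greedy_join_order(tables, join_edges):
    """Return table names in a greedy smallest-first join order.

    join_edges is a set of frozenset({a, b}) pairs that share a join
    condition. We start with the smallest table, then repeatedly add the
    smallest remaining table connected to the tables joined so far, and
    fall back to a cross product only if nothing is connected.
    """
    remaining = {t.name: t for t in tables}
    first = min(remaining.values(), key=lambda t: (size_estimate(t), t.name))
    order = [first.name]
    del remaining[first.name]
    while remaining:
        connected = [
            t for t in remaining.values()
            if any(frozenset({t.name, joined}) in join_edges for joined in order)
        ]
        candidates = connected or list(remaining.values())
        nxt = min(candidates, key=lambda t: (size_estimate(t), t.name))
        order.append(nxt.name)
        del remaining[nxt.name]
    return order


# The user wrote the joins starting from the largest table.
tables = [
    Table("lineitem", npartitions=60),
    Table("orders", npartitions=15),
    Table("customer", npartitions=1, rows=150_000),
]
edges = {frozenset({"lineitem", "orders"}), frozenset({"orders", "customer"})}
print(greedy_join_order(tables, edges))  # ['customer', 'orders', 'lineitem']
```

A greedy heuristic like this ignores join selectivity, so it can only avoid the most obvious blow-ups. Real cardinality estimates, where available, would replace `size_estimate`.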

Previous work

We have already experimented with this but never arrived at a working solution that we could merge (e.g., #809).

Metadata

Labels: enhancement (New feature or request)
