[Data] Add TPCH queries 5,7,8,9 for benchmarking #60662

Open
daiping8 wants to merge 9 commits into ray-project:master from daiping8:tpchq5

Conversation

daiping8 commented Feb 2, 2026

Description

Adds queries Q5, Q7, Q8, and Q9 for the TPC-H tests.

gemini-code-assist bot left a comment

Code Review

This pull request adds TPC-H queries 5, 7, 8, and 9 for benchmarking purposes. The overall structure of the new query files is consistent. However, I've found several correctness issues where the implementations deviate significantly from the TPC-H specifications for queries 7, 8, and 9. These need to be addressed to ensure the benchmarks are valid. Additionally, there are opportunities to improve performance in queries 5 and 9 by optimizing the join logic. The configuration changes in the YAML file are appropriate.
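
For checking a distributed implementation against the specification, a small pure-Python reference can help. The sketch below follows the standard TPC-H Q5 definition (customer and supplier in the same nation, nation in the chosen region, orders within a one-year window, revenue summed as `l_extendedprice * (1 - l_discount)` per nation, sorted descending). The function name `q5` and the toy rows are hypothetical and are not taken from this PR.

```python
from collections import defaultdict
from datetime import date


def q5(region, nation, customer, supplier, orders, lineitem, region_name, start):
    """Reference TPC-H Q5 on lists of dicts (hypothetical helper, not from the PR)."""
    # One-year window [start, end); assumes start is not Feb 29.
    end = date(start.year + 1, start.month, start.day)

    region_keys = {r["r_regionkey"] for r in region if r["r_name"] == region_name}
    # Only nations inside the requested region are kept.
    nation_name = {
        n["n_nationkey"]: n["n_name"] for n in nation if n["n_regionkey"] in region_keys
    }
    cust_nation = {c["c_custkey"]: c["c_nationkey"] for c in customer}
    supp_nation = {s["s_suppkey"]: s["s_nationkey"] for s in supplier}
    # o_orderkey -> nation of the ordering customer, restricted to the date window.
    order_nation = {
        o["o_orderkey"]: cust_nation[o["o_custkey"]]
        for o in orders
        if start <= o["o_orderdate"] < end
    }

    revenue = defaultdict(float)
    for li in lineitem:
        c_nat = order_nation.get(li["l_orderkey"])
        s_nat = supp_nation.get(li["l_suppkey"])
        # Q5 requires c_nationkey = s_nationkey, not just a join on each side.
        if c_nat is not None and c_nat == s_nat and s_nat in nation_name:
            revenue[nation_name[s_nat]] += li["l_extendedprice"] * (1 - li["l_discount"])
    return sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)


# Toy rows with made-up values, using the standard TPC-H column names.
region = [{"r_regionkey": 2, "r_name": "ASIA"}, {"r_regionkey": 3, "r_name": "EUROPE"}]
nation = [
    {"n_nationkey": 8, "n_name": "INDIA", "n_regionkey": 2},
    {"n_nationkey": 12, "n_name": "JAPAN", "n_regionkey": 2},
    {"n_nationkey": 6, "n_name": "FRANCE", "n_regionkey": 3},
]
customer = [
    {"c_custkey": 1, "c_nationkey": 8},
    {"c_custkey": 2, "c_nationkey": 12},
    {"c_custkey": 3, "c_nationkey": 6},
]
supplier = [
    {"s_suppkey": 10, "s_nationkey": 8},
    {"s_suppkey": 11, "s_nationkey": 12},
    {"s_suppkey": 12, "s_nationkey": 6},
]
orders = [
    {"o_orderkey": 100, "o_custkey": 1, "o_orderdate": date(1994, 3, 1)},
    {"o_orderkey": 101, "o_custkey": 2, "o_orderdate": date(1994, 6, 1)},
    {"o_orderkey": 102, "o_custkey": 2, "o_orderdate": date(1995, 1, 1)},  # outside window
    {"o_orderkey": 103, "o_custkey": 3, "o_orderdate": date(1994, 2, 1)},  # EUROPE customer
]
lineitem = [
    {"l_orderkey": 100, "l_suppkey": 10, "l_extendedprice": 1000.0, "l_discount": 0.25},
    {"l_orderkey": 100, "l_suppkey": 11, "l_extendedprice": 500.0, "l_discount": 0.0},  # nations differ
    {"l_orderkey": 101, "l_suppkey": 11, "l_extendedprice": 2000.0, "l_discount": 0.5},
    {"l_orderkey": 102, "l_suppkey": 11, "l_extendedprice": 300.0, "l_discount": 0.0},
    {"l_orderkey": 103, "l_suppkey": 12, "l_extendedprice": 400.0, "l_discount": 0.0},
]

result = q5(region, nation, customer, supplier, orders, lineitem, "ASIA", date(1994, 1, 1))
print(result)
```

Comparing output like this with the benchmark query on a tiny dataset is one way to catch a missing join predicate (for example, omitting the customer/supplier nation equality) before running at scale.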

Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
daiping8 and others added 2 commits February 2, 2026 18:14
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: ZTE Ray <dai.ping88@zte.com.cn>
Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
daiping8 changed the title from "[Data] Add TPCH queries 5 to 9 for benchmarking" to "[Data] Add TPCH queries 5,7,8,9 for benchmarking" Feb 2, 2026
daiping8 changed the title from "[Data] Add TPCH queries 5,7,8,9 for benchmarking" to "[WIP][Data] Add TPCH queries 5,7,8,9 for benchmarking" Feb 2, 2026
daiping8 marked this pull request as ready for review February 2, 2026 10:40
Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
… for improved clarity and consistency.

Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Feb 2, 2026
daiping8 changed the title from "[WIP][Data] Add TPCH queries 5,7,8,9 for benchmarking" to "[Data] Add TPCH queries 5,7,8,9 for benchmarking" Feb 3, 2026
daiping8 commented Feb 3, 2026

@owenowenisme Please review the code. Looking forward to any suggestions.

iamjustinhsu commented Feb 4, 2026

Hi @daiping8, can you help me understand why you are adding these benchmarks?

daiping8 commented Feb 5, 2026

> Hi @daiping8, can you help me understand why you are adding these benchmarks?

Hi. This is a task assigned by the Ray Data Team. https://docs.google.com/document/d/1OFFp2jMMnrCPiE0Gxdi0ronXGVqtDYDbUoS3fsNc54Q/edit?pli=1&tab=t.0

owenowenisme left a comment

I think you're missing some tables in common.py. How about opening a PR first to add the name mapping?

FYI

=== region ===
Column names: ['column0', 'column1', 'column2', 'column3']
column0: int64
column1: string
column2: string
column3: string

=== supplier ===
Column names: ['column0', 'column1', 'column2', 'column3', 'column4', 'column5', 'column6', 'column7']
column0: int64
column1: string
column2: string
column3: int64
column4: string
column5: double
column6: string
column7: string
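
A name mapping for these two tables could look roughly like the sketch below. The logical names are the standard TPC-H column names, and their types line up with the dump above. Each table has one more physical column than the TPC-H schema defines; treating that last column as a leftover (possibly from a trailing delimiter in the source files) is an assumption, as are the names `TPCH_COLUMNS` and `build_rename_map`. The actual layout of the mapping in common.py is not shown in this thread, so this is illustrative only.

```python
# Standard TPC-H column names for the two tables shown above.
TPCH_COLUMNS = {
    "region": ["r_regionkey", "r_name", "r_comment"],
    "supplier": [
        "s_suppkey",
        "s_name",
        "s_address",
        "s_nationkey",
        "s_phone",
        "s_acctbal",
        "s_comment",
    ],
}


def build_rename_map(table, physical_columns):
    """Map positional names (column0, column1, ...) to TPC-H names.

    Hypothetical helper: returns (rename, extra), where `extra` lists physical
    columns beyond the TPC-H schema (assumed here to be safe to drop).
    """
    names = TPCH_COLUMNS[table]
    if len(physical_columns) < len(names):
        raise ValueError(
            f"{table}: expected at least {len(names)} columns, got {len(physical_columns)}"
        )
    # zip stops at the shorter list, so surplus physical columns are not renamed.
    rename = dict(zip(physical_columns, names))
    extra = list(physical_columns[len(names):])
    return rename, extra


region_rename, region_extra = build_rename_map(
    "region", ["column0", "column1", "column2", "column3"]
)
supplier_rename, supplier_extra = build_rename_map(
    "supplier", [f"column{i}" for i in range(8)]
)
print(region_rename, region_extra)
print(supplier_rename, supplier_extra)
```

With a Ray Data dataset, the result could presumably be applied with `Dataset.rename_columns(rename)` followed by `Dataset.drop_columns(extra)`; whether the extra column should be dropped depends on what it actually contains.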

… nation, supplier, customer, orders, part, and partsupp

Signed-off-by: daiping8 <dai.ping88@zte.com.cn>