Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use oss_ci_benchmark_metadata materialized view #6167

Merged
merged 5 commits into from
Jan 21, 2025

Conversation

huydhn
Copy link
Contributor

@huydhn huydhn commented Jan 14, 2025

I'm attempting to use a new materialized view called oss_ci_benchmark_metadata to make it faster to query benchmark metadata. I have already added the new view manually but its definition is included in this PR for review.

The old query works, but it's slower because the original table doesn't keep several important columns, i.e. benchmark name, in order https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/oss_ci_benchmark_v3/query.sql#L98-L107. According to https://clickhouse.com/docs/en/sql-reference/statements/alter/order-by, it seems that we cannot add existing columns into ORDER BY.

Testing

When loading the benchmark metadata:

Loading the benchmark data is still relatively slow IMO, but I will try to improve that in a separate PR.

@huydhn huydhn requested review from clee2000 and a team January 14, 2025 02:15
Copy link

vercel bot commented Jan 14, 2025

@huydhn is attempting to deploy a commit to the Meta Open Source Team on Vercel.

A member of the Team first needs to authorize it.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 14, 2025
@huydhn huydhn changed the title Use oss ci benchmark metadata mv Use oss_ci_benchmark_metadata materialized view Jan 14, 2025
Copy link

vercel bot commented Jan 14, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
torchci ✅ Ready (Inspect) Visit Preview Jan 20, 2025 11:56pm

@huydhn huydhn marked this pull request as ready for review January 14, 2025 02:21
@clee2000
Copy link
Contributor

I'm curious if you tried one of the data skipping indexes for this https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes. iirc these indexes don't work for replacing merge trees, but the benchmarks table is a merge tree I wonder if it will work well

@huydhn huydhn merged commit d83a62f into pytorch:main Jan 21, 2025
6 checks passed
@huydhn
Copy link
Contributor Author

huydhn commented Jan 21, 2025

I'm curious if you tried one of the data skipping indexes for this https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes. iirc these indexes don't work for replacing merge trees, but the benchmarks table is a merge tree I wonder if it will work well

From what I read from the doc, it looks like a set or a bloom filter indices can be used here to speed up the string comparison. Let me try to see if I can use them to improve the dashboard data loading.

Camyll pushed a commit that referenced this pull request Jan 21, 2025
I'm attempting to use a new materialized view called
`oss_ci_benchmark_metadata` to make it faster to query benchmark
metadata. I have already added the new view manually but its definition
is included in this PR for review.

The old query works, but it's slower because the original table doesn't
keep several important columns, i.e. benchmark name, in order
https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/oss_ci_benchmark_v3/query.sql#L98-L107.
According to
https://clickhouse.com/docs/en/sql-reference/statements/alter/order-by,
it seems that we cannot add existing columns into ORDER BY.

### Testing

When loading the benchmark metadata:

* Slooooow
https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch
* Faster
https://torchci-git-fork-huydhn-use-osscibenchmarkm-96b1e1-fbopensource.vercel.app/benchmark/llms?repoName=pytorch%2Fpytorch

Loading the benchmark data is still relatively slow IMO, but I will try
to improve that in a separate PR.
huydhn added a commit that referenced this pull request Jan 21, 2025
Camyll pushed a commit that referenced this pull request Jan 22, 2025
Camyll pushed a commit that referenced this pull request Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants