Use oss_ci_benchmark_metadata materialized view #6167
I'm curious whether you tried one of the data skipping indexes for this: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes. IIRC these indexes don't work for ReplacingMergeTree tables, but the benchmarks table is a plain MergeTree, so I wonder if it would work well.
From what I read in the docs, it looks like a set or a bloom filter index could be used here to speed up the string comparison. Let me see if I can use one of them to improve the dashboard data loading.
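For reference, a minimal sketch of what adding such a data skipping index might look like. The database, table, and column names (`benchmark.oss_ci_benchmark_v3`, `benchmark_name`) and the index parameters are assumptions for illustration, not taken from the actual schema.

```sql
-- Hypothetical table and column names, for illustration only.
-- Add a bloom filter skipping index on the string column used in the filter.
ALTER TABLE benchmark.oss_ci_benchmark_v3
    ADD INDEX idx_benchmark_name benchmark_name TYPE bloom_filter(0.01) GRANULARITY 4;

-- The index only applies to newly written parts by default;
-- this builds it for the parts that already exist.
ALTER TABLE benchmark.oss_ci_benchmark_v3
    MATERIALIZE INDEX idx_benchmark_name;
```

A `set(N)` index would be declared the same way with `TYPE set(N)`. Whether either helps depends on how well the column values cluster within granules, so it would need to be measured against the dashboard query.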
I'm attempting to use a new materialized view called `oss_ci_benchmark_metadata` to make it faster to query benchmark metadata. I have already added the new view manually, but its definition is included in this PR for review.

The old query works, but it's slower because the original table doesn't include several important columns, e.g. the benchmark name, in its ORDER BY key https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/oss_ci_benchmark_v3/query.sql#L98-L107. According to https://clickhouse.com/docs/en/sql-reference/statements/alter/order-by, it seems that we cannot add existing columns into ORDER BY.

### Testing

When loading the benchmark metadata:

* Slooooow https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch
* Faster https://torchci-git-fork-huydhn-use-osscibenchmarkm-96b1e1-fbopensource.vercel.app/benchmark/llms?repoName=pytorch%2Fpytorch

Loading the benchmark data is still relatively slow IMO, but I will try to improve that in a separate PR.
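To illustrate the idea, here is a rough sketch of a materialized view whose sorting key includes the metadata columns. This is not the definition from the PR: the database name, the column names, and the choice of a plain `MergeTree` engine are assumptions made for the example.

```sql
-- Hypothetical schema: the database, column names, and engine are assumptions.
-- The view keeps only the metadata columns and sorts by them, so that
-- filters on these columns can use the primary key instead of a full scan.
CREATE MATERIALIZED VIEW benchmark.oss_ci_benchmark_metadata
ENGINE = MergeTree
ORDER BY (repo, benchmark_name, model_name, timestamp)
AS
SELECT
    repo,
    benchmark_name,
    model_name,
    timestamp
FROM benchmark.oss_ci_benchmark_v3;
```

A view created this way only captures rows inserted into the source table after it exists, so existing rows would have to be backfilled separately (for example with `POPULATE` at creation time or an explicit `INSERT ... SELECT`).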