Use oss_ci_benchmark_metadata materialized view #6167

huydhn · 2025-01-14T02:15:22Z

I'm attempting to use a new materialized view called oss_ci_benchmark_metadata to make it faster to query benchmark metadata. I have already added the new view manually but its definition is included in this PR for review.

The old query works, but it's slow because the original table doesn't include several important columns, e.g. benchmark name, in its ORDER BY key (see https://github.com/pytorch/test-infra/blob/main/torchci/clickhouse_queries/oss_ci_benchmark_v3/query.sql#L98-L107). According to https://clickhouse.com/docs/en/sql-reference/statements/alter/order-by, it seems that we cannot add existing columns to ORDER BY.
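For readers unfamiliar with the pattern, here is a minimal sketch of the general idea: a target table sorted by the metadata columns, plus a materialized view that feeds it. Every table and column name below is a placeholder chosen for illustration and is not the definition included in this PR.

```sql
-- Hypothetical sketch only: table and column names are placeholders,
-- not the real oss_ci_benchmark_metadata definition from this PR.

-- Target table whose sorting key starts with the columns the
-- metadata query looks up, so ClickHouse can skip irrelevant granules.
CREATE TABLE benchmark_metadata_sketch
(
    repo           String,
    benchmark_name String,
    model_name     String,
    workflow_id    UInt64,
    timestamp      UInt64
)
ENGINE = MergeTree
ORDER BY (repo, benchmark_name, model_name, workflow_id);

-- Materialized view that copies the metadata columns of every newly
-- inserted source row into the target table above.
CREATE MATERIALIZED VIEW benchmark_metadata_sketch_mv
TO benchmark_metadata_sketch
AS SELECT
    repo,
    benchmark_name,
    model_name,
    workflow_id,
    timestamp
FROM benchmark_source_sketch;
```

A view created with a TO clause only processes rows inserted after it exists, so rows already in the source table would need a separate backfill, for example with an `INSERT ... SELECT` into the target table.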

Testing

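When loading the benchmark metadata:

* Slooooow https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch
* Faster https://torchci-git-fork-huydhn-use-osscibenchmarkm-96b1e1-fbopensource.vercel.app/benchmark/llms?repoName=pytorch%2Fpytorch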

Loading the benchmark data is still relatively slow IMO, but I will try to improve that in a separate PR.

vercel · 2025-01-14T02:15:27Z

@huydhn is attempting to deploy a commit to the Meta Open Source Team on Vercel.

A member of the Team first needs to authorize it.

vercel · 2025-01-14T02:15:43Z

The latest updates on your projects.

Name	Status	Preview	Updated (UTC)
torchci	✅ Ready (Inspect)	Visit Preview	Jan 20, 2025 11:56pm

clee2000 · 2025-01-14T17:52:52Z

I'm curious if you tried one of the data skipping indexes for this: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes. IIRC these indexes don't work for ReplacingMergeTree tables, but the benchmarks table is a MergeTree, so I wonder if it would work well.

huydhn · 2025-01-21T01:47:31Z

> I'm curious if you tried one of the data skipping indexes for this: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes. IIRC these indexes don't work for ReplacingMergeTree tables, but the benchmarks table is a MergeTree, so I wonder if it would work well.

From what I read in the doc, it looks like a set or bloom filter index could be used here to speed up the string comparison. Let me see if I can use one to improve the dashboard data loading.
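As a rough sketch of what that experiment could look like, the statements below add a bloom filter index and a set index to a MergeTree table. The table name, column names, index names, and parameter values are all assumptions made for illustration; none of them come from this PR.

```sql
-- Hypothetical sketch: table, column, and index names are placeholders,
-- and the parameter values are untuned guesses.

-- Bloom filter index for equality checks on a string column.
-- The argument is the target false-positive rate.
ALTER TABLE benchmark_table_sketch
    ADD INDEX benchmark_name_bf benchmark_name
    TYPE bloom_filter(0.01) GRANULARITY 4;

-- Set index for a low-cardinality column. The argument caps how many
-- distinct values are stored per indexed block.
ALTER TABLE benchmark_table_sketch
    ADD INDEX repo_set repo
    TYPE set(100) GRANULARITY 4;

-- ADD INDEX only applies to newly written parts, so build the
-- indexes for the data that already exists.
ALTER TABLE benchmark_table_sketch MATERIALIZE INDEX benchmark_name_bf;
ALTER TABLE benchmark_table_sketch MATERIALIZE INDEX repo_set;
```

Whether either index helps depends on how the values are distributed across granules, so it would be worth comparing query times with and without them before relying on this.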


huydhn added 2 commits January 13, 2025 18:06

Use oss_ci_benchmark_metadata materialized view

8ac8df9

Add the view definition

b001ee7

huydhn requested review from clee2000 and a team January 14, 2025 02:15

facebook-github-bot added the CLA Signed label Jan 14, 2025

huydhn changed the title from "Use oss ci benchmark metadata mv" to "Use oss_ci_benchmark_metadata materialized view" Jan 14, 2025

vercel bot deployed to Preview January 14, 2025 02:18

huydhn marked this pull request as ready for review January 14, 2025 02:21

Also sort by workflow id

cdb7e27

vercel bot deployed to Preview January 14, 2025 02:26

clee2000 approved these changes Jan 14, 2025

huydhn mentioned this pull request Jan 17, 2025

[Schema][Utilization] Add schema tables for job utilization #6183

Merged

huydhn added 2 commits January 20, 2025 13:49

Merge branch 'main' into use-oss_ci_benchmark_metadata-mv

13fb5f0

Fix materialized view query

7017a1d

vercel bot deployed to Preview January 20, 2025 23:56

huydhn merged commit d83a62f into pytorch:main Jan 21, 2025
6 checks passed

huydhn added a commit that referenced this pull request Jan 21, 2025

Revert "Use oss_ci_benchmark_metadata materialized view (#6167)"

3e02f2f

This reverts commit 929e0fe.

Camyll pushed a commit that referenced this pull request Jan 22, 2025

Revert "Use oss_ci_benchmark_metadata materialized view (#6167)"

c788173

This reverts commit 929e0fe.

Camyll pushed a commit that referenced this pull request Jan 22, 2025

Revert "Use oss_ci_benchmark_metadata materialized view (#6167)"

4cab69a

This reverts commit 929e0fe.
