MGMT-21708: add join to queries #7994

shay23bra · 2025-09-10T12:49:45Z

All 4 slow queries from the Jira ticket are optimized:
cluster_networks IN clause (479.516ms) → JOIN query (18% faster)
service_networks IN clause (275.129ms) → JOIN query (7% faster)
machine_networks IN clause (167.984ms) → JOIN query (48% faster)
hosts IN clause (1046.118ms) → JOIN query (11% faster)

Test Environment:

Clusters: 1000
Hosts: 31000
Infra Envs: 1050
Cluster Networks: 1496
Service Networks: 970
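
For readers unfamiliar with the two query shapes named above, here is a minimal sketch in plain Go that only builds SQL strings. It is not the code from this PR: `inClauseQuery` and `joinQuery` are hypothetical helpers, and the exact SQL that GORM emits for the preloads is not shown anywhere in this thread. The point is that the IN form grows by one placeholder per cluster, while the JOIN form has a fixed size.

```go
package main

import (
	"fmt"
	"strings"
)

// inClauseQuery (hypothetical) builds the ID-list form of a sub-table lookup:
// one placeholder per parent cluster, so the statement grows with the cluster count.
func inClauseQuery(table string, clusterIDs []string) (string, []interface{}) {
	placeholders := make([]string, len(clusterIDs))
	args := make([]interface{}, len(clusterIDs))
	for i, id := range clusterIDs {
		placeholders[i] = "?"
		args[i] = id
	}
	query := fmt.Sprintf("SELECT * FROM %s WHERE cluster_id IN (%s)",
		table, strings.Join(placeholders, ","))
	return query, args
}

// joinQuery (hypothetical) builds the JOIN form: its size does not depend on
// how many clusters are selected.
func joinQuery(table string) string {
	return fmt.Sprintf(
		"SELECT %[1]s.* FROM %[1]s INNER JOIN clusters ON %[1]s.cluster_id = clusters.id",
		table)
}

func main() {
	q, args := inClauseQuery("cluster_networks", []string{"a", "b", "c"})
	fmt.Println(q, "with", len(args), "args")
	fmt.Println(joinQuery("cluster_networks"))
}
```

With the 1000 clusters of the test environment, the IN form would carry 1000 bound parameters per sub-table query.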

openshift-ci · 2025-09-10T12:52:06Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shay23bra
Once this PR has been reviewed and has the lgtm label, please assign eliorerz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov · 2025-09-10T14:20:04Z

Codecov Report

❌ Patch coverage is 0% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.44%. Comparing base (a1330d9) to head (92fab42).
⚠️ Report is 11 commits behind head on master.

Files with missing lines	Patch %	Lines
internal/common/db.go	0.00%	37 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master    #7994       +/-   ##
===========================================
- Coverage   73.76%   55.44%   -18.32%     
===========================================
  Files         402      303       -99     
  Lines       69128    55198    -13930     
===========================================
- Hits        50989    30603    -20386     
- Misses      15404    23505     +8101     
+ Partials     2735     1090     -1645

Files with missing lines	Coverage Δ
internal/common/db.go	`9.49% <0.00%> (-1.04%)`	⬇️

... and 150 files with indirect coverage changes

adriengentil · 2025-09-10T15:20:03Z

internal/common/db.go

-	db = prepareClusterDB(db, eagerLoading, includeDeleted)
+	if eagerLoading {
+		// Use optimized loading with direct JOIN queries for network tables
+		db = prepareClusterDBWithJoins(db, includeDeleted)


Please add eagerLoading to the prepareClusterDBWithJoins parameters. Also, can we remove prepareClusterDB?

Other functions use prepareClusterDB, so I don't want to cause problems there. In this case, though, I added eagerLoading to prepareClusterDBWithJoins.

> other functions use prepareClusterDB so I don't want to cause problems there

This new function has the same behavior as prepareClusterDB and is more efficient, so I think all calls to prepareClusterDB should be replaced.

adriengentil · 2025-09-10T15:21:03Z

/retest

adriengentil · 2025-09-10T15:36:51Z

Can you dump the resulting query from gorm debug log (or other)?

adriengentil · 2025-09-10T15:40:41Z

internal/common/db.go

+		HostsTable:           true,
+	}
+
+	for _, tableName := range ClusterSubTables {


Can't we join all sub-tables?

I don't think this is necessary.

Adding more joins increases complexity, and we can't know whether it will benefit other queries or make them worse, since we didn't test it on other tables.
It can also lead to higher memory usage and more lock usage, and in some cases complex joins can be slower.

Finding that out is within the scope of this issue. Can we run some performance tests to check these assumptions?
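
To make the trade-off in this thread concrete, here is a minimal sketch of the map-based selection visible in the diff above: a set of table names decides which sub-tables get JOIN-based preloading, and everything else keeps the default preload. The table name values and the `splitSubTables` helper are assumptions for illustration; the real constants and loop live in internal/common/db.go and may differ.

```go
package main

import "fmt"

// splitSubTables (hypothetical) partitions sub-tables into those preloaded
// through a JOIN and those preloaded the default way. The map acts as a set,
// so each membership check is a single lookup.
func splitSubTables(subTables []string, joined map[string]bool) (withJoin, plain []string) {
	for _, name := range subTables {
		if joined[name] {
			withJoin = append(withJoin, name)
		} else {
			plain = append(plain, name)
		}
	}
	return withJoin, plain
}

func main() {
	// Assumed names; "MonitoredOperators" stands in for any sub-table
	// that was not part of the benchmark.
	subTables := []string{"Hosts", "ClusterNetworks", "ServiceNetworks", "MachineNetworks", "MonitoredOperators"}
	joined := map[string]bool{
		"Hosts":           true,
		"ClusterNetworks": true,
		"ServiceNetworks": true,
		"MachineNetworks": true,
	}
	withJoin, plain := splitSubTables(subTables, joined)
	fmt.Println("JOIN preload:", withJoin)
	fmt.Println("default preload:", plain)
}
```

Joining every sub-table, as the reviewer suggests, would amount to dropping the set and treating all names as joined, which is the case the requested performance tests would need to cover.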

openshift-ci-robot · 2025-09-11T17:28:09Z

@shay23bra: This pull request references MGMT-21708 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set.

shay23bra · 2025-09-11T18:27:52Z

/retest

adriengentil · 2025-09-15T12:58:03Z

internal/common/db.go

+
+	db = db.Preload(ClusterNetworksTable, func(db *gorm.DB) *gorm.DB {
+		baseDB := db
+		return baseDB.Joins("INNER JOIN clusters ON cluster_networks.cluster_id = clusters.id").


Doesn't clause.JoinTarget{Association: "Clusters"} work in this context? See https://gorm.io/docs/preload.html#Joins-Preloading

adriengentil · 2025-09-15T14:50:03Z

> Can you dump the resulting query from gorm debug log (or other)?

Since the query is built across multiple functions, I would really like to get this info and check whether the queries we expect from gorm are the ones actually run against the database.

…lauses

- Modified GetClustersFromDBWhere to use JOIN-based preloading for network tables
- Created prepareClusterDBWithJoins function to avoid large IN clauses
- Network tables (cluster_networks, service_networks, machine_networks, hosts) now use INNER JOINs with clusters table
- Used map-based lookup for O(1) table type checking instead of O(n²) iteration
- Maintains existing functionality while improving performance for large datasets

Performance improvements based on testing:
- cluster_networks: 18% faster (2.947ms → 2.416ms)
- service_networks: 7% faster (2.326ms → 2.159ms)
- machine_networks: 48% faster (1.563ms → 0.813ms)
- hosts batch: 11% faster (18.006ms → 16.077ms)

Addresses slow queries from Jira ticket:
- SELECT * FROM cluster_networks WHERE cluster_id IN (...) - 479.516ms
- SELECT * FROM service_networks WHERE cluster_id IN (...) - 275.129ms
- SELECT * FROM machine_networks WHERE cluster_id IN (...) - 167.984ms
- SELECT * FROM hosts WHERE cluster_id IN (...) AND deleted_at IS NULL - 1046.118ms
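
The percentages in this commit message can be reproduced from the before/after timings it quotes. The sketch below does that arithmetic; `percentFaster` is a hypothetical helper, and the formula (relative reduction in latency, rounded to a whole percent) is an assumption that happens to match all four reported figures. Note that these are the local benchmark timings, not the much larger durations listed from the Jira ticket.

```go
package main

import (
	"fmt"
	"math"
)

// percentFaster (hypothetical) returns the relative latency reduction
// from beforeMs to afterMs, rounded to the nearest whole percent.
func percentFaster(beforeMs, afterMs float64) int {
	return int(math.Round((beforeMs - afterMs) / beforeMs * 100))
}

func main() {
	// Timings copied from the commit message above.
	timings := []struct {
		name          string
		before, after float64
	}{
		{"cluster_networks", 2.947, 2.416},
		{"service_networks", 2.326, 2.159},
		{"machine_networks", 1.563, 0.813},
		{"hosts batch", 18.006, 16.077},
	}
	for _, t := range timings {
		fmt.Printf("%s: %d%% faster\n", t.name, percentFaster(t.before, t.after))
	}
}
```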

shay23bra · 2025-09-25T12:33:03Z

/retest

adriengentil · 2025-09-26T08:52:30Z

internal/common/db.go

 }

-func prepareClusterDB(db *gorm.DB, eagerLoading EagerLoadingState, includeDeleted DeleteRecordsState, conditions ...interface{}) *gorm.DB {
+func prepareClusterDBWithJoins(db *gorm.DB, eagerLoading EagerLoadingState, includeDeleted DeleteRecordsState, conditions ...interface{}) *gorm.DB {


I would keep the former name, since we keep the exact same functionality.

adriengentil · 2025-09-26T09:00:57Z

internal/common/db.go

+	if !includeDeleted {
+		db = db.Preload(HostsTable, func(db *gorm.DB) *gorm.DB {
+			preloadDB := db.Joins("INNER JOIN clusters ON hosts.cluster_id = clusters.id").
+				Where("hosts.deleted_at IS NULL AND clusters.deleted_at IS NULL")


In the end it's a single query, right? Then I don't think we need to repeat that for each of the tables.

adriengentil · 2025-09-26T09:01:21Z

internal/common/db.go

+				Where("hosts.deleted_at IS NULL AND clusters.deleted_at IS NULL")
+
+			if len(conditions) > 0 {
+				preloadDB = preloadDB.Where(conditions[0], conditions[1:]...)


Same as the previous one.
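
One possible reading of the two comments above is that the join and the deleted-row filter do not have to be written out separately for each table. The sketch below shows a shared helper under that reading. `preloadJoin` is hypothetical and not part of the PR, and the `softDeletes` flag is an assumption: only hosts is shown with a deleted_at column in the diff.

```go
package main

import "fmt"

// preloadJoin (hypothetical) returns the JOIN clause and the WHERE filter for
// preloading one cluster sub-table. softDeletes says whether the sub-table has
// its own deleted_at column (an assumption; the diff shows this only for hosts).
func preloadJoin(table string, includeDeleted, softDeletes bool) (join, where string) {
	join = fmt.Sprintf("INNER JOIN clusters ON %s.cluster_id = clusters.id", table)
	if includeDeleted {
		// No soft-delete filtering requested.
		return join, ""
	}
	where = "clusters.deleted_at IS NULL"
	if softDeletes {
		where = fmt.Sprintf("%s.deleted_at IS NULL AND %s", table, where)
	}
	return join, where
}

func main() {
	for _, table := range []string{"hosts", "cluster_networks"} {
		join, where := preloadJoin(table, false, table == "hosts")
		fmt.Printf("%s\n  %s\n  WHERE %s\n", table, join, where)
	}
}
```

For hosts with includeDeleted set to false, the helper yields the same JOIN and WHERE strings that appear in the diff.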

openshift-ci · 2025-09-28T12:51:53Z

@shay23bra: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/okd-scos-e2e-aws-ovn	`92fab42`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/edge-unit-test	`92fab42`	link	true	`/test edge-unit-test`
ci/prow/edge-e2e-ai-operator-ztp	`92fab42`	link	true	`/test edge-e2e-ai-operator-ztp`
ci/prow/e2e-agent-compact-ipv4	`92fab42`	link	true	`/test e2e-agent-compact-ipv4`
ci/prow/edge-subsystem-kubeapi-aws	`92fab42`	link	true	`/test edge-subsystem-kubeapi-aws`
ci/prow/edge-subsystem-aws	`92fab42`	link	true	`/test edge-subsystem-aws`
ci/prow/edge-e2e-metal-assisted-4-20	`92fab42`	link	true	`/test edge-e2e-metal-assisted-4-20`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 10, 2025

openshift-ci bot requested review from avishayt and eranco74 September 10, 2025 12:51

adriengentil reviewed Sep 10, 2025

adriengentil reviewed Sep 10, 2025

shay23bra changed the title ~~[master] MGMT-21708 add join to queries~~ MGMT-21708: add join to queries Sep 11, 2025

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 11, 2025

adriengentil reviewed Sep 15, 2025

adriengentil reviewed Sep 15, 2025

adriengentil reviewed Sep 15, 2025

shay23bra added 2 commits September 17, 2025 11:21

add eagerLoading inside prepareClusterDBWithJoins and remove redundant code

8d26a3c

shay23bra force-pushed the MGMT-21708-add-join-to-queries-clean branch from bcfba54 to 8d26a3c Compare September 17, 2025 08:21

update prepareClusterDBWithJoins + remove prepareClusterDB

caf3634

adriengentil reviewed Sep 26, 2025

Rename prepareClusterDBWithJoins to prepareClusterDB

92fab42

- Renamed function to match existing naming conventions
- Updated all function calls to use new name
- Maintains same functionality with cleaner naming
