@shay23bra
Contributor

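All 4 slow queries from the Jira ticket are optimized by replacing the IN clause with a JOIN. Each line shows the time reported in the ticket and the speedup measured in the test environment below:
cluster_networks IN clause (479.516ms) → JOIN query (18% faster)
service_networks IN clause (275.129ms) → JOIN query (7% faster)
machine_networks IN clause (167.984ms) → JOIN query (48% faster)
hosts IN clause (1046.118ms) → JOIN query (11% faster)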

Test Environment:

  • Clusters: 1000
  • Hosts: 31000
  • Infra Envs: 1050
  • Cluster Networks: 1496
  • Service Networks: 970
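
For readers comparing the two query shapes, here is a minimal sketch of the difference. It only builds SQL text and is not code from this PR: `buildINQuery` and `buildJoinQuery` are hypothetical helpers, and the `table.*` select list and the `clusters.deleted_at IS NULL` filter are assumptions about what the final statement looks like.

```go
package main

import (
	"fmt"
	"strings"
)

// buildINQuery mimics the shape of a preload query that lists every parent
// cluster ID explicitly. Both the statement text and the argument list grow
// with the number of clusters. (Hypothetical helper, not code from this PR.)
// Note: an empty clusterIDs slice would yield "IN ()", which is not valid SQL.
func buildINQuery(table string, clusterIDs []string) (string, []interface{}) {
	placeholders := make([]string, len(clusterIDs))
	args := make([]interface{}, len(clusterIDs))
	for i, id := range clusterIDs {
		placeholders[i] = "?"
		args[i] = id
	}
	query := fmt.Sprintf("SELECT * FROM %s WHERE cluster_id IN (%s)",
		table, strings.Join(placeholders, ","))
	return query, args
}

// buildJoinQuery produces the JOIN form. Its text stays the same size no
// matter how many clusters match. The select list and the soft-delete filter
// are assumptions made for this illustration.
func buildJoinQuery(table string) string {
	return fmt.Sprintf(
		"SELECT %[1]s.* FROM %[1]s INNER JOIN clusters ON %[1]s.cluster_id = clusters.id WHERE clusters.deleted_at IS NULL",
		table)
}

func main() {
	ids := []string{"c1", "c2", "c3"}
	q, args := buildINQuery("cluster_networks", ids)
	fmt.Println(q, len(args))
	fmt.Println(buildJoinQuery("cluster_networks"))
}
```

With 1000 clusters the IN form carries 1000 placeholders per preloaded table, while the JOIN form stays constant in size.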

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 10, 2025
@openshift-ci openshift-ci bot requested review from avishayt and eranco74 September 10, 2025 12:51
@openshift-ci

openshift-ci bot commented Sep 10, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shay23bra
Once this PR has been reviewed and has the lgtm label, please assign eliorerz for approval. For more information see the Code Review Process.


@codecov

codecov bot commented Sep 10, 2025

Codecov Report

❌ Patch coverage is 0% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.44%. Comparing base (a1330d9) to head (92fab42).
⚠️ Report is 11 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/common/db.go | 0.00% | 37 Missing ⚠️ |
@@             Coverage Diff             @@
##           master    #7994       +/-   ##
===========================================
- Coverage   73.76%   55.44%   -18.32%     
===========================================
  Files         402      303       -99     
  Lines       69128    55198    -13930     
===========================================
- Hits        50989    30603    -20386     
- Misses      15404    23505     +8101     
+ Partials     2735     1090     -1645     
| Files with missing lines | Coverage Δ |
| --- | --- |
| internal/common/db.go | 9.49% <0.00%> (-1.04%) ⬇️ |

... and 150 files with indirect coverage changes

db = prepareClusterDB(db, eagerLoading, includeDeleted)
if eagerLoading {
    // Use optimized loading with direct JOIN queries for network tables
    db = prepareClusterDBWithJoins(db, includeDeleted)
Contributor

@adriengentil adriengentil Sep 10, 2025

Add eagerLoading to the prepareClusterDBWithJoins parameters. Also, can we remove prepareClusterDB?

Contributor Author

Other functions use prepareClusterDB, so I don't want to cause problems there. In this case, though, I added eagerLoading to prepareClusterDBWithJoins.

Contributor

> other functions use prepareClusterDB so I don't want to cause problems there

This new function has the same behavior as prepareClusterDB and is more efficient, so I think all calls to prepareClusterDB should be replaced.

@adriengentil
Contributor

/retest

@adriengentil
Contributor

Can you dump the resulting query from gorm debug log (or other)?

    HostsTable: true,
}

for _, tableName := range ClusterSubTables {
Contributor

Can't we join all the sub-tables?

Contributor Author

I don't think this is necessary

Contributor

why?

Contributor Author

Adding more joins can increase complexity, and we can't know whether it will help or make other queries worse, since we didn't test it on other tables.
It can also lead to higher memory usage and more lock usage, and in some cases complex joins can be slower.

Contributor

@adriengentil adriengentil Sep 15, 2025

Finding that out is the scope of this issue. Can we run some performance tests to check these assumptions?

Contributor Author

No problem
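
One rough way to check the assumptions discussed in this thread is to time each query variant repeatedly and compare medians, which are less sensitive to outliers than a single run. The sketch below is only an illustration: `medianDuration` is a hypothetical helper, and the two closures are stand-ins where the real IN-clause and JOIN queries would run against a populated database.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// medianDuration runs fn the given number of times and returns the median
// wall-clock duration (the upper median when runs is even).
// Hypothetical helper for illustration; not part of this PR.
func medianDuration(runs int, fn func()) time.Duration {
	if runs <= 0 {
		return 0
	}
	samples := make([]time.Duration, runs)
	for i := range samples {
		start := time.Now()
		fn()
		samples[i] = time.Since(start)
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })
	return samples[runs/2]
}

func main() {
	// Stand-ins for the two query variants; a real test would execute the
	// IN-clause and JOIN queries against a populated database here.
	inVariant := func() { time.Sleep(2 * time.Millisecond) }
	joinVariant := func() { time.Sleep(1 * time.Millisecond) }

	fmt.Println("IN median:  ", medianDuration(11, inVariant))
	fmt.Println("JOIN median:", medianDuration(11, joinVariant))
}
```

Running the same harness with joins added for the remaining sub-tables would show whether the extra joins help or hurt on this dataset.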

@shay23bra shay23bra changed the title [master] MGMT-21708 add join to queries MGMT-21708: add join to queries Sep 11, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 11, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 11, 2025

@shay23bra: This pull request references MGMT-21708 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set.


@shay23bra
Contributor Author

/retest


db = db.Preload(ClusterNetworksTable, func(db *gorm.DB) *gorm.DB {
    baseDB := db
    return baseDB.Joins("INNER JOIN clusters ON cluster_networks.cluster_id = clusters.id").
Contributor

Doesn't clause.JoinTarget{Association: "Clusters"} work in this context? See https://gorm.io/docs/preload.html#Joins-Preloading

@adriengentil
Contributor

> Can you dump the resulting query from gorm debug log (or other)?

Since the query is built across multiple functions, I would really like to get this info and check whether the queries we expect from gorm are the ones actually run against the database.

…lauses

- Modified GetClustersFromDBWhere to use JOIN-based preloading for network tables
- Created prepareClusterDBWithJoins function to avoid large IN clauses
- Network tables (cluster_networks, service_networks, machine_networks, hosts) now use INNER JOINs with clusters table
- Used map-based lookup for O(1) table type checking instead of O(n²) iteration
- Maintains existing functionality while improving performance for large datasets
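
The map-based lookup mentioned in the list above can be sketched as follows. This is an illustration under assumptions, not the PR's code: the constant values, the `ServiceNetworksTable` and `MachineNetworksTable` names, the `"OtherSubTable"` entry, and the `partitionTables` helper are all hypothetical.

```go
package main

import "fmt"

// Placeholder table names. ClusterNetworksTable and HostsTable appear in the
// PR; the other names and all string values here are assumptions.
const (
	ClusterNetworksTable = "ClusterNetworks"
	ServiceNetworksTable = "ServiceNetworks"
	MachineNetworksTable = "MachineNetworks"
	HostsTable           = "Hosts"
)

// joinTables acts as a set: membership is a single map lookup instead of a
// scan over a list of names for every sub-table.
var joinTables = map[string]bool{
	ClusterNetworksTable: true,
	ServiceNetworksTable: true,
	MachineNetworksTable: true,
	HostsTable:           true,
}

// partitionTables splits sub-tables into those preloaded through a JOIN and
// those left on the default preload path. Hypothetical helper.
func partitionTables(subTables []string) (joined, plain []string) {
	for _, tableName := range subTables {
		if joinTables[tableName] {
			joined = append(joined, tableName)
		} else {
			plain = append(plain, tableName)
		}
	}
	return joined, plain
}

func main() {
	joined, plain := partitionTables([]string{HostsTable, "OtherSubTable", ClusterNetworksTable})
	fmt.Println("JOIN preload:   ", joined)
	fmt.Println("default preload:", plain)
}
```

A missing key in a Go map returns the zero value `false`, so tables outside the set need no explicit entry.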

Performance improvements based on testing:
- cluster_networks: 18% faster (2.947ms → 2.416ms)
- service_networks: 7% faster (2.326ms → 2.159ms)
- machine_networks: 48% faster (1.563ms → 0.813ms)
- hosts batch: 11% faster (18.006ms → 16.077ms)
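
The percentages above follow from the before and after timings as a relative latency reduction, (before - after) / before. The short sketch below recomputes them from the figures in this list; `speedupPercent` is a hypothetical helper. Note that these timings come from the test environment, not from the Jira ticket figures listed further down.

```go
package main

import (
	"fmt"
	"math"
)

// speedupPercent returns the relative reduction in latency, in percent.
// Hypothetical helper for illustration.
func speedupPercent(beforeMs, afterMs float64) float64 {
	return (beforeMs - afterMs) / beforeMs * 100
}

func main() {
	// Timings copied from the commit message (milliseconds).
	measurements := []struct {
		name          string
		before, after float64
	}{
		{"cluster_networks", 2.947, 2.416},
		{"service_networks", 2.326, 2.159},
		{"machine_networks", 1.563, 0.813},
		{"hosts batch", 18.006, 16.077},
	}
	for _, m := range measurements {
		fmt.Printf("%s: %.0f%% faster\n", m.name, math.Round(speedupPercent(m.before, m.after)))
	}
}
```

Rounded to whole percentages, the results match the 18%, 7%, 48%, and 11% reported above.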

Addresses slow queries from Jira ticket:
- SELECT * FROM cluster_networks WHERE cluster_id IN (...) - 479.516ms
- SELECT * FROM service_networks WHERE cluster_id IN (...) - 275.129ms
- SELECT * FROM machine_networks WHERE cluster_id IN (...) - 167.984ms
- SELECT * FROM hosts WHERE cluster_id IN (...) AND deleted_at IS NULL - 1046.118ms
@shay23bra shay23bra force-pushed the MGMT-21708-add-join-to-queries-clean branch from bcfba54 to 8d26a3c Compare September 17, 2025 08:21
@shay23bra
Contributor Author

/retest

}

func prepareClusterDB(db *gorm.DB, eagerLoading EagerLoadingState, includeDeleted DeleteRecordsState, conditions ...interface{}) *gorm.DB {
func prepareClusterDBWithJoins(db *gorm.DB, eagerLoading EagerLoadingState, includeDeleted DeleteRecordsState, conditions ...interface{}) *gorm.DB {
Contributor

@adriengentil adriengentil Sep 26, 2025

I would keep the former name, since we keep the exact same functionality.

if !includeDeleted {
    db = db.Preload(HostsTable, func(db *gorm.DB) *gorm.DB {
        preloadDB := db.Joins("INNER JOIN clusters ON hosts.cluster_id = clusters.id").
            Where("hosts.deleted_at IS NULL AND clusters.deleted_at IS NULL")
Contributor

In the end it's a single query, right? Then I don't think we need to repeat that for each of the tables.

            Where("hosts.deleted_at IS NULL AND clusters.deleted_at IS NULL")

        if len(conditions) > 0 {
            preloadDB = preloadDB.Where(conditions[0], conditions[1:]...)
Contributor

same as the previous one

- Renamed function to match existing naming conventions
- Updated all function calls to use new name
- Maintains same functionality with cleaner naming
@openshift-ci

openshift-ci bot commented Sep 28, 2025

@shay23bra: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 92fab42 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/edge-unit-test | 92fab42 | link | true | /test edge-unit-test |
| ci/prow/edge-e2e-ai-operator-ztp | 92fab42 | link | true | /test edge-e2e-ai-operator-ztp |
| ci/prow/e2e-agent-compact-ipv4 | 92fab42 | link | true | /test e2e-agent-compact-ipv4 |
| ci/prow/edge-subsystem-kubeapi-aws | 92fab42 | link | true | /test edge-subsystem-kubeapi-aws |
| ci/prow/edge-subsystem-aws | 92fab42 | link | true | /test edge-subsystem-aws |
| ci/prow/edge-e2e-metal-assisted-4-20 | 92fab42 | link | true | /test edge-e2e-metal-assisted-4-20 |


3 participants