-
Notifications
You must be signed in to change notification settings - Fork 244
MGMT-21708: add join to queries #7994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
MGMT-21708: add join to queries #7994
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: shay23bra The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7994 +/- ##
===========================================
- Coverage 73.76% 55.44% -18.32%
===========================================
Files 402 303 -99
Lines 69128 55198 -13930
===========================================
- Hits 50989 30603 -20386
- Misses 15404 23505 +8101
+ Partials 2735 1090 -1645
🚀 New features to boost your workflow:
|
internal/common/db.go
Outdated
| db = prepareClusterDB(db, eagerLoading, includeDeleted) | ||
| if eagerLoading { | ||
| // Use optimized loading with direct JOIN queries for network tables | ||
| db = prepareClusterDBWithJoins(db, includeDeleted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add eagerLoading in prepareClusterDBWithJoins parameters, and can we remove prepareClusterDB ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other functions use prepareClusterDB so I don't want to cause problems there. but in this case I added the eagerLoading in prepareClusterDBWithJoins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other functions use prepareClusterDB so I don't want to cause problems there
this new function has the same behavior as prepareClusterDB, and is more efficient. So, I think all calls to prepareClusterDB should be replaced.
|
/retest |
|
Can you dump the resulting query from gorm debug log (or other)? |
internal/common/db.go
Outdated
| HostsTable: true, | ||
| } | ||
|
|
||
| for _, tableName := range ClusterSubTables { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we join all sub tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding more joins can increase complexity and we can't know if it will benefit or make other queries worse since we didn't test it on other tables.
it can make higher memory usage, more locks usage and in some cases complex joins can be slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope of this issue is to find out, can we run some performance tests to check these assumptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem
|
@shay23bra: This pull request references MGMT-21708 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
internal/common/db.go
Outdated
|
|
||
| db = db.Preload(ClusterNetworksTable, func(db *gorm.DB) *gorm.DB { | ||
| baseDB := db | ||
| return baseDB.Joins("INNER JOIN clusters ON cluster_networks.cluster_id = clusters.id"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clause.JoinTarget{Association: "Clusters"} doesn't work in this context? https://gorm.io/docs/preload.html#Joins-Preloading
Since the query is built in multiple functions, I would really like to get this info, and check if the queries we expect from gorm are the ones that are actually run against the database |
…lauses - Modified GetClustersFromDBWhere to use JOIN-based preloading for network tables - Created prepareClusterDBWithJoins function to avoid large IN clauses - Network tables (cluster_networks, service_networks, machine_networks, hosts) now use INNER JOINs with clusters table - Used map-based lookup for O(1) table type checking instead of O(n²) iteration - Maintains existing functionality while improving performance for large datasets Performance improvements based on testing: - cluster_networks: 18% faster (2.947ms → 2.416ms) - service_networks: 7% faster (2.326ms → 2.159ms) - machine_networks: 48% faster (1.563ms → 0.813ms) - hosts batch: 11% faster (18.006ms → 16.077ms) Addresses slow queries from Jira ticket: - SELECT * FROM cluster_networks WHERE cluster_id IN (...) - 479.516ms - SELECT * FROM service_networks WHERE cluster_id IN (...) - 275.129ms - SELECT * FROM machine_networks WHERE cluster_id IN (...) - 167.984ms - SELECT * FROM hosts WHERE cluster_id IN (...) AND deleted_at IS NULL - 1046.118ms
bcfba54 to
8d26a3c
Compare
|
/retest |
internal/common/db.go
Outdated
| } | ||
|
|
||
| func prepareClusterDB(db *gorm.DB, eagerLoading EagerLoadingState, includeDeleted DeleteRecordsState, conditions ...interface{}) *gorm.DB { | ||
| func prepareClusterDBWithJoins(db *gorm.DB, eagerLoading EagerLoadingState, includeDeleted DeleteRecordsState, conditions ...interface{}) *gorm.DB { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep the former name, since we keep the exact same functionnality
| if !includeDeleted { | ||
| db = db.Preload(HostsTable, func(db *gorm.DB) *gorm.DB { | ||
| preloadDB := db.Joins("INNER JOIN clusters ON hosts.cluster_id = clusters.id"). | ||
| Where("hosts.deleted_at IS NULL AND clusters.deleted_at IS NULL") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the end it's a single query, right? Then I don't think the need to repeat that for each of the tables
| Where("hosts.deleted_at IS NULL AND clusters.deleted_at IS NULL") | ||
|
|
||
| if len(conditions) > 0 { | ||
| preloadDB = preloadDB.Where(conditions[0], conditions[1:]...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as the previous one
- Renamed function to match existing naming conventions - Updated all function calls to use new name - Maintains same functionality with cleaner naming
|
@shay23bra: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
All 4 slow queries from Jira ticket optimized:
cluster_networks IN clause (479.516ms) → JOIN query (18% faster)
service_networks IN clause (275.129ms) → JOIN query (7% faster)
machine_networks IN clause (167.984ms) → JOIN query (48% faster)
hosts IN clause (1046.118ms) → JOIN query (11% faster)
Test Environment: