-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat performance monitoring #189
base: epic_db_query_performance_monitoring
Are you sure you want to change the base?
Feat performance monitoring #189
Conversation
|
a3610b4
to
a920d40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I only reviewed the new code, and would not have 100% confidence on any impacts on the older code. Therefore, additional reviews would be required.
- As pointed by the linter, some files are not formatted correctly.
- It is difficult to assert and understand the logic without any comments or unit tests. Can we please add unit tests for all logical functions and utils. Please add any comments helpful to understand what's going on :).
- I also see some code duplication as pointed by the linter as well. We should check if that can be removed.
- If we can, I think it can be refactored to only check for version once as the integration is running for a single DBMS, and build our struct based on that, which can still have the same functions for getting query metrics. But use different adapters based on the version. This could remove some duplication and improve readability.
- It would be helpful if the PR description is updated with some more details and testing details.
Thanks
src/query-performance-monitoring/common-utils/common-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/common-utils/query-fetch-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/common-utils/query-fetch-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/performance-metrics/execution_plan_metrics.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rajrohanyadav
Thanks for reviewing the PR
can you mention the comments with criticality as P0 ,P1 etc
we will work on them accordingly
P0 - Unit tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rajrohanyadav
Added unit test-cases
src/query-performance-monitoring/common-utils/common-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/common-utils/common-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/common-utils/common-helpers.go
Outdated
Show resolved
Hide resolved
const MaxQueryThreshold = 30 | ||
const MaxIndividualQueryThreshold = 10 | ||
const PublishThreshold = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add comments explaining why this value were chosen and how they are used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was max metrics we would fetch respectively
This was discussed with PMs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though you discussed it with PMs it would be helpful for the future us to know why a value was chosen.
Moreover, we should add comments to constants explaining what they do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding comment explaining the constants.
Can you also add a NOTE
explaining WHY we have choose those defaults and the assumptions made.
It would be helpful in the future to know why a value was chosen.
src/query-performance-monitoring/common-utils/ingestion-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/common-utils/ingestion-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/common-utils/ingestion-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/common-utils/query-fetch-helpers.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/performance-metrics/individual_query_metrics.go
Outdated
Show resolved
Hide resolved
src/query-performance-monitoring/performance-metrics/individual_query_metrics_test.go
Outdated
Show resolved
Hide resolved
return individualQueryMetricsListInterface, individualQueryMetricsForExecPlanList | ||
} | ||
|
||
func getIndividualQueriesSamples(conn *performancedbconnection.PGSQLConnection, slowRunningQueries datamodels.SlowRunningQueryMetrics, args args.ArgumentList, databaseNames string, anonymizedQueriesByDB map[string]map[string]string, individualQueryMetricsForExecPlanList *[]datamodels.IndividualQueryMetrics, individualQueryMetricsListInterface *[]interface{}, versionSpecificIndividualQuery string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be refactored, there are 8 args, it is not clear what it does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually it is easier to return a copy instead of modifying it via pointer, it is not clear by looking at the signature that we are modifying something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if i return a list
i have to merge the list back to another list in caller - adding extra complexity
and if i pass it by pointer
it will add directly to list which is in caller
regards the args - i would like to discuss something - let me know when you be free to connect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i hope this was resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From yesterdays discussion with Paolo, instead of passing pointers and modifying it in the func, we would return the required lists, This would make code readability easier and also reduce number of parameters the func needs.
46847de
to
8509bcd
Compare
src/query-performance-monitoring/performance-metrics/execution_plan_metrics.go
Show resolved
Hide resolved
src/query-performance-monitoring/performance-metrics/execution_plan_metrics.go
Show resolved
Hide resolved
src/query-performance-monitoring/performance-metrics/execution_plan_metrics_test.go
Show resolved
Hide resolved
src/query-performance-monitoring/performance-metrics/execution_plan_metrics_test.go
Show resolved
Hide resolved
return individualQueryMetricsListInterface, individualQueryMetricsForExecPlanList | ||
} | ||
|
||
func getIndividualQueriesSamples(conn *performancedbconnection.PGSQLConnection, slowRunningQueries datamodels.SlowRunningQueryMetrics, gv *globalvariables.GlobalVariables, anonymizedQueriesByDB map[string]map[string]string, individualQueryMetricsForExecPlanList *[]datamodels.IndividualQueryMetrics, individualQueryMetricsListInterface *[]interface{}, versionSpecificIndividualQuery string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the issue with this signature is that the args are inputs and output of the function, moreover we could think of grouping the results together and maybe build the query outside, by doing so we could get something like:
query := fmt.Sprintf(...)
results := getIndividualQueriesSamples(conn, query, anonymizedQueriesByDB)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the query changes as per the queryId
so we have to structure the query inside the function rather than in caller
yes the args contain the input and output to remove the extra complexity of adding a list to another list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the query changes as per the queryId
so we have to structure the query inside the function rather than in caller
Even if the query changes as per the queryid. We can build it in the caller function right?
yes the args contain the input and output to remove the extra complexity of adding a list to another list
Refer this comment. for reducing the number of parameters the func needs
Why are we passing this parameter slowRunningQueries datamodels.SlowRunningQueryMetrics
to the func if we only want *slowRunningQueries.QueryID
?
src/query-performance-monitoring/performance-metrics/individual_query_metrics.go
Outdated
Show resolved
Hide resolved
log.Debug("Starting PopulateSlowRunningMetrics at ", start) | ||
slowRunningQueries := performancemetrics.PopulateSlowRunningMetrics(newConnection, pgIntegration, gv) | ||
log.Debug("PopulateSlowRunningMetrics completed in ", time.Since(start)) | ||
|
||
start = time.Now() | ||
log.Debug("Starting PopulateWaitEventMetrics at ", start) | ||
_ = performancemetrics.PopulateWaitEventMetrics(newConnection, pgIntegration, gv) | ||
log.Debug("PopulateWaitEventMetrics completed in ", time.Since(start)) | ||
|
||
start = time.Now() | ||
log.Debug("Starting PopulateBlockingMetrics at ", start) | ||
performancemetrics.PopulateBlockingMetrics(newConnection, pgIntegration, gv) | ||
log.Debug("PopulateBlockingMetrics completed in ", time.Since(start)) | ||
|
||
start = time.Now() | ||
log.Debug("Starting PopulateIndividualQueryMetrics at ", start) | ||
individualQueries := performancemetrics.PopulateIndividualQueryMetrics(newConnection, slowRunningQueries, pgIntegration, gv) | ||
log.Debug("PopulateIndividualQueryMetrics completed in ", time.Since(start)) | ||
|
||
start = time.Now() | ||
log.Debug("Starting PopulateExecutionPlanMetrics at ", start) | ||
performancemetrics.PopulateExecutionPlanMetrics(individualQueries, pgIntegration, gv) | ||
log.Debug("PopulateExecutionPlanMetrics completed in ", time.Since(start)) | ||
|
||
log.Debug("Query analysis completed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would extract all of this to a different function to hide the complexity from this sub_main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made that change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this is resolved
* resolved : review comments --------- Co-authored-by: Jyothi Surampudi <[email protected]> Co-authored-by: jsurampudi <[email protected]> Co-authored-by: Rahul Reddy <[email protected]>
fix: lint issues --------- Co-authored-by: Jyothi Surampudi <[email protected]> Co-authored-by: jsurampudi <[email protected]> Co-authored-by: Rahul Reddy <[email protected]>
* added more scenerios for unit test cases --------- Co-authored-by: Jyothi Surampudi <[email protected]> Co-authored-by: jsurampudi <[email protected]> Co-authored-by: Rahul Reddy <[email protected]>
* rename variable --------- Co-authored-by: Jyothi Surampudi <[email protected]> Co-authored-by: jsurampudi <[email protected]> Co-authored-by: Rahul Reddy <[email protected]>
resolved : review comments --------- Co-authored-by: Jyothi Surampudi <[email protected]> Co-authored-by: jsurampudi <[email protected]> Co-authored-by: Rahul Reddy <[email protected]>
* remove unused variables --------- Co-authored-by: Jyothi Surampudi <[email protected]> Co-authored-by: jsurampudi <[email protected]> Co-authored-by: Rahul Reddy <[email protected]>
* resolved: review comments --------- Co-authored-by: Jyothi Surampudi <[email protected]> Co-authored-by: jsurampudi <[email protected]> Co-authored-by: Rahul Reddy <[email protected]>
Feat: resolved review comments
* add check while fetching execution plan * resolved : lint issues --------- Co-authored-by: Jyothi Surampudi <[email protected]> Co-authored-by: jsurampudi <[email protected]> Co-authored-by: Rahul Reddy <[email protected]>
log.Error("Unsupported postgres version: %v", err) | ||
return nil, nil, err | ||
} | ||
var query = fmt.Sprintf(versionSpecificSlowQuery, gv.DatabaseString, min(gv.Arguments.QueryCountThreshold, commonutils.MaxQueryCountThreshold)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If gv.Arguments.QueryCountThreshold
is configured with a -ve value, We need to log a warning saying QueryCountThreshold
cannot be -ve and use the DefaultQueryCountThreshold value here.
Whenever gv.Arguments.QueryCountThreshold
is configured with a value greater than commonutils.MaxQueryCountThreshold
then we are only fetching commonutils.MaxQueryCountThreshold
number of results. We definitely need a warning log here. It would help when debuggin prod issues.
Unifying the logic that checks if the args are valid and logging warn in /validations/perfromance_metrics_validations.go
would help in reducing redandant checks wherever min(gv.Arguments.QueryCountThreshold, commonutils.MaxIndividualQueryCountThreshold)
is being used.
log.Warn("Query count threshold is greater than max supported value, setting to max supported value: %d", constants.MaxQueryCountThreshold)
|
||
func GetWaitEventMetrics(conn *performancedbconnection.PGSQLConnection, gv *globalvariables.GlobalVariables) ([]interface{}, error) { | ||
var waitEventMetricsList []interface{} | ||
var query = fmt.Sprintf(queries.WaitEvents, gv.DatabaseString, min(gv.Arguments.QueryCountThreshold, commonutils.MaxQueryCountThreshold)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, check if the QueryCountThreshold is valid and log a warning whenever required. Refer here
log.Error("Unsupported postgres version: %v", err) | ||
return nil, err | ||
} | ||
var query = fmt.Sprintf(versionSpecificBlockingQuery, gv.DatabaseString, min(gv.Arguments.QueryCountThreshold, commonutils.MaxQueryCountThreshold)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, check if the QueryCountThreshold is valid and log a warning whenever required. Refer here
} | ||
|
||
func getIndividualQueriesSamples(conn *performancedbconnection.PGSQLConnection, slowRunningQueries datamodels.SlowRunningQueryMetrics, gv *globalvariables.GlobalVariables, anonymizedQueriesByDB databaseQueryInfoMap, individualQueryMetricsForExecPlanList *[]datamodels.IndividualQueryMetrics, individualQueryMetricsListInterface *[]interface{}, versionSpecificIndividualQuery string) { | ||
query := fmt.Sprintf(versionSpecificIndividualQuery, *slowRunningQueries.QueryID, gv.DatabaseString, gv.Arguments.QueryResponseTimeThreshold, min(gv.Arguments.QueryCountThreshold, commonutils.MaxIndividualQueryCountThreshold)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, check if the QueryCountThreshold, QueryResponseTimeThreshold is valid and log a warning whenever required. Use default value when configured with -ve value. Refer here
@@ -31,6 +31,9 @@ type ArgumentList struct { | |||
CollectDbLockMetrics bool `default:"false" help:"If true, enables collection of lock metrics for the specified database. (Note: requires that the 'tablefunc' extension is installed)"` //nolint: stylecheck | |||
CollectBloatMetrics bool `default:"true" help:"Enable collecting bloat metrics which can be performance intensive"` | |||
ShowVersion bool `default:"false" help:"Print build information and exit"` | |||
EnableQueryMonitoring bool `default:"false" help:"Enable collection of detailed query performance metrics."` | |||
QueryResponseTimeThreshold int `default:"500" help:"Threshold in milliseconds for query response time. If response time exceeds this threshold, the query will be considered slow."` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument is being used only while fetching individual query metrics. So instead of If response time exceeds this threshold, the query will be considered slow.
adding If response time for the idividual query exceeds this threshold, the individaul query is reported in metrics
or something similar makes sense?
# Threshold in milliseconds for query response time to fetch individual query performance metrics - Defaults to 500 | ||
# QUERY_RESPONSE_TIME_THRESHOLD : "500" | ||
|
||
# The number of records for each query performance metrics (maximum slow query metrics can be 20 and execution plan metrics can be 30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of records to be retrieved?
if version < commonutils.PostgresVersion12 { | ||
return false, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking if the version is supported in all of the below functions we can have function CheckPostgresVersionSupport
and verify if the postgres version is supported only once in query_performance_main.go
file before calling populateQueryPerformanceMetrics
function. This will remove redundant checks.
func CheckPostgresVersionSupport(version uint64) bool{
if version < commonutils.PostgresVersion12 {
return false
}
return true
}
var query = fmt.Sprintf(versionSpecificSlowQuery, gv.DatabaseString, min(gv.Arguments.QueryCountThreshold, commonutils.MaxQueryCountThreshold)) | ||
rows, err := conn.Queryx(query) | ||
if err != nil { | ||
return nil, nil, err | ||
} | ||
defer rows.Close() | ||
for rows.Next() { | ||
var slowQuery datamodels.SlowRunningQueryMetrics | ||
if scanErr := rows.StructScan(&slowQuery); scanErr != nil { | ||
return nil, nil, err | ||
} | ||
slowQueryMetricsList = append(slowQueryMetricsList, slowQuery) | ||
slowQueryMetricsListInterface = append(slowQueryMetricsListInterface, slowQuery) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this part of code from line 24-36 is duplicated in blocking_sessions.go
, wait_evetn_metrics.go
files. Could we try to place this logic in a func and reuse it wherever needed?
rows, err := conn.Queryx(query) | ||
if err != nil { | ||
log.Error("Failed to execute query: %v", err) | ||
return nil, err | ||
} | ||
defer rows.Close() | ||
for rows.Next() { | ||
var blockingQueryMetric datamodels.BlockingSessionMetrics | ||
if scanError := rows.StructScan(&blockingQueryMetric); scanError != nil { | ||
return nil, scanError | ||
} | ||
// For PostgreSQL versions 13 and 12, anonymization of queries does not occur for blocking sessions, so it's necessary to explicitly anonymize them. | ||
if gv.Version == commonutils.PostgresVersion13 || gv.Version == commonutils.PostgresVersion12 { | ||
*blockingQueryMetric.BlockedQuery = commonutils.AnonymizeQueryText(*blockingQueryMetric.BlockedQuery) | ||
*blockingQueryMetric.BlockingQuery = commonutils.AnonymizeQueryText(*blockingQueryMetric.BlockingQuery) | ||
} | ||
blockingQueriesMetricsList = append(blockingQueriesMetricsList, blockingQueryMetric) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here could we have a common func and reuse it
rows, err := conn.Queryx(query) | ||
if err != nil { | ||
return nil, err | ||
} | ||
defer rows.Close() | ||
for rows.Next() { | ||
var waitEvent datamodels.WaitEventMetrics | ||
if waitScanErr := rows.StructScan(&waitEvent); waitScanErr != nil { | ||
return nil, err | ||
} | ||
waitEventMetricsList = append(waitEventMetricsList, waitEvent) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, could we have a common func and reuse it.
if version == commonutils.PostgresVersion12 || version == commonutils.PostgresVersion13 { | ||
return true, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a comment explaining why we are returing ture for these versions.
if waitErr != nil { | ||
return false, waitErr | ||
} | ||
pgStatExtension, statErr := isExtensionEnabled(conn, "pg_stat_statements") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already executed the same query in CheckSlowQueryMetricsFetchEligibility
by calling isExtensionEnabled(conn, "pg_stat_statements")
. can we do it only once?
if version == commonutils.PostgresVersion12 || version == commonutils.PostgresVersion13 { | ||
return true, nil | ||
} | ||
return isExtensionEnabled(conn, "pg_stat_statements") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already executed the same query in CheckSlowQueryMetricsFetchEligibility
by calling isExtensionEnabled(conn, "pg_stat_statements")
. can we do it only once?
PR to merge performance monitoring code from the fork to the branch
epic_db_query_performance_monitoring