-
Notifications
You must be signed in to change notification settings - Fork 330
chore(jobsdb): cache distinct parameters query result for all datasets except last #5752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5752 +/- ##
==========================================
+ Coverage 76.90% 76.96% +0.05%
==========================================
Files 491 493 +2
Lines 67183 67293 +110
==========================================
+ Hits 51667 51789 +122
+ Misses 12692 12686 -6
+ Partials 2824 2818 -6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
8bff521
to
8d0a868
Compare
@@ -57,7 +57,7 @@ type workspaceStrategy struct { | |||
|
|||
// ActivePartitions returns the list of active workspaceIDs in jobsdb | |||
func (ws workspaceStrategy) ActivePartitions(ctx context.Context, db jobsdb.JobsDB) ([]string, error) { | |||
return db.GetActiveWorkspaces(ctx, ws.customVal) | |||
return db.GetDistinctParameterValues(ctx, jobsdb.WorkspaceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we filter out workspace ids that don't have any entry for that customVal?
router/isolation/isolation.go
Outdated
@@ -64,7 +64,7 @@ type workspaceStrategy struct { | |||
|
|||
// ActivePartitions returns the list of active workspaceIDs in jobsdb | |||
func (ws workspaceStrategy) ActivePartitions(ctx context.Context, db jobsdb.JobsDB) ([]string, error) { | |||
return db.GetActiveWorkspaces(ctx, ws.customVal) | |||
return db.GetDistinctParameterValues(ctx, jobsdb.WorkspaceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we filter out workspace ids that don't have any entry for that customVal?
UNION ALL | ||
(SELECT s.* FROM t, LATERAL( | ||
SELECT workspace_id FROM %[1]q f | ||
WHERE custom_val = '%[2]s' AND f.workspace_id > t.workspace_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should retain the option for filtering based on custom_val
// GetDistinctParameterValues returns the list of distinct parameter("source_id", "destination_id", "workspace_id") values inside the jobsdb, supporting to optionally filter results based on `custom_val`.
GetDistinctParameterValues(ctx context.Context, parameter ParameterName, customVal string) (values []string, err error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use the pending event registry to filter?
Because if we cache based on a combination of parameter and custom_val, we'd end up making custom_val
number of queries for a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively the way destinationIDs are filtered using a maplookup in router, batch_rt could be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the second way, after one query the cache can help for upto two hours.
6a00157
to
ae53e1a
Compare
ae53e1a
to
8957d73
Compare
|
||
func NewDistinctValuesCache() *distinctValuesCache { | ||
return &distinctValuesCache{ | ||
cache: make(map[string]map[string][]string), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use sync.Map
instead? It might be a good fit than having multiple locks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to find a way in which we can disable the cache functionality and go back to the older way of not caching. If there is any can you point me to that?
Description
Decrease the number of queries to find the distinct parameter values(source_id, destination_id, workspace_id).
Cache the results per dataset and only compute the results for the last dataset(sometimes more - right after migration/new ds creation).
The algorithm
Considering workspace_id as just another parameter.
This deprecates further filtering by custom_val when querying for workspaces.
Linear Ticket
Resolves PIPE-2046
Security