chore(jobsdb): cache distinct parameters query result for all datasets except last #5752

Merged
Sidddddarth merged 14 commits into master from chore.cacheParametersQuery
May 12, 2025

Conversation

Sidddddarth
Contributor

@Sidddddarth Sidddddarth commented Apr 21, 2025

Description

Decrease the number of queries needed to find the distinct parameter values (source_id, destination_id, workspace_id).
Cache the results per dataset and only recompute them for the last dataset (sometimes for more than one dataset, e.g. right after a migration or the creation of a new dataset).

The algorithm

workspace_id is treated as just another parameter.
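
A minimal sketch of the caching idea described above, assuming a per-dataset query function. It is not the code in this PR: `distinctCache`, `queryFunc` and `distinctValues` are hypothetical names used only for illustration. Every dataset except the last is served from the cache once it has been loaded, while the last dataset is queried on every call because it is still receiving writes. A dataset that stops being the last one gets loaded once more and is then cached, which is the "sometimes more" case.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// queryFunc stands in for the per-dataset "SELECT DISTINCT ..." query (hypothetical).
type queryFunc func(dataset string) ([]string, error)

// distinctCache is an illustrative cache: parameter -> dataset -> distinct values.
type distinctCache struct {
	mu    sync.Mutex
	cache map[string]map[string][]string
}

func newDistinctCache() *distinctCache {
	return &distinctCache{cache: make(map[string]map[string][]string)}
}

// distinctValues returns the sorted union of distinct values of param across datasets.
// For simplicity the mutex is held while querying, which serialises callers.
func (c *distinctCache) distinctValues(param string, datasets []string, query queryFunc) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cache[param] == nil {
		c.cache[param] = make(map[string][]string)
	}
	seen := make(map[string]struct{})
	for i, ds := range datasets {
		isLast := i == len(datasets)-1
		values, cached := c.cache[param][ds]
		if isLast || !cached {
			var err error
			if values, err = query(ds); err != nil {
				return nil, err
			}
			if !isLast {
				// Only datasets that no longer receive writes are cached.
				c.cache[param][ds] = values
			}
		}
		for _, v := range values {
			seen[v] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for v := range seen {
		out = append(out, v)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	data := map[string][]string{"ds_1": {"src-a"}, "ds_2": {"src-a", "src-b"}, "ds_3": {"src-c"}}
	queries := 0
	query := func(ds string) ([]string, error) {
		queries++
		return data[ds], nil
	}
	c := newDistinctCache()
	datasets := []string{"ds_1", "ds_2", "ds_3"}
	first, _ := c.distinctValues("source_id", datasets, query)
	second, _ := c.distinctValues("source_id", datasets, query)
	// The first call queries all three datasets, the second only ds_3.
	fmt.Println(strings.Join(first, ","), strings.Join(second, ","), queries)
}
```

In this sketch the second call issues a single query instead of three, so the saving grows with the number of older datasets.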

Linear Ticket

Resolves PIPE-2046

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@Sidddddarth Sidddddarth requested a review from atzoum April 21, 2025 15:48

codecov bot commented Apr 21, 2025

Codecov Report

Attention: Patch coverage is 88.55721% with 23 lines in your changes missing coverage. Please review.

Project coverage is 76.97%. Comparing base (bc52242) to head (8bc63ae).
Report is 1 commit behind head on master.

Files with missing lines                  Patch %   Lines
app/apphandlers/processorAppHandler.go    0.00%     13 Missing ⚠️
jobsdb/jobsdb.go                          89.09%    4 Missing and 2 partials ⚠️
jobsdb/distinct_values_cache.go           94.52%    3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5752      +/-   ##
==========================================
- Coverage   77.02%   76.97%   -0.06%     
==========================================
  Files         491      493       +2     
  Lines       67615    67699      +84     
==========================================
+ Hits        52083    52113      +30     
- Misses      12706    12760      +54     
  Partials     2826     2826              

@Sidddddarth Sidddddarth force-pushed the chore.cacheParametersQuery branch from 8bff521 to 8d0a868 Compare April 30, 2025 06:08
@Sidddddarth Sidddddarth changed the title chore: cache distinct parameters query result for all datasets except last chore(jobsdb): cache distinct parameters query result for all datasets except last Apr 30, 2025
@Sidddddarth Sidddddarth force-pushed the chore.cacheParametersQuery branch 2 times, most recently from 6a00157 to ae53e1a Compare May 5, 2025 08:24
@Sidddddarth Sidddddarth force-pushed the chore.cacheParametersQuery branch from ae53e1a to 8957d73 Compare May 5, 2025 08:44

func NewDistinctValuesCache() *distinctValuesCache {
	return &distinctValuesCache{
		cache: make(map[string]map[string][]string),
Member

Can we use sync.Map instead? It might be a better fit than having multiple locks.

Contributor

A sync.Map won't really help with loading things only once, though.
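
To illustrate the point: `sync.Map.LoadOrStore` takes a value that has already been computed, so on its own it does not stop two goroutines from both running the expensive load for the same key. A rough sketch of one way to get load-once behaviour, assuming a per-key `sync.Once` behind a plain mutex, is shown here. `onceCache`, `entry` and `get` are hypothetical names, not the types used in this PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// entry holds one cached value plus the sync.Once that guards its loading.
type entry struct {
	once  sync.Once
	value []string
}

// onceCache loads each key at most once, even with concurrent callers.
type onceCache struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newOnceCache() *onceCache {
	return &onceCache{entries: make(map[string]*entry)}
}

func (c *onceCache) get(key string, load func() []string) []string {
	// The map lock is held only long enough to find or create the entry.
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &entry{}
		c.entries[key] = e
	}
	c.mu.Unlock()
	// Concurrent callers for the same key block here until the first load finishes.
	e.once.Do(func() { e.value = load() })
	return e.value
}

func main() {
	var loads atomic.Int64
	load := func() []string {
		loads.Add(1)
		return []string{"ws-1"}
	}
	c := newOnceCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("ds_1", load)
		}()
	}
	wg.Wait()
	fmt.Println(loads.Load()) // prints 1
}
```

Storing such entries in a sync.Map would also work, but the per-key synchronisation is still needed either way.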

Member

@cisse21 cisse21 left a comment

We need a way to disable the cache and fall back to the previous, uncached behaviour. If one already exists, can you point me to it?

@Sidddddarth Sidddddarth force-pushed the chore.cacheParametersQuery branch from cc7db3f to 0507c81 Compare May 12, 2025 05:56
@Sidddddarth Sidddddarth merged commit e85811e into master May 12, 2025
61 checks passed
@Sidddddarth Sidddddarth deleted the chore.cacheParametersQuery branch May 12, 2025 08:10
satishrudderstack pushed a commit that referenced this pull request May 12, 2025
🤖 I have created a release *beep* *boop*
---


##
[1.49.0-rc.1](v1.48.0...v1.49.0-rc.1)
(2025-05-12)


### Features

* add support for processing of upload_v2 job type by slave
([#5796](#5796))
([67988d1](67988d1))
* batch staging files for creating upload_v2 notifier jobs
([#5765](#5765))
([7dc3a45](7dc3a45))
* enable worker-based kafka client batching
([#5788](#5788))
([66dc19f](66dc19f))
* support claim renewal in notifier jobs
([#5818](#5818))
([a499d9c](a499d9c))
* support new consent resolution strategy values
([#5798](#5798))
([181b95b](181b95b))
* update config to add account details with destination in
workspaceConfig
([#5753](#5753))
([dbd46bf](dbd46bf))
* update contract for account and accountDefinition
([#5830](#5830))
([c704b07](c704b07))
* use account to decide oauth type of a destination
([#5810](#5810))
([9165e8c](9165e8c))


### Bug Fixes

* add check for nil secret on oauthv2
([#5807](#5807))
([fbc4abe](fbc4abe))
* convert bad request errors to 500 errors in oauth interceptor to
prevent panics
([#5813](#5813))
([3a08ec4](3a08ec4))
* **jobsdb:** completed datasets don't get deleted without a pair
([#5793](#5793))
([54aee71](54aee71))
* reporting common client path with query
([#5842](#5842))
([c068920](c068920))
* transformer client recycle ttl bound to connection idle timeout
([#5800](#5800))
([b13f92c](b13f92c))
* update account type to remove id which we are getting as map key
([#5835](#5835))
([bc52242](bc52242))
* ut mirroring race condition
([#5824](#5824))
([a4d579f](a4d579f))
* warehouse cached schema mismatch
([#5805](#5805))
([4656247](4656247))
* warehouse transformations mismatches
([#5779](#5779))
([01a7b83](01a7b83))


### Miscellaneous

* **deps:** bump github.com/snowflakedb/gosnowflake from 1.13.2 to
1.13.3 in the go_modules group
([#5787](#5787))
([41db33e](41db33e))
* **deps:** bump golangci/golangci-lint-action from 7 to 8
([#5815](#5815))
([bf3e808](bf3e808))
* **jobsdb:** cache distinct parameters query result for all datasets
except last
([#5752](#5752))
([e85811e](e85811e))
* migrate from denisenkom/go-mssqldb to microsoft/go-mssqldb
([#5776](#5776))
([dbd46bf](dbd46bf))
* revert synapse staging table with max varchar length
([#5817](#5817))
([2418329](2418329))
* **router:** support destination-specific configuration overrides for
all options
([#5841](#5841))
([695cf53](695cf53))
* synapse staging table with max varchar length
([#5775](#5775))
([2fc5384](2fc5384))
* update accountDefination type with authenticationType
([#5791](#5791))
([2a24e59](2a24e59))
* update rudder-go-kit to 0.49.2
([#5832](#5832))
([01a7b83](01a7b83))
* upload embedded dt response difference samples to s3
([#5792](#5792))
([c81001f](c81001f))
* warehouse transformer migration to embedded destination transformer
package
([#5827](#5827))
([e124bb0](e124bb0))
* warehouse transformer sample diff
([#5837](#5837))
([8af55de](8af55de))
* warehouse transformer uploader
([#5828](#5828))
([fc640bf](fc640bf))
* webhook integration test module upgrade
([#5665](#5665))
([f4130d1](f4130d1))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
satishrudderstack pushed a commit that referenced this pull request May 13, 2025
🤖 I have created a release *beep* *boop*
---


##
[1.49.0](v1.48.0...v1.49.0)
(2025-05-13)


### Features

* add dynamic config existence flag for destinations
([#5821](#5821))
([1b4ea06](1b4ea06))
* add support for processing of upload_v2 job type by slave
([#5796](#5796))
([67988d1](67988d1))
* batch staging files for creating upload_v2 notifier jobs
([#5765](#5765))
([7dc3a45](7dc3a45))
* enable worker-based kafka client batching
([#5788](#5788))
([66dc19f](66dc19f))
* support claim renewal in notifier jobs
([#5818](#5818))
([a499d9c](a499d9c))
* support new consent resolution strategy values
([#5798](#5798))
([181b95b](181b95b))
* update config to add account details with destination in
workspaceConfig
([#5753](#5753))
([dbd46bf](dbd46bf))
* update contract for account and accountDefinition
([#5830](#5830))
([c704b07](c704b07))
* use account to decide oauth type of a destination
([#5810](#5810))
([9165e8c](9165e8c))


### Bug Fixes

* add check for nil secret on oauthv2
([#5807](#5807))
([fbc4abe](fbc4abe))
* convert bad request errors to 500 errors in oauth interceptor to
prevent panics
([#5813](#5813))
([3a08ec4](3a08ec4))
* embedded transformations upload
([#5848](#5848))
([0b19968](0b19968))
* **jobsdb:** completed datasets don't get deleted without a pair
([#5793](#5793))
([54aee71](54aee71))
* reporting common client path with query
([#5842](#5842))
([c068920](c068920))
* transformer client recycle ttl bound to connection idle timeout
([#5800](#5800))
([b13f92c](b13f92c))
* update account type to remove id which we are getting as map key
([#5835](#5835))
([bc52242](bc52242))
* ut mirroring race condition
([#5824](#5824))
([a4d579f](a4d579f))
* warehouse cached schema mismatch
([#5805](#5805))
([4656247](4656247))
* warehouse transformations mismatches
([#5779](#5779))
([01a7b83](01a7b83))


### Miscellaneous

* **deps:** bump github.com/snowflakedb/gosnowflake from 1.13.2 to
1.13.3 in the go_modules group
([#5787](#5787))
([41db33e](41db33e))
* **deps:** bump golangci/golangci-lint-action from 7 to 8
([#5815](#5815))
([bf3e808](bf3e808))
* enrich event with bot details
([#5836](#5836))
([6035658](6035658))
* **jobsdb:** cache distinct parameters query result for all datasets
except last
([#5752](#5752))
([e85811e](e85811e))
* migrate from denisenkom/go-mssqldb to microsoft/go-mssqldb
([#5776](#5776))
([dbd46bf](dbd46bf))
* revert synapse staging table with max varchar length
([#5817](#5817))
([2418329](2418329))
* **router:** support destination-specific configuration overrides for
all options
([#5841](#5841))
([695cf53](695cf53))
* synapse staging table with max varchar length
([#5775](#5775))
([2fc5384](2fc5384))
* update accountDefination type with authenticationType
([#5791](#5791))
([2a24e59](2a24e59))
* update rudder-go-kit to 0.49.2
([#5832](#5832))
([01a7b83](01a7b83))
* upload embedded dt response difference samples to s3
([#5792](#5792))
([c81001f](c81001f))
* warehouse transformer migration to embedded destination transformer
package
([#5827](#5827))
([e124bb0](e124bb0))
* warehouse transformer sample diff
([#5837](#5837))
([8af55de](8af55de))
* warehouse transformer uploader
([#5828](#5828))
([fc640bf](fc640bf))
* webhook integration test module upgrade
([#5665](#5665))
([f4130d1](f4130d1))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
3 participants