-
Notifications
You must be signed in to change notification settings - Fork 10
Transfer to v3 #1024
base: main
Are you sure you want to change the base?
Transfer to v3 #1024
Conversation
|
This PR will trigger a minor release when merged. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1024 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 6 6
Lines 710 710
=========================================
Hits 710 710 ☔ View full report in Codecov by Sentry. |
|
I believe that not every PR merged into |
|
Please run SQLFluff. |
|
Given the move to |
Sounds good, separate PR makes sense, no reason to combine here. My overarching goal is to ensure queries used by data.aem.live and queries used to feed spacecat and eventually DaaS -- especially those related to page view estimates -- are based on common logic so that we maintain consistency. I agree with you this is a separate topic. |
0effbdc to
c39dc76
Compare
| source, | ||
| COUNT(id) AS ids, | ||
| COUNT(DISTINCT url) AS pages, | ||
| APPROX_TOP_COUNT(url, 1)[OFFSET(0)].value AS topurl, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see a slack thread started by @shsteimer in #aem-data-desk on Jan 3. I believe that grouping by source and taking the top url is hiding data. Might make more sense to group by source+url combination, or group by url and take the top source (referer).
trieloff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add _V4 in any of the built-in queries, make sure sqlfluff runs cleanly.
| ELSE TIMESTAMP_TRUNC(time, DAY) | ||
| END AS date | ||
| FROM helix_rum.PAGEVIEWS_V3( | ||
| FROM helix_rum.PAGEVIEWS_V4( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not create PAGEVIEWS_V4 and I object strongly to using it in any of the default queries as it is hiding data.
| MAX(weight) AS pageviews | ||
| FROM | ||
| `helix-225321.helix_rum.EVENTS_V4`( | ||
| net.host(@url), @offset, @interval, '-', '-', 'UTC', 'all', @domainkey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@urlshould have the same semantics as the rest of the@urlparameters, so URL prefix withouthttps://, optional path and$or?delimiters.- if you list
startdateandenddatein the supported parameters, you should also pass them through here - same for
timezone - same for
device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the same data should come from PAGEVIEWS_V4 – why does this query exist when rum-pageviews is there already?
A minimal set of data endpoints necessary to run https://data.aem.live/rum-dashboard
Also I updated rum-pageviews to use Events_V4; because the version that uses Events_V3 cannot resolve www.petplace.com to petplace.com
When we use Events_V4 which prepends www and checks for results for a url, www.petplace.com and petplace.com can appear.
@trieloff
@langswei
@dbrrt