feat: pipes and datasources for CDP aggs (CDP-804) #3714
base: main
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.
Please add a Jira issue key to your PR title.
Conventional Commits FTW!
    groupArrayDistinctMerge(activityTypesState) AS activityTypes,
    groupArrayDistinctMerge(activeOnState) AS activeOn,
    avgMerge(averageSentimentState) AS averageSentiment,
    maxMerge(updatedAtState) AS updatedAt
Bug: References non-existent column updatedAtState in queries
The query references maxMerge(updatedAtState) but this column doesn't exist in cdp_member_segment_aggregates_ds. The datasource schema defines lastActivityUpdatedAtState (an aggregate function) and updatedAt (a regular timestamp), not updatedAtState. This will cause the query to fail at runtime. The same issue occurs in all three nodes: leaf_segment_aggregates, parent_segment_aggregates, and grandparent_segment_aggregates.
Additional Locations (2)
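One possible fix, sketched under assumptions: if lastActivityUpdatedAtState is declared as an AggregateFunction(max, ...) column, the select list could merge that state instead. The grouping keys memberId and segmentId are hypothetical placeholders, since the quoted hunk does not show the node's GROUP BY clause.

```sql
-- Sketch only, not the PR's actual query.
-- Assumes lastActivityUpdatedAtState is AggregateFunction(max, <timestamp type>).
-- memberId and segmentId are hypothetical grouping keys.
SELECT
    groupArrayDistinctMerge(activityTypesState) AS activityTypes,
    groupArrayDistinctMerge(activeOnState) AS activeOn,
    avgMerge(averageSentimentState) AS averageSentiment,
    maxMerge(lastActivityUpdatedAtState) AS updatedAt
FROM cdp_member_segment_aggregates_ds
GROUP BY memberId, segmentId
```

If the row-level timestamp is what the query needs instead, max(updatedAt) over the plain updatedAt column would be the alternative. Whichever is chosen, the same change would apply to all three nodes named above.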
    EXPORT_SCHEDULE @on-demand
    EXPORT_FORMAT csv
    EXPORT_STRATEGY @new
    EXPORT_KAFKA_TOPIC memberSegmentsAggs_sink
Bug: Kafka topic name typo sends data to wrong topic
The backfiller sink exports to memberSegmentsAggs_sink (with extra 's') while all other member aggregate sinks export to memberSegmentsAgg_sink. This inconsistency will cause backfill data to be sent to a different Kafka topic than the incremental updates, potentially causing data to be lost or processed incorrectly by downstream consumers.
    SQL >
    select distinct organizationId
    from cdp_organization_segment_aggregates_ds
    where updatedAt >= toStartOfDay(toTimeZone(now(), 'Europe/Berlin') - INTERVAL 1 DAY)
Bug: Missing empty organizationId filter in parent segment pipe
The organizations_with_changed_aggs_previous_day node is missing the organizationId <> '' filter that exists in both the equivalent leaf segment pipe (line 17) and grandparent segment pipe (line 6). This inconsistency will cause rows with empty organizationId values to be exported to Kafka when processing parent segments, while they're correctly filtered out in the leaf and grandparent segment sinks.
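A minimal sketch of the node with the filter restored, assuming the parent pipe should mirror the leaf and grandparent pipes exactly (the other pipes' full text is not quoted here, so the placement of the condition is a guess):

```sql
-- Sketch only: adds the organizationId <> '' condition described in the comment.
select distinct organizationId
from cdp_organization_segment_aggregates_ds
where
    organizationId <> ''
    and updatedAt >= toStartOfDay(toTimeZone(now(), 'Europe/Berlin') - INTERVAL 1 DAY)
```

Putting the empty-string check first keeps the three pipes easy to compare side by side. It does not change the result.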
Note
Introduces Tinybird datasources, materialized/copy pipes, and Kafka sinks to compute and export member and organization segment aggregates (leaf/parent/grandparent), including backfill and daily changed-segment jobs.
- cdp_member_segment_aggregates_ds and cdp_organization_segment_aggregates_ds as AggregatingMergeTree tables with aggregate state columns and partition/sort keys.
- cdp_member_segment_aggregates_MV and cdp_organization_segment_aggregates_MV to build aggregate states from snapshot MV sources.
- Backfill sinks for members (cdp_member_aggregates_bucket_backfiller_sink.pipe) and organizations (cdp_organization_aggregates_bucket_backfiller_sink.pipe), computing aggregates for leaf/parent/grandparent segments and unioning results.
- Kafka sinks exporting to the memberSegmentsAgg_sink and organizationSegmentsAgg_sink topics.

Written by Cursor Bugbot for commit f3e7aa1.