-
Notifications
You must be signed in to change notification settings - Fork 25
Description
Is this your first time submitting a feature request?
- I have read the expectations for open source contributors
- I have searched the existing issues, and I could not find an existing issue for this feature
- I am requesting a straightforward change to existing dbt-semantic-interfaces functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Summary
Enable the ability to specify metric_time as part of a metric filter so the filter joins on the same granularity of the parent metric. This ensures metric filters respect the time context of the query.
Motivation
Currently, metric filters can reference a single entity or dimension, but they do not have a way to inherit the granularity of the query. This creates a core issue where the parent metric grouped by a time-granularity (e.g., mont), but the filter is based on "all time" data.
Calculating something like 'Product Usage by Paid Users group by Day/Week/Month' is not possible with the current filter implementation. SCDs are not a viable approach because they can't dynamically adjust to the query timeframe and add unnecessarily complex SQL logic to the compiled code.
By allowing metric_time to be included in metric filters, we can:
- Ensure that metric filters operate at the same time granularity as the metric being queried.
- Maintain backward compatibility by only applying this behavior when
metric_timeis explicitly included.
Proposed Behavior
- When
metric_timeis included in a metric filter group-by, the filter is applied using the same time granularity as the queried parent metric. - If
metric_timeis not specified in the metric filter, existing behavior remains unchanged. - If
metric_timeis included but the parent metric does not have a time dimension, it is silently ignored.
Use Case Example
Consider the question "How many SMS Customers are sending Emails by month":
- name: distinct_paid_sms_email_senders
label: SMS Customers sending Emails
type: derived
type_params:
expr: paid_sms_email_senders
metrics:
- name: distinct_paid_sms_accounts
- alias: paid_sms_email_senders
filter: |
{{ Metric('sum_emails_sent', group_by=['account_id']) }} > 0The data will change post-facto if an account starts sending emails because the filter is applied over all time. This yields a meaningless metric with historical values that constantly fluctuate.
By changing the filter to {{ Metric('sum_emails_sent', group_by=['account_id','metric_time']) }} > 0, this allows analytics to create a metric that is historically static and outputs the intended results of "How many Paid SMS accounts sent emails last June".
Describe alternatives you've considered
- SCD II would be an alternative approach, but adds a ton of complexity. It would also yield analytically unintuitive results, since the SCD would likely be at a smaller granularity. Entities are typically aggregated if they were active at any point during the month, (e.g., "Monthly Active Users") and an SCD may unintentionally filter values if the dimension is stored at a daily granularity.
- The other alternative is to hard-calculate the metric in an intermediate table. This does not scale and leads to conflicting results when you calculating common metric-patterns like "per paid user".
Who will this benefit?
This will benefit any analyst who is using the Semantic Layer to create dashboards or uses core metrics as part of query filters.
Are you interested in contributing this feature?
Yes - PR is forthcoming
Anything else?
No response