Add support for new materialization to enable real-time modeling #136

Open
@leonard-henriquez

Description

Is your feature request related to a problem? Please describe.

Currently, with the dbt Snowplow package, recent events only become available once a dbt run has processed the new data.
The package offers the "incremental" materialization option, which processes only new events on each run rather than every event.
However, this approach still makes it hard to get fresh data with low latency (under 1 minute).

For example:

  • 08:40 am: an event is triggered in a browser
  • 08:40 am: the Snowplow collector validates and enriches the event, then sends it to a stream
  • 08:41 am: the event is stored in the data warehouse
  • 08:45 am: a DBT job that runs every 5 minutes starts
  • 08:47 am: the DBT job finishes running my custom model (that depends on snowplow_web_base_events_this_run)

So my data only becomes available at 08:47 am, 7 minutes after the event was triggered.
These delays are very hard to compress: we can't realistically run dbt jobs every second, and each job takes a few minutes to complete.
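
To make the arithmetic explicit, here is a rough sketch of the timeline above. The function names, the calendar date, and the assumption that jobs start on exact multiples of the schedule interval (with the interval dividing 60) are all hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta


def next_job_start(loaded_at: datetime, interval_minutes: int) -> datetime:
    """First scheduled job start at or after `loaded_at`.

    Assumes jobs start on multiples of `interval_minutes` past the hour
    and that the interval divides 60 evenly.
    """
    floor = loaded_at.replace(second=0, microsecond=0)
    floor -= timedelta(minutes=floor.minute % interval_minutes)
    if floor == loaded_at:
        return floor
    return floor + timedelta(minutes=interval_minutes)


def available_at(loaded_at: datetime, interval_minutes: int,
                 job_duration: timedelta) -> datetime:
    """Time at which a row loaded at `loaded_at` is visible in the model."""
    return next_job_start(loaded_at, interval_minutes) + job_duration


# Timings from the example (the date itself is made up).
event_time = datetime(2024, 1, 1, 8, 40)   # event triggered in the browser
loaded_at = datetime(2024, 1, 1, 8, 41)    # event stored in the warehouse
ready = available_at(loaded_at, 5, timedelta(minutes=2))

print(ready.strftime("%H:%M"))   # 08:47
print(ready - event_time)        # 0:07:00
```

With a fixed schedule, the wait for the next job plus the job duration sets a floor on latency, which is why running jobs more often only helps up to a point.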

Describe the solution you'd like

We could use the "lambda view" pattern and introduce a new materialization option that takes advantage of materialized views and, on Snowflake, dynamic tables.
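
As a minimal sketch of the lambda view idea (not a dbt materialization, and not how the package works today), the snippet below uses Python's built-in `sqlite3` module to union an already-processed "historical" table with rows from the raw table that are newer than the last processed timestamp. The table names `raw_events`, `events_historical`, and `events_lambda`, the sample rows, and the timestamps are all hypothetical.

```python
import sqlite3


def lambda_view_rows() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Raw events as they land in the warehouse (hypothetical schema).
        CREATE TABLE raw_events (event_id TEXT, collector_tstamp TEXT);
        INSERT INTO raw_events VALUES
            ('e1', '2024-01-01 08:30:00'),
            ('e2', '2024-01-01 08:38:00'),
            ('e3', '2024-01-01 08:46:00');

        -- Stand-in for the table built by the last scheduled batch run,
        -- which only saw events before 08:40.
        CREATE TABLE events_historical AS
            SELECT * FROM raw_events
            WHERE collector_tstamp < '2024-01-01 08:40:00';

        -- Lambda view: batch-processed rows plus anything newer.
        CREATE VIEW events_lambda AS
            SELECT event_id, collector_tstamp, 'historical' AS source
            FROM events_historical
            UNION ALL
            SELECT event_id, collector_tstamp, 'current' AS source
            FROM raw_events
            WHERE collector_tstamp > (
                SELECT COALESCE(MAX(collector_tstamp), '')
                FROM events_historical
            );
    """)
    rows = conn.execute(
        "SELECT event_id, source FROM events_lambda ORDER BY collector_tstamp"
    ).fetchall()
    conn.close()
    return rows


print(lambda_view_rows())
# [('e1', 'historical'), ('e2', 'historical'), ('e3', 'current')]
```

Queries against the view see `e3` immediately, without waiting for the next batch run. The strict `>` cutoff is a simplification: events sharing the exact timestamp of the last processed row would need extra handling.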

Describe alternatives you've considered

Running dbt more frequently, but that is costly.

Are you interested in contributing towards this feature?

I am willing to help, but I am new to dbt. I tried to modify the materialization but could not get it to work.
However, I've found interesting resources that can help:

Metadata

    Labels

    category:models (Related to the models in the package.)
    priority:low (Not on the roadmap.)
    type:enhancement (New features or improvements to existing features.)
