Hi all,

I’m trying to validate the recommended architecture for a pipeline pattern that worked well until a new requirement appeared, and to get feedback on whether our current setup still makes sense or should be reconsidered.
### Current setup (works fine so far)
We are using multiple code locations, separated by domain and team responsibilities:
- `ingestions`: ETL assets (raw / system tables), owned by the data ingestion team
- `dbt`: a dbt project exposed via `@dbt_assets`, owned by the BI / analytics team (a minimal sketch follows this list)
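For context, the dbt code location exposes the project roughly like this. This is a minimal sketch assuming a standard dagster-dbt setup; the manifest path and function name are placeholders:

```python
from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Placeholder path to the dbt project's compiled manifest.
DBT_MANIFEST_PATH = Path("dbt_project", "target", "manifest.json")


@dbt_assets(manifest=DBT_MANIFEST_PATH)
def analytics_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Run `dbt build` and stream events back to Dagster as materializations.
    yield from dbt.cli(["build"], context=context).stream()
```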
This separation was intentional:
- ingestion engineers work mostly in the `ingestions` code location
- BI engineers work mostly in the dbt project
- deployments and dependencies are isolated
- execution is intentionally independent per code location
dbt models depend on ingestion assets via `sources.yml` and asset key mapping (sketched below).
Asset lineage looks correct in the UI, and this setup has worked well so far.
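For illustration, the mapping looks roughly like this in `sources.yml`. Source and table names here are placeholders; dagster-dbt reads the `meta.dagster.asset_key` entry to line the dbt source up with the upstream ingestion asset:

```yaml
# sources.yml (sketch; source and table names are placeholders)
version: 2

sources:
  - name: system
    tables:
      - name: orders
        meta:
          dagster:
            # Must match the asset key produced by the ingestions code location.
            asset_key: ["system", "orders"]
```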
We are also relatively early in our Dagster adoption.
### New requirement (where things break down)
We now have a pipeline pattern that is always:
ingestion → dbt → ingestion
More concretely:
1. Ingest data into `system.*` tables
2. Run dbt models downstream
3. Perform a second ingestion/export step (e.g. FTP delivery) based on dbt results (see the sketch below)
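The third step is the new part: an asset in the `ingestions` code location that consumes a dbt model's output. A minimal sketch, where `fct_deliveries` is a hypothetical model name (with dagster-dbt's default translator, a model's asset key is simply its model name):

```python
from dagster import AssetKey, asset


# Hypothetical export asset living in the ingestions code location.
@asset(deps=[AssetKey("fct_deliveries")])
def ftp_delivery() -> None:
    # Read the table produced by the dbt model and push the file
    # to the FTP server.
    ...
```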
At this point, coordinating execution across code locations becomes difficult:
- Chaining jobs via run status sensors feels brittle and hard to reason about (see the sketch after this list)
- We don’t really want multiple runs for what is conceptually a single pipeline
- We also don’t want to move toward a monolithic, Airflow-style DAG
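For completeness, the chaining we're unhappy with looks roughly like this: a sensor in the `ingestions` code location that watches the dbt job in the other code location via `JobSelector` and then kicks off the export job. All names here (`dbt_build_job`, the `dbt` location name, `ftp_export_job`) are placeholders:

```python
from dagster import (
    DagsterRunStatus,
    JobSelector,
    RunRequest,
    define_asset_job,
    run_status_sensor,
)

# Placeholder job that materializes the export asset sketched above.
ftp_export_job = define_asset_job("ftp_export_job", selection="ftp_delivery")


@run_status_sensor(
    run_status=DagsterRunStatus.SUCCESS,
    monitored_jobs=[
        # Watch the dbt job that lives in the other code location.
        JobSelector(
            location_name="dbt",
            repository_name="__repository__",  # default name with Definitions
            job_name="dbt_build_job",
        ),
    ],
    request_job=ftp_export_job,
)
def dbt_success_sensor(context):
    # One dbt run success -> one export run; the run_key keeps this idempotent.
    return RunRequest(run_key=context.dagster_run.run_id)
```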
This use case is making us question whether our current architectural split
(`ingestions` + `dbt` as separate code locations) is still the right approach,
or whether this kind of cross-cutting pipeline suggests a different structure.
We’re trying to balance team ownership and deployment isolation against the need to orchestrate this as a single, coherent pipeline.
Any guidance or confirmation of best practices would be greatly appreciated.
Thanks!