Best practice to generate more dynamic downstream partitions from one asset #33247
ArcticXWolf asked this question in Q&A (unanswered, 1 comment)
I've found a partial solution. We can add keys to the xml_partition in the materialization of the xml_collection asset, and this triggers the correct xml materializations. However, we now run into the issue that a MultiPartition is always the full cross product: #13139 (comment). So I don't see how this use case is currently feasible in Dagster :(
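To make the cross-product problem concrete, here is a minimal sketch in plain Python (no Dagster API involved). The delivery and file names are made up for illustration. It only shows why combining a "collection" dimension with an "xml" dimension yields key pairs that never existed in any delivery.

```python
from itertools import product

# Hypothetical deliveries and the XML files each one actually contains.
deliveries = {
    "delivery_a.zip": ["a1.xml", "a2.xml"],
    "delivery_b.zip": ["b1.xml"],
}

# The two dimensions a two-dimensional partitioning would be built from.
collection_keys = sorted(deliveries)
xml_keys = sorted(name for files in deliveries.values() for name in files)

# A full cross product pairs every collection with every XML file.
cross_product = set(product(collection_keys, xml_keys))

# Only these pairs correspond to files that were really delivered.
valid_pairs = {(d, f) for d, files in deliveries.items() for f in files}

# Pairs that exist in the cross product but not in reality.
phantom_pairs = cross_product - valid_pairs

print(len(cross_product), len(valid_pairs), len(phantom_pairs))
```

With two deliveries and three files, half of the combinations are phantom pairs, and the share grows with every new delivery.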
Hello everyone,
we have a use case where customers deliver XML files that we need to transform into individual reports (per FILE, not per customer). The XML files arrive as a zip file, and we need to be able to reprocess such a file from time to time (for example, for a backfill).
I tried for a long time, but it seems like Dagster does not support this kind of use case very well, or I cannot figure it out from the documentation.
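As a rough illustration of the per-file granularity described above, the following plain-Python sketch (not Dagster API) lists the XML members of one delivered zip and derives one flat key per file. The function names, the example file names, and the `__` separator are all assumptions made for this example. Whether such keys are acceptable as partition keys would need to be checked against the Dagster documentation.

```python
import io
import zipfile


def xml_keys_for_delivery(delivery_name: str, zip_bytes: bytes) -> list[str]:
    """Return one flat key per XML file found in the delivered zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = [n for n in archive.namelist() if n.lower().endswith(".xml")]
    # The "__" separator is an arbitrary choice for this sketch.
    return [f"{delivery_name}__{member}" for member in sorted(members)]


def build_example_zip() -> bytes:
    """Build a small in-memory zip so the sketch runs without real files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report_1.xml", "<report id='1'/>")
        archive.writestr("report_2.xml", "<report id='2'/>")
        archive.writestr("README.txt", "not an xml file")
    return buffer.getvalue()


keys = xml_keys_for_delivery("delivery_2024_01.zip", build_example_zip())
print(keys)
```

Because each key already encodes both the delivery and the file, a single flat key space like this contains no combinations that were never delivered.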
My idea was the following assets:
The xml_collection should be materialized by a sensor that tracks the ingestion spot (SFTP), but the others should be materialized via Declarative Automation (so they should automaterialize as soon as a collection arrives).
I was able to create the xml_collection AssetSpec, its corresponding partition and a sensor that creates an asset materialization for the xml_collection.
However, I now want to write the materialization of the xml assets from that collection, and I do not know of any way to create multiple asset materializations from one single asset.
Is Dagster even the correct tool for this use case? I've seen the discussions at #28678 and #9559, but neither offers a solution that fits my setup.
Here is a more concrete example in code:
With that above, I now need to do something like:
But I don't know how to fill the materializations there (since I need to create MULTIPLE xml asset materializations from ONE xml_collection materialization).
Also, the reason we don't combine all XMLs per delivery into one xmls asset is that every single one of them is allowed to fail without stopping the others. A different example that shows this well would be getting a delivery of PDFs and having to do OCR on them.
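The failure-isolation requirement can be sketched in plain Python, independent of any orchestrator. In this hypothetical example, `process_files` and the file names are invented, and "processing" is reduced to parsing each XML string. The point is only that one broken file is recorded as failed while the rest still complete.

```python
import xml.etree.ElementTree as ET


def process_files(files: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Process each file independently and collect successes and failures."""
    succeeded: dict[str, str] = {}
    failed: dict[str, str] = {}
    for name, content in files.items():
        try:
            root = ET.fromstring(content)
        except ET.ParseError as exc:
            # Record the error and move on; other files are unaffected.
            failed[name] = str(exc)
            continue
        succeeded[name] = root.tag
    return succeeded, failed


# Hypothetical delivery: one well-formed file and one with a mismatched tag.
files = {
    "good.xml": "<report><item/></report>",
    "broken.xml": "<report><item></report>",
}
succeeded, failed = process_files(files)
print(succeeded, sorted(failed))
```

Treating each file as its own unit of work is what makes a single retry or backfill of just the broken file possible.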