Hi!
I am developing an ETL pipeline in Python using Restate and Unstructured, and I have run into the following problem.
I do not want to use the Partition Endpoint because I also need the enrichment step.
Using a workflow, I noticed that if a second user requests processing of a document while the first user's job is still running, there are two possible cases:
1. The second request arrives shortly after the first, i.e. while the first job is still reading files from the source (in my case a folder on S3).
2. The second request arrives during a later stage of the job. In this case the request is lost. (A request consists of uploading a new file to the S3 source folder and then starting the workflow.)
The first case is also problematic: if multiple users upload files almost simultaneously, they all have to wait until every file has been processed before seeing their output, because Unstructured processes everything first and then saves it all together. So I came up with this solution:
1. When a new request comes in, I create a source folder on S3 specific to that request.
2. I create and start a workflow specific to that request, and destroy it when it finishes.
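To make the idea concrete, here is a rough sketch of what I have in mind. It is only an outline under assumptions: the `requests/<id>/source/` prefix layout, the `RequestJob` class, and the `upload`, `run_workflow`, and `cleanup` callables are placeholders I made up for illustration, not actual Unstructured or Restate APIs.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RequestJob:
    """Per-request locations, so concurrent requests never share a folder."""
    request_id: str
    source_uri: str
    output_uri: str


def new_request_job(bucket: str, request_id: Optional[str] = None) -> RequestJob:
    # Hypothetical layout: one prefix per request under "requests/".
    rid = request_id or uuid.uuid4().hex
    base = f"s3://{bucket}/requests/{rid}"
    return RequestJob(rid, f"{base}/source/", f"{base}/output/")


def process_request(
    bucket: str,
    upload: Callable[[str], None],             # placeholder: puts the user's file under source_uri
    run_workflow: Callable[[RequestJob], None],  # placeholder: creates and runs the per-request workflow
    cleanup: Callable[[RequestJob], None],       # placeholder: destroys the workflow afterwards
) -> RequestJob:
    job = new_request_job(bucket)
    upload(job.source_uri)
    try:
        run_workflow(job)
    finally:
        # Destroy the per-request workflow even if processing fails.
        cleanup(job)
    return job
```

Each request gets its own source and output prefix, so one user's job should neither pick up another user's files nor hold back their output.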
This way there should be no problem. Do you have any better suggestions?