
Problem in using unstructured workflows #254

Open
@edoproch

Description

Hi!
I am developing an ETL pipeline using Restate and Unstructured in Python, and I have the following problem:

  • I do not want to use the Partition Endpoint because I also need the enrichment step.
  • Using a workflow, I noticed that if a second user requests to process a document while the first one is still being processed, there are two possible cases:
      1. the second request arrives very soon after the first (i.e., the first job is still reading files from the source, in my case a folder on S3);
      2. the second request arrives during a later stage of the job: in this case the request is lost (a request consists of uploading a new file to the S3 source folder and then starting the workflow).

Since the first case is also problematic (if multiple users upload files almost simultaneously, they all have to wait until every file is processed before seeing their output, because Unstructured first processes everything and only then saves the results together), I came up with this solution:

  • When a new request comes in, I create a source folder on S3 specific to that request.
  • I create and start a workflow specific to that request, which I destroy when it finishes.

This way there should be no problem. Do you have any better suggestions?
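A minimal sketch of the per-request isolation idea described above (the `make_request_prefix` helper, the `requests/` key layout, and `start_request` are illustrative assumptions, not Restate or Unstructured APIs):

```python
import uuid


def make_request_prefix(bucket: str, request_id: str) -> str:
    """Build an S3 source prefix dedicated to one request, so that
    concurrent requests never share a source folder."""
    return f"s3://{bucket}/requests/{request_id}/input/"


def start_request(bucket: str) -> tuple[str, str]:
    """Mint a unique request id and its isolated source prefix.

    In the real pipeline, the caller would upload the new file under
    this prefix and then start a workflow keyed by the same request id,
    tearing both down once the workflow finishes."""
    request_id = uuid.uuid4().hex
    return request_id, make_request_prefix(bucket, request_id)


# Two near-simultaneous requests get disjoint source folders, so
# neither one has to wait for the other's files to be processed.
r1, p1 = start_request("my-bucket")
r2, p2 = start_request("my-bucket")
assert p1 != p2
```

Keying the workflow by the same request id also means a retry of the same request is idempotent, while unrelated requests run fully in parallel.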
