[FEATURE] Concurrent page extraction #361

@gunnsth


Is your feature request related to a problem? Please describe.
Currently, extraction only supports processing pages one by one. It might be more efficient to use multiple goroutines to process pages concurrently.

Describe the solution you'd like
Explore the easiest way to support concurrency in the extractor package.
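
As a starting point for discussion, here is a minimal sketch of what page-level concurrency could look like with a fixed pool of worker goroutines. `extractPageText` and `extractConcurrently` are hypothetical names and not part of the extractor package, and the sketch assumes that per-page extraction is safe to call from several goroutines at once, which is exactly what would need to be verified first.

```go
package main

import (
	"fmt"
	"sync"
)

// extractPageText is a hypothetical stand-in for the per-page extraction
// work. It is NOT an extractor package function.
func extractPageText(pageNum int) (string, error) {
	return fmt.Sprintf("text of page %d", pageNum), nil
}

// extractConcurrently extracts pages 1..numPages using a fixed number of
// worker goroutines and returns the results in page order.
func extractConcurrently(numPages, workers int) ([]string, error) {
	if workers < 1 {
		workers = 1
	}
	texts := make([]string, numPages)
	errs := make([]error, numPages)
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for pageNum := range jobs {
				// Each page writes to its own slice index, so no mutex is needed.
				texts[pageNum-1], errs[pageNum-1] = extractPageText(pageNum)
			}
		}()
	}

	// Feed page numbers to the workers, then signal that no more work is coming.
	for pageNum := 1; pageNum <= numPages; pageNum++ {
		jobs <- pageNum
	}
	close(jobs)
	wg.Wait()

	// Report the first failing page, if any.
	for i, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("page %d: %w", i+1, err)
		}
	}
	return texts, nil
}

func main() {
	texts, err := extractConcurrently(900, 8)
	if err != nil {
		fmt.Println("extraction failed:", err)
		return
	}
	fmt.Println(len(texts), "pages extracted")
}
```

Bounding the number of workers keeps memory use predictable on very large documents. Whether this gives a real speedup depends on how much state the pages share inside the reader.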

Describe alternatives you've considered
The alternative, and currently the best way to get concurrency, is on a per-document basis, i.e. one goroutine handling a single document.
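
For comparison, the per-document approach can be sketched roughly as follows. `processDocument`, `docResult`, and `processAll` are hypothetical placeholders introduced for illustration; `processDocument` stands in for opening one file and extracting its pages serially.

```go
package main

import (
	"fmt"
	"sync"
)

// docResult holds the outcome of processing one document.
type docResult struct {
	Path string
	Text string
	Err  error
}

// processDocument is a hypothetical placeholder for serial, page-by-page
// extraction of a single document. It is NOT a library function.
func processDocument(path string) (string, error) {
	return "extracted: " + path, nil
}

// processAll starts one goroutine per document and collects the results
// in the same order as the input paths.
func processAll(paths []string) []docResult {
	results := make([]docResult, len(paths))
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			text, err := processDocument(p)
			results[i] = docResult{Path: p, Text: text, Err: err}
		}(i, p)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range processAll([]string{"a.pdf", "b.pdf"}) {
		fmt.Println(r.Path, r.Text, r.Err)
	}
}
```

This avoids shared state between goroutines, but it does not help when the workload is a single very large document.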

Additional context
Client's comment

We often deal with documents that are 900+ pages, and serially processing these with Unidoc was taking a long time and thus a lot of money in AWS expenses.
