
Support offline mode #707

@baitsguy


Is your feature request related to a problem? Please describe.
The library makes several calls to the internet at runtime to load dependencies, e.g. Hugging Face models and OCR libraries. This makes it tricky to run in a closed environment, because you only discover the missing dependencies when something fails at runtime.

Describe the solution you'd like
Provide a fully built distribution that bundles all required dependencies up front.

Describe alternatives you've considered
The current workaround is to run the library in a closed environment and chase down dependencies one by one. Assuming you have S3 access, you can run the partitioner locally (without OCR) as follows:

Step 1: Download the model artifacts (from Hugging Face or another source you have access to). The models you need are:

  1. https://huggingface.co/timm/resnet50.a1_in1k
  2. https://huggingface.co/Aryn/deformable-detr-DocLayNet
  3. https://huggingface.co/microsoft/table-transformer-structure-recognition-v1.1-all
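One way to fetch all three in a single pass, from a machine that does have internet access, is the `snapshot_download` function from the `huggingface_hub` package. This is a sketch that assumes `huggingface_hub` is installed and that you want the `{local_path}/models/<repo name>` layout used in the later steps; `model_dirs` and `download_all` are hypothetical helper names, not part of the library.

```python
from pathlib import Path

# Hugging Face repos listed in Step 1.
REPOS = [
    "timm/resnet50.a1_in1k",
    "Aryn/deformable-detr-DocLayNet",
    "microsoft/table-transformer-structure-recognition-v1.1-all",
]


def model_dirs(local_path: str) -> dict[str, Path]:
    """Map each repo id to <local_path>/models/<repo name> (hypothetical helper)."""
    base = Path(local_path) / "models"
    return {repo: base / repo.split("/", 1)[1] for repo in REPOS}


def download_all(local_path: str) -> None:
    """Download every repo into its local directory. Requires network access."""
    # Imported lazily so model_dirs() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo, target in model_dirs(local_path).items():
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo, local_dir=str(target))
```

Run `download_all("/some/dir")` on a connected machine, then copy the resulting `models/` tree into the closed environment.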

Step 2: Update the config.json of the Aryn deformable-detr-DocLayNet model with the following changes, so that it loads the backbone from your local copy:

...
"backbone": "resnet50",
"backbone_kwargs": {
    "pretrained_cfg": {
        "file": "/{local_path}/models/resnet50.a1_in1k/pytorch_model.bin"
    }
},
...
"use_timm_backbone": true,
"use_pretrained_backbone": true
...
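If you prefer not to hand-edit the file, the same changes can be applied with Python's standard `json` module. This is a sketch: `apply_step2` and `patch_config_file` are hypothetical helper names, and it assumes the downloaded config.json is plain JSON whose other keys should be left untouched. It merges `pretrained_cfg` into any existing `backbone_kwargs` rather than replacing that object, since Step 2 only calls for adding the local weights path.

```python
import copy
import json
from pathlib import Path


def apply_step2(config: dict, resnet_weights: str) -> dict:
    """Return a copy of a DETR config dict with the Step 2 keys set (hypothetical helper)."""
    patched = copy.deepcopy(config)
    patched["backbone"] = "resnet50"
    # Merge rather than overwrite, so any existing backbone_kwargs entries survive.
    kwargs = dict(patched.get("backbone_kwargs") or {})
    kwargs["pretrained_cfg"] = {"file": resnet_weights}
    patched["backbone_kwargs"] = kwargs
    patched["use_timm_backbone"] = True
    patched["use_pretrained_backbone"] = True
    return patched


def patch_config_file(detr_dir: str, resnet_weights: str) -> None:
    """Rewrite <detr_dir>/config.json in place with the Step 2 changes."""
    config_path = Path(detr_dir) / "config.json"
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(apply_step2(config, resnet_weights), indent=2))
```

For example, `patch_config_file("/{local_path}/models/deformable-detr-DocLayNet", "/{local_path}/models/resnet50.a1_in1k/pytorch_model.bin")`, with `{local_path}` replaced by your actual download directory.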

Step 3: Use the local model paths in your script, e.g.

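from sycamore.transforms.partition import ArynPartitioner
from sycamore.transforms.table_structure.extract import TableTransformerStructureExtractor

# Replace {local_path} with the directory the models were downloaded to in Step 1.
local_detr_path = "/{local_path}/models/deformable-detr-DocLayNet/"
local_table_struct_ext_path = "/{local_path}/models/table-transformer-structure-recognition-v1.1-all/"

# Load the table-structure model from disk instead of the Hugging Face Hub.
table_structure_extractor = TableTransformerStructureExtractor(model=local_table_struct_ext_path)

# Run layout detection locally rather than through the partitioning service.
partitioner = ArynPartitioner(
    local_detr_path,
    use_partitioning_service=False,
    extract_table_structure=True,
    # use_ocr=True,  # left off: OCR libraries fetch more dependencies at runtime
    table_structure_extractor=table_structure_extractor,
    threshold=0.3,
)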
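Since the core problem is that missing downloads only surface at runtime, it may help to force the Hugging Face libraries into offline mode and check the local files before building the partitioner. `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are environment variables honored by `huggingface_hub` and `transformers`; whether they cover every network call the library makes (OCR libraries, for instance) is an assumption you should verify. `enable_offline_mode` and `missing_model_files` are hypothetical helpers, and the expected-file list only includes the weights file referenced in Step 2 plus the `config.json` a Hugging Face model directory normally contains.

```python
import os
from pathlib import Path

# Files each local model directory is expected to contain, per the steps above.
EXPECTED_FILES = {
    "resnet50.a1_in1k": ["pytorch_model.bin"],
    "deformable-detr-DocLayNet": ["config.json"],
    "table-transformer-structure-recognition-v1.1-all": ["config.json"],
}


def enable_offline_mode() -> None:
    """Ask huggingface_hub and transformers not to contact the Hub.

    Call this before importing those libraries so the settings take effect.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def missing_model_files(models_dir: str) -> list[str]:
    """Return the expected model files that are absent under models_dir."""
    base = Path(models_dir)
    return [
        str(base / name / filename)
        for name, filenames in EXPECTED_FILES.items()
        for filename in filenames
        if not (base / name / filename).is_file()
    ]
```

At the top of your script, call `enable_offline_mode()` before any model-related imports, then fail fast if `missing_model_files("/{local_path}/models")` returns anything, instead of waiting for a network error deep inside partitioning.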
