Support offline mode #707

@baitsguy

Is your feature request related to a problem? Please describe.
The library makes several network calls at runtime to load dependencies, e.g. Hugging Face models and OCR libraries. This makes it hard to run in a closed environment, because you cannot tell up front which downloads will fail.

Describe the solution you'd like
Provide a fully built distribution that contains all dependencies required up front.

Describe alternatives you've considered
The current way to achieve this is to run the library in a closed environment and chase down dependencies as they fail. Assuming you have S3 access, you can use the partitioner (without OCR) locally by doing the following:

Step 1: Download the model artifacts (from Hugging Face or another source you have access to). The models you need are:

  1. https://huggingface.co/timm/resnet50.a1_in1k
  2. https://huggingface.co/Aryn/deformable-detr-DocLayNet
  3. https://huggingface.co/microsoft/table-transformer-structure-recognition-v1.1-all
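
One way to script Step 1 on a machine that does have internet access is sketched below. It assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `local_dir_for` and `download_all`, and the choice to store each model under a folder named after the repo, are illustrative assumptions chosen to match the paths used in the later steps.

```python
from pathlib import Path

# Repo IDs taken from the three model URLs listed above.
MODEL_REPOS = [
    "timm/resnet50.a1_in1k",
    "Aryn/deformable-detr-DocLayNet",
    "microsoft/table-transformer-structure-recognition-v1.1-all",
]


def local_dir_for(repo_id: str, models_root: Path) -> Path:
    """Map 'org/name' to '<models_root>/name' (hypothetical layout)."""
    return models_root / repo_id.split("/", 1)[1]


def download_all(models_root: Path) -> None:
    """Download every repo in MODEL_REPOS into its own folder under models_root."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in MODEL_REPOS:
        target = local_dir_for(repo_id, models_root)
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_all(Path("models"))` on the connected machine should leave three folders that can then be copied into the closed environment.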

Step 2: Update the config.json of the (Aryn) deformable-detr-DocLayNet model with the following changes, so that it uses local dependencies:

...
"backbone": "resnet50", 
"backbone_kwargs": { 
   "pretrained_cfg": { 
       "file": "/{local_path}/models/resnet50.a1_in1k/pytorch_model.bin" 
    } 
}
...
"use_timm_backbone": true,
"use_pretrained_backbone": true
...
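
The config change in Step 2 can also be applied with a small script instead of by hand. This is a minimal sketch using only the standard library; the function name `patch_detr_config` is hypothetical, and it sets exactly the keys shown in the fragment above while leaving every other key in config.json untouched.

```python
import json
from pathlib import Path


def patch_detr_config(config_path: Path, backbone_weights: Path) -> dict:
    """Rewrite config.json so the timm backbone loads from a local weights file."""
    config = json.loads(config_path.read_text(encoding="utf-8"))

    config["backbone"] = "resnet50"
    # Merge rather than overwrite, in case backbone_kwargs already holds other keys.
    backbone_kwargs = config.get("backbone_kwargs") or {}
    pretrained_cfg = backbone_kwargs.get("pretrained_cfg") or {}
    pretrained_cfg["file"] = str(backbone_weights)
    backbone_kwargs["pretrained_cfg"] = pretrained_cfg
    config["backbone_kwargs"] = backbone_kwargs
    config["use_timm_backbone"] = True
    config["use_pretrained_backbone"] = True

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Pass the path to the deformable-detr-DocLayNet config.json as `config_path` and the path to the downloaded resnet50.a1_in1k `pytorch_model.bin` as `backbone_weights`.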

Step 3: Use the local model paths in your script, e.g.

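# Import paths assumed from the Sycamore package layout; adjust them for your installed version.
from sycamore.transforms.partition import ArynPartitioner
from sycamore.transforms.table_structure.extract import TableTransformerStructureExtractor

local_detr_path = "/{local_path}/models/deformable-detr-DocLayNet/"
local_table_struct_ext_path = "/{local_path}/models/table-transformer-structure-recognition-v1.1-all/"
table_structure_extractor = TableTransformerStructureExtractor(model=local_table_struct_ext_path)

# OCR stays disabled: this workaround covers the partitioner without OCR.
partitioner = ArynPartitioner(
    local_detr_path,
    use_partitioning_service=False,
    extract_table_structure=True,
    # use_ocr=True,
    table_structure_extractor=table_structure_extractor,
    threshold=0.3,
)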
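
Since the core complaint is that missing dependencies only surface at runtime, a pre-flight check may help catch them earlier. The sketch below is an assumption-laden illustration, not part of the library: `enable_hf_offline` sets the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables that the Hugging Face libraries read, and `missing_model_files` (a hypothetical helper) reports which expected files are absent. Which files each model folder must contain is for the caller to decide.

```python
import os
from pathlib import Path


def enable_hf_offline() -> None:
    """Ask huggingface_hub and transformers to fail fast instead of reaching the network."""
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def missing_model_files(required: dict[str, list[str]]) -> list[str]:
    """Return every expected file that is absent; an empty list means all were found."""
    missing = []
    for model_dir, filenames in required.items():
        for name in filenames:
            path = Path(model_dir) / name
            if not path.is_file():
                missing.append(str(path))
    return missing
```

Call `enable_hf_offline()` before importing the Hugging Face libraries, since they typically read these variables at import time, and run `missing_model_files` against the local model folders before constructing the partitioner.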
