Description
Is your feature request related to a problem? Please describe.
The library makes several calls to the internet at runtime to load dependencies, e.g. Hugging Face models and OCR libraries. This makes it tricky to run in a closed environment, because you only discover the missing dependencies when they fail.
Describe the solution you'd like
Provide a fully built distribution that bundles all required dependencies (models and libraries) up front.
Describe alternatives you've considered
The current workaround is to run the library in a closed environment and chase down the missing dependencies one by one. Assuming you have S3 access, you can run the partitioner locally (without OCR) as follows:
Step 1: Download the model artifacts (from Hugging Face or another source you have access to). You need the following models:
- https://huggingface.co/timm/resnet50.a1_in1k
- https://huggingface.co/Aryn/deformable-detr-DocLayNet
- https://huggingface.co/microsoft/table-transformer-structure-recognition-v1.1-all
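Step 1 can be scripted on a machine that does have internet access, and the resulting directory copied into the closed environment. The following is a minimal sketch, assuming the `huggingface_hub` package is installed on that machine. `MODEL_REPOS`, `local_model_dir` and `download_all` are hypothetical helper names, and the `models/<name>` layout is an assumption chosen to match the paths used in Steps 2 and 3.

```python
from pathlib import Path

# Hugging Face repo IDs from the list above.
MODEL_REPOS = [
    "timm/resnet50.a1_in1k",
    "Aryn/deformable-detr-DocLayNet",
    "microsoft/table-transformer-structure-recognition-v1.1-all",
]


def local_model_dir(models_root: Path, repo_id: str) -> Path:
    """Map 'org/name' to <models_root>/name, e.g. models/resnet50.a1_in1k."""
    return models_root / repo_id.split("/", 1)[1]


def download_all(models_root: Path) -> None:
    """Download a full snapshot of every repo into its own directory under models_root."""
    # Imported lazily so the path helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in MODEL_REPOS:
        target = local_model_dir(models_root, repo_id)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        print(f"{repo_id} -> {target}")
```

For example, run `download_all(Path("models"))` on the connected machine, then transfer the `models/` directory to the closed environment.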
Step 2: Update the `config.json` of the Aryn deformable-detr-DocLayNet model with the following changes so that it loads the ResNet-50 backbone from local files:
```json
...
"backbone": "resnet50",
"backbone_kwargs": {
  "pretrained_cfg": {
    "file": "/{local_path}/models/resnet50.a1_in1k/pytorch_model.bin"
  }
},
...
"use_timm_backbone": true,
"use_pretrained_backbone": true
...
```
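If Step 2 has to be repeated (for instance after every fresh download), the edit can be scripted with the standard library. This is a minimal sketch: `patch_detr_config` is a hypothetical helper that sets only the keys shown in the fragment above and leaves every other entry in `config.json` untouched. Merging into an existing `backbone_kwargs` instead of overwriting it is a deliberately conservative choice.

```python
import json
from pathlib import Path


def patch_detr_config(config_path: Path, resnet_weights: Path) -> dict:
    """Point the DETR config's timm backbone at locally stored ResNet-50 weights."""
    config = json.loads(config_path.read_text())
    config["backbone"] = "resnet50"
    # Keep any existing backbone_kwargs / pretrained_cfg entries; only add the local file.
    backbone_kwargs = config.get("backbone_kwargs") or {}
    pretrained_cfg = backbone_kwargs.get("pretrained_cfg") or {}
    pretrained_cfg["file"] = str(resnet_weights)
    backbone_kwargs["pretrained_cfg"] = pretrained_cfg
    config["backbone_kwargs"] = backbone_kwargs
    config["use_timm_backbone"] = True
    config["use_pretrained_backbone"] = True
    # Write the patched config back in place and return it for inspection.
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

For example: `patch_detr_config(Path("/{local_path}/models/deformable-detr-DocLayNet/config.json"), Path("/{local_path}/models/resnet50.a1_in1k/pytorch_model.bin"))`.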
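Step 3: Use the local model paths in your script, for example:
```python
from sycamore.transforms.partition import ArynPartitioner
from sycamore.transforms.table_structure.extract import TableTransformerStructureExtractor

local_detr_path = "/{local_path}/models/deformable-detr-DocLayNet/"
local_table_struct_ext_path = "/{local_path}/models/table-transformer-structure-recognition-v1.1-all/"

table_structure_extractor = TableTransformerStructureExtractor(model=local_table_struct_ext_path)
partitioner = ArynPartitioner(
    local_detr_path,
    use_partitioning_service=False,  # run the model locally instead of calling the Aryn service
    extract_table_structure=True,
    # use_ocr=True,  # OCR libraries load further dependencies at runtime, so OCR stays off here
    table_structure_extractor=table_structure_extractor,
    threshold=0.3,
)
```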
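Because the underlying problem is that missing dependencies only surface at runtime, it can help to fail fast before the partitioner is built. This is a minimal sketch, assuming the Hugging Face libraries honour the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables (current releases document them, but this issue did not test them, and they do not cover OCR libraries). `MODELS_ROOT`, `EXPECTED` and `missing_paths` are hypothetical names, and the file list is an assumption based on the paths in Steps 2 and 3.

```python
import os
from pathlib import Path

# Ask the Hugging Face libraries to use local files only and raise instead of downloading.
# These must be set before anything imports transformers or huggingface_hub (e.g. sycamore).
os.environ.setdefault("HF_HUB_OFFLINE", "1")
os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")

# Placeholder root; substitute {local_path} as in Steps 2 and 3.
MODELS_ROOT = Path("/{local_path}/models")

# Hypothetical checklist of files the steps above rely on.
EXPECTED = {
    "resnet50 weights (Step 2)": MODELS_ROOT / "resnet50.a1_in1k" / "pytorch_model.bin",
    "DETR config (Step 2)": MODELS_ROOT / "deformable-detr-DocLayNet" / "config.json",
    "table structure config (Step 3)": MODELS_ROOT
    / "table-transformer-structure-recognition-v1.1-all"
    / "config.json",
}


def missing_paths(expected: dict[str, Path]) -> list[str]:
    """Return one 'label: path' entry for every expected path that does not exist."""
    return [f"{label}: {path}" for label, path in expected.items() if not path.exists()]
```

Calling `missing_paths(EXPECTED)` at the top of the script, and aborting with its output if the list is non-empty, reports every missing artifact at once instead of one failure per run.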