GitHub - IITB-LEAP-OCR/TEXTRON: Data Programming for Text Detection in Documents using SPEAR

TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming

Data Programming for Text Detection in Documents using CAGE. The work includes usage of CAGE from SPEAR to detect text within documents accurately, which can be used in creation of large benchmark datasets for Text detection task for any down stream tasks.

ABSTRACT

Several recent deep learning (DL) based techniques perform considerably well on image-based multilingual text detection. However, their performance relies heavily on the availability and quality of training data. There are numerous types of page-level document images consisting of information in several modalities, languages, fonts, and layouts. This makes text detection a challenging problem in the field of computer vision (CV), especially for low-resource or handwritten languages. Furthermore, there is a scarcity of word-level labeled data for text detection, especially for multilingual settings and Indian scripts that incorporate both printed and handwritten text. Conventionally, Indian script text detection requires training a DL model on plenty of labeled data, but to the best of our knowledge, no relevant datasets are available. Manual annotation of such data requires a lot of time, effort, and expertise. In order to solve this problem, we propose \Textron, a {\em Data Programming-based approach}, where users can plug various text detection methods into a weak supervision-based learning framework. One can view this approach to multilingual text detection as an ensemble of different CV-based techniques and DL approaches. TEXTRON can leverage the predictions of DL models pre-trained on a significant amount of language data in conjunction with CV-based methods to improve text detection in other languages. We demonstrate that TEXTRON can improve the detection performance for documents written in Indian languages, despite the absence of corresponding labeled data. Further, through extensive experimentation, we show improvement brought about by our approach over the current State-of-the-art (SOTA) models, especially for handwritten Devanagari text.

Getting Started

Installation and Implementation

Install the dependencies using the following command:

pip install -r requriements.txt

Update the submodules using the following command:

git submodule update --init --recursive

Fix issues as per the following pull-request if any: decile-team/spear#20
Make the configurations as stated in config.py
1. Create a directory outside the main project directory, data, with a sub-directory temp
2. Within temp, create 2 sub-directories, img and txt
  - Place your input images in the img sub-directory and the corresponding ground truth labels (if available) in the txt sub-directory
    - Set the appropriate path for INPUT_DATA_DIR in config.py
    - In case ground truth isn't available, set GROUND_TRUTH_AVAILABLE within config.py as False
  - Choose the appropriate Labeling functions within config.py file from the lab_funcs list and also set the respective quality quide for CAGE

For Training

Set the PREDS_ONLY flag to False in config.py along with the appropriate parameters as per the config.py file
Run the main.py code to get the predictions in the results folder (outside the main project directory) defined in config.py

For Inference

Download the pickle file for LF parameters from here and add it in the directory as per the config.py file
Run the main.py code to get the predictions in the results folder (outside the main project directory) defined in config.py

For Very Fast Inference

Run the fast_inference.py code to get the predictions in the results folder (outside the main project directory) defined in config.py
It uses cage-predictions mapping with the existing CAGE model to generate predictions for the input image without internally using weak-supervision to aggregate weak labels

Methodology

Passing images to the CAGE model, which has several labeling functions which generate weak labels of pixel level information of Image data describing Textual or Non-textual information of the corresponding pixel
Usage of effective post processing steps to generate bounding boxes for the corresponding detected word level text
This method can be applied to documents of various settings and provides SOTA and near to SOTA results
Labeling functions could be used as a plug and play model to analyze results of different configurations

Labeling Functions

Pretrained Models based Labelling Functions
1. DocTR
Image Processing based Labelling Functions
1. Convex hull Labeling Function
1. Edges based Labeling Function
1. Contour based Labeling Function
1. Segmentation based Labeling Function
1. Mask Region based Labeling Function
1. Tesseract Model for Text Detection

Datasets

The Datasets could be found at this link

Alternatively, Data has been made available at Huggingface link

Results

Class	Coverage%	DBNet Model	Textron3LF	Textron4LF
Date	00.02%	33.34	100.00	66.67
Author	00.08%	76.40	75.79	77.78
Title	00.13%	77.94	26.29	58.54
Section	00.79%	57.08	61.30	66.13
List	00.86%	52.37	66.95	62.02
Abstract	01.34%	51.54	76.87	69.52
Footer	01.57%	54.67	72.10	67.64
Caption	02.38%	42.25	67.65	57.34
Table	04.83%	22.99	28.48	21.03
Equation	07.59%	2.86	18.31	11.71
Reference	10.09%	48.10	68.45	65.27
Paragraph	70.31%	49.98	68.36	63.68
Overall	100.00%	46.24	63.38	58.91

Textron results on classwise data of Docbank for 100 test images

Threshold	P	R	F1	P	R	F1	P	R	F1
0.5	40.49	74.03	52.35	90.49	80.00	84.92	87.23	84.46	85.82
0.6	29.63	54.16	38.30	76.45	67.59	71.75	79.97	77.43	78.68
0.7	13.21	24.15	17.08	43.56	49.27	46.24	64.42	62.38	63.38
0.8	04.62	08.44	05.97	18.82	16.64	17.67	33.63	32.56	33.09
0.9	00.36	00.65	00.46	03.11	02.75	02.92	33.63	32.56	33.09

TEXTRON yields a better overall performance and also shows significant improvement in detecting classes like equations and footers as compared to DBNet

References

License

The work has been licensed by GNU license

Acknowledgements

We wish to Acknowledge IITB annotators for annotating the Text Detection dataset to perform our experiments.
We acknowledge the support of a grant from IRCC, IIT Bombay, and MEITY, Government of India, through the National Language Translation Mission-Bhashini project.

Authors Contact Information

Badri Vishal Kasuba
Dhruv Kudale

Questions or Issues

we conclude with opening doors to more innovative contributions bringing about seamless multilingual text detection. Thank you for your interest in our research paper!

Citation

If you use this paper or the accompanying code/data in your research, please cite it as:

@InProceedings{TEXTRON,
    author    = {Dhruv Kudale and Badri Vishal Kasuba and Venkatapathy Subramanian and Parag Chaudhuri and Ganesh Ramakrishnan},
    title     = {TEXTRON: Weakly Supervised Multilingual Text Detection Through Data Programming},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {2871-2880},
    url       = {https://arxiv.org/abs/2402.09811}
}

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
metrics		metrics
spear @ e6a005a		spear @ e6a005a
src		src
utilities		utilities
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
calculate_average.py		calculate_average.py
fast_inference.py		fast_inference.py
main.py		main.py
requirements.txt		requirements.txt
text_classes.json		text_classes.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming

ABSTRACT

Getting Started

Installation and Implementation

For Training

For Inference

For Very Fast Inference

Methodology

Labeling Functions

Datasets

Results

References

License

Acknowledgements

Authors Contact Information

Questions or Issues

Citation

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming

ABSTRACT

Getting Started

Installation and Implementation

For Training

For Inference

For Very Fast Inference

Methodology

Labeling Functions

Datasets

Results

References

License

Acknowledgements

Authors Contact Information

Questions or Issues

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages