The official PyTorch implementation of DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization.
We present DocBinFormer, a novel two-level vision transformer (TL-ViT) architecture for document image binarization. The proposed encoder-decoder model employs a two-level transformer encoder to capture both global and local features from input images, enabling effective binarization of system-generated and handwritten document images.
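For intuition, here is a minimal sketch of what a two-level encoder can look like: one transformer attends over coarse patches for global context, another over finer patches for local detail, and the two token streams are fused. The patch sizes, embedding dimension, channel count, and additive fusion below are illustrative assumptions, not the paper's exact configuration; see the paper for the actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelEncoderSketch(nn.Module):
    """Illustrative two-level encoder. All sizes and the additive
    fusion are assumptions for exposition, not the paper's design."""

    def __init__(self, img_size=256, coarse=32, fine=16, dim=256, depth=2, heads=4):
        super().__init__()
        # Patch embedding as a strided convolution (one token per patch).
        self.coarse_embed = nn.Conv2d(3, dim, kernel_size=coarse, stride=coarse)
        self.fine_embed = nn.Conv2d(3, dim, kernel_size=fine, stride=fine)
        enc = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
            num_layers=depth,
        )
        self.global_enc = enc()       # level 1: attends over coarse patches
        self.local_enc = enc()        # level 2: attends over fine patches
        self.gs = img_size // coarse  # coarse grid side
        self.ls = img_size // fine    # fine grid side

    def forward(self, x):             # x: (B, 3, 256, 256)
        g = self.coarse_embed(x).flatten(2).transpose(1, 2)  # (B, gs*gs, dim)
        l = self.fine_embed(x).flatten(2).transpose(1, 2)    # (B, ls*ls, dim)
        g, l = self.global_enc(g), self.local_enc(l)
        # Broadcast global tokens onto the fine grid and fuse by addition.
        g = g.transpose(1, 2).reshape(x.size(0), -1, self.gs, self.gs)
        g = F.interpolate(g, size=(self.ls, self.ls), mode="nearest")
        return l + g.flatten(2).transpose(1, 2)              # (B, ls*ls, dim)

tokens = TwoLevelEncoderSketch()(torch.randn(1, 3, 256, 256))  # -> (1, 256, 256)
```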
Clone the repository to your desired location:
git clone https://github.com/RisabBiswas/DocBinFormer
cd DocBinFormer

The research and experiments are conducted on the DIBCO and H-DIBCO datasets. Find the datasets here - Link. After downloading, extract the folder named DIBCOSETS and place it in your desired data path, i.e., /YOUR_DATA_PATH/DIBCOSETS/
Specify the data path, split size, validation set, and testing set to prepare your data. In this example, we set the split size to (256 × 256), the validation set to 2016, and the testing set to 2018 when running process_dibco.py:
python process_dibco.py --data_path /YOUR_DATA_PATH/ --split_size 256 --testing_dataset 2018 --validation_dataset 2016
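Under the hood, this preparation amounts to tiling each page into split_size × split_size crops. A minimal sketch of such tiling follows; the actual padding and overlap handling in process_dibco.py may differ.

```python
import numpy as np

def tile_image(img: np.ndarray, split_size: int = 256) -> list:
    """Cut an H x W (x C) page into non-overlapping split_size tiles,
    zero-padding the bottom/right edges so every tile is full-sized.
    Illustrative only; process_dibco.py may pad or overlap differently."""
    h, w = img.shape[:2]
    ph = (split_size - h % split_size) % split_size   # rows to pad
    pw = (split_size - w % split_size) % split_size   # cols to pad
    pad = [(0, ph), (0, pw)] + [(0, 0)] * (img.ndim - 2)
    img = np.pad(img, pad, mode="constant")
    return [img[y:y + split_size, x:x + split_size]
            for y in range(0, img.shape[0], split_size)
            for x in range(0, img.shape[1], split_size)]
```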
For training, specify the desired settings (batch_size, patch_size, model_size, split_size, and number of training epochs) when running train.py. For example, to train a base model with a patch size of (16 × 16) and a batch size of 32, use the following command:

python train.py --data_path /YOUR_DATA_PATH/ --batch_size 32 --vit_model_size base --vit_patch_size 16 --epochs 151 --split_size 256 --validation_dataset 2016

Visualization results on the validation dataset are saved every epoch in a folder named vis+"YOUR_EXPERIMENT_SETTINGS" (created automatically). With the settings above, it will be named visbase_256_16. The best weights are saved in the folder named "weights".
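To locate your outputs programmatically, the folder name can be reconstructed from the settings exactly as described above (the variable names here are just placeholders):

```python
# Folder naming described above: "vis" + model_size + "_" + split_size + "_" + patch_size
vit_model_size, split_size, vit_patch_size = "base", 256, 16
vis_dir = f"vis{vit_model_size}_{split_size}_{vit_patch_size}"  # "visbase_256_16"
weights_dir = "weights"  # best checkpoints are saved here
```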
To test the trained model on a specific DIBCO dataset, first make sure it matches the testing set specified during data preparation; if not, run process_dibco.py again. Use your own trained model weights and run the test command given after the sketch below. In this example, we test on H-DIBCO 2017 using the base model with a 16 × 16 patch size and a batch size of 16. The binarized images will be saved in the folder ./vis+"YOUR_CONFIGS_HERE"/epoch_testing/
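Conceptually, testing runs the model on each 256 × 256 patch and reassembles the per-patch predictions into full pages before thresholding to black and white. The sketch below illustrates that reassembly step under the assumption of non-overlapping, row-major tiles; test.py performs the real reconstruction internally.

```python
import numpy as np

def stitch_and_binarize(tiles, n_rows, n_cols, split_size=256, thresh=0.5):
    """Reassemble per-patch probability maps (row-major order) into one
    page and threshold to black/white. An illustrative assumption of how
    reconstruction works; test.py handles this for you."""
    page = np.zeros((n_rows * split_size, n_cols * split_size), dtype=np.float32)
    for i, tile in enumerate(tiles):
        r, c = divmod(i, n_cols)
        page[r * split_size:(r + 1) * split_size,
             c * split_size:(c + 1) * split_size] = tile
    return (page > thresh).astype(np.uint8) * 255
```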
python test.py --data_path /YOUR_DATA_PATH/ --model_weights_path /THE_MODEL_WEIGHTS_PATH/ --batch_size 16 --vit_model_size base --vit_patch_size 16 --split_size 256 --testing_dataset 2017

This project adapts its code structure from DocEnTr, and we thank its authors. We also appreciate the great work on vit_pytorch by Phil Wang.
If you use the DocBinFormer code in your research, we would appreciate a citation to the original paper:
@misc{biswas2023docbinformertwoleveltransformernetwork,
  title={DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization},
  author={Risab Biswas and Swalpa Kumar Roy and Ning Wang and Umapada Pal and Guang-Bin Huang},
  year={2023},
  eprint={2312.03568},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2312.03568},
}
If you have any questions, please feel free to reach out to Risab Biswas.

