👋 Welcome to the official code of SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting (CVPR 2025)
This work explores semi-supervised text spotting (SSTS) to reduce the high annotation cost of text spotting. We observe two challenges in SSTS: 1) inconsistent pseudo labels between the detection and recognition tasks, and 2) sub-optimal supervision caused by inconsistency between the teacher and the student. To address them, we propose SemiETS. It progressively generates reliable hierarchical pseudo labels for each task, thereby reducing noisy labels. Meanwhile, it extracts important information about text locations and transcriptions from bidirectional flows to improve consistency.
- Environment
Python 3.8 + PyTorch 1.9.0 + CUDA 11.1 + Detectron2 (v0.6) + ctcdecode
- Install SemiETS
# 1. Clone the repository
git clone git@github.com:DrLuo/SemiETS.git
cd SemiETS
# 2. Create conda environment
conda create -n semiets python=3.8 -y
conda activate semiets
# 3. Install PyTorch and other dependencies using pip
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
python setup.py build develop
- Install ctcdecode from source
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode
pip install .
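After the install steps, a quick version check can catch a mismatched wheel before training starts. The following is a minimal sketch, not part of the repository: the expected versions are copied from the Environment line above, the distribution names (in particular `ctcdecode`) are assumptions, and `check_environment` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the Environment line of this README.
# The distribution names are assumptions about how pip registers each package.
EXPECTED = {
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "detectron2": "0.6",
    "ctcdecode": None,  # no version is stated in this README
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package name to (installed_version, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Compare by prefix so local build tags such as "+cu111" are accepted.
        ok = found is not None and (wanted is None or found.startswith(wanted))
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        print(f"{name:12s} {found or 'missing':16s} {'OK' if ok else 'CHECK'}")
```

Run it inside the `semiets` conda environment; any line marked CHECK points at a package that is missing or at a different version than the one listed above.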
- Download datasets from here. Data splits are in SemiETS/datasets.
Dataset Organization
Some image files need to be renamed. Organize them as follows (lexicon files are not listed here):
|- ./datasets
|- syntext1
| |- train_images
| └ annotations
|   |- train_37voc.json
|   └ train_96voc.json
|- syntext2
| |- train_images
| └ annotations
|   |- train_37voc.json
|   └ train_96voc.json
|- totaltext
| |- train_images
| |- test_images
| |- train_37voc.json
| |- train_96voc.json
| |- train_37voc_0.5_labeled.json
| |- train_37voc_0.5_unlabeled.json
| |- train_37voc_1_labeled.json
| |- train_37voc_1_unlabeled.json
| |- train_37voc_2_labeled.json
| |- train_37voc_2_unlabeled.json
| |- train_37voc_5_labeled.json
| |- train_37voc_5_unlabeled.json
| |- train_37voc_10_labeled.json
| |- train_37voc_10_unlabeled.json
| └ test.json
|- ic15
| |- train_images
| |- test_images
| |- train_37voc.json
| |- train_96voc.json
| |- train_37voc_0.5_labeled.json
| |- train_37voc_0.5_unlabeled.json
| |- train_37voc_1_labeled.json
| |- train_37voc_1_unlabeled.json
| |- train_37voc_2_labeled.json
| |- train_37voc_2_unlabeled.json
| |- train_37voc_5_labeled.json
| |- train_37voc_5_unlabeled.json
| |- train_37voc_10_labeled.json
| |- train_37voc_10_unlabeled.json
| └ test.json
|- ctw1500
| |- train_images
| |- test_images
| |- train_96voc.json
| |- train_96voc_0.5_labeled.json
| |- train_96voc_0.5_unlabeled.json
| |- train_96voc_1_labeled.json
| |- train_96voc_1_unlabeled.json
| |- train_96voc_2_labeled.json
| |- train_96voc_2_unlabeled.json
| |- train_96voc_5_labeled.json
| |- train_96voc_5_unlabeled.json
| |- train_96voc_10_labeled.json
| |- train_96voc_10_unlabeled.json
| └ test.json
|- evaluation
| |- gt_*.zip
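Because some files have to be renamed by hand, it can help to check the layout before launching a run. The sketch below is an illustration only, not a script shipped with the repository: it rebuilds the file list from the tree above and reports what is missing. It assumes the syntext JSON files sit inside `annotations`, and it skips lexicon files and the `evaluation/gt_*.zip` archives. All function names are hypothetical.

```python
from pathlib import Path

# Labeled-data proportions that appear in the split file names above.
PROPORTIONS = ["0.5", "1", "2", "5", "10"]


def expected_split_files(voc):
    """File names of the labeled/unlabeled splits for one vocabulary size."""
    names = []
    for p in PROPORTIONS:
        names.append(f"train_{voc}voc_{p}_labeled.json")
        names.append(f"train_{voc}voc_{p}_unlabeled.json")
    return names


def expected_layout():
    """Relative paths listed in the dataset tree of this README."""
    layout = []
    for syn in ("syntext1", "syntext2"):
        # Assumption: the two JSON files are nested under annotations/.
        layout += [
            f"{syn}/train_images",
            f"{syn}/annotations/train_37voc.json",
            f"{syn}/annotations/train_96voc.json",
        ]
    for name in ("totaltext", "ic15"):
        layout += [
            f"{name}/train_images",
            f"{name}/test_images",
            f"{name}/train_37voc.json",
            f"{name}/train_96voc.json",
            f"{name}/test.json",
        ]
        layout += [f"{name}/{f}" for f in expected_split_files("37")]
    layout += [
        "ctw1500/train_images",
        "ctw1500/test_images",
        "ctw1500/train_96voc.json",
        "ctw1500/test.json",
    ]
    layout += [f"ctw1500/{f}" for f in expected_split_files("96")]
    return layout


def missing_entries(root="datasets"):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected_layout() if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_entries():
        print("missing:", rel)
```

Run it from the repository root; no output means every listed file and folder was found.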
- Download pretrained weights for initialization from Google Drive
The checkpoints were pretrained using only Synth150K.
Place them under the folder ./output/R50/150k_tt/pretrain/.
- Training
python tools/train_semi.py --config-file ${CONFIG_FILE} --num-gpus 4 --dist-url 'auto'
For example:
python tools/train_semi.py --config-file configs/R_50/TotalText/SemiETS/SemiETS_2s.yaml --num-gpus 4 --dist-url 'auto'
The configuration files are named following the format: SemiETS_{DATA_PROPORTION}s.yaml
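The naming rule can be turned into a small helper when launching several proportions in a row. This is a sketch under assumptions: the directory is taken from the TotalText example above (other datasets may use a different folder), the proportion is assumed to be written as in the split file names ("0.5", "1", "2", "5", "10"), and `config_path` and `train_command` are hypothetical names, not repository functions.

```python
import shlex

# Directory taken from the TotalText example command in this README.
CONFIG_DIR = "configs/R_50/TotalText/SemiETS"


def config_path(proportion, config_dir=CONFIG_DIR):
    """Build the config path following SemiETS_{DATA_PROPORTION}s.yaml.

    Assumption: `proportion` is the string used in the split file names.
    """
    return f"{config_dir}/SemiETS_{proportion}s.yaml"


def train_command(proportion, num_gpus=4):
    """Argument list mirroring the training command shown above."""
    return [
        "python", "tools/train_semi.py",
        "--config-file", config_path(proportion),
        "--num-gpus", str(num_gpus),
        "--dist-url", "auto",
    ]


if __name__ == "__main__":
    # Print the command for the 2 setting; pass the list to subprocess.run to execute it.
    print(shlex.join(train_command("2")))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting concerns.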
- Evaluation
python tools/train_semi.py --config-file ${CONFIG_FILE} --eval-only MODEL.WEIGHTS ${MODEL_PATH}
If you find SemiETS useful for your research and applications, please cite using this BibTeX:
@article{luo2025semiets,
title={SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting},
author={Luo, Dongliang and Zhu, Hanshen and Zhang, Ziyang and Liang, Dingkang and Xie, Xudong and Liu, Yuliang and Bai, Xiang},
journal={CVPR},
year={2025}
}
This project is based on DeepSolo and AdelaiDet. We appreciate their wonderful codebases. For academic use, this project is licensed under the 2-clause BSD License.
