Deseret Alphabet OCR with CRNN

CRNN (Convolutional Recurrent Neural Network) implementation for transcribing Deseret Alphabet manuscript images to Unicode text. This implementation won first place in the BYU Hackathon.

Project Structure

deseret-ocr/
├── data/
│   ├── train/
│   │   ├── images/          # Training images
│   │   └── labels/          # Training labels (.txt files)
│   └── test/
│       └── images/          # Test images (unlabeled)
├── src/
│   ├── dataset.py          # PyTorch Dataset classes
│   ├── model.py            # CRNN architecture
│   ├── train.py            # Training script
│   ├── inference.py        # Inference and submission generation
│   └── utils.py            # Helper functions
├── configs/
│   └── config.yaml         # Configuration file
├── models/                  # Saved model checkpoints
├── submissions/            # Generated submission files
├── requirements.txt
└── README.md

Setup

Install dependencies:

pip install -r requirements.txt

Organize your data according to the structure above.

Workflow

1. Configure Training

Edit configs/config.yaml to set:

- Data paths
- Preprocessing dimensions (based on your analysis)
- Training hyperparameters

2. Train Model

python src/train.py --config configs/config.yaml

To resume training from a checkpoint:

python src/train.py --config configs/config.yaml --resume models/crnn_best.pth

3. Generate Output

python src/inference.py \
    --config configs/config.yaml \
    --checkpoint models/crnn_best.pth \
    --output submissions/submission.csv

Model Architecture

CRNN Components:

CNN Backbone: Extracts visual features from images
- 5 convolutional blocks with batch normalization
- Max pooling to reduce spatial dimensions
- Output: a feature sequence with one feature vector per horizontal position
Bidirectional LSTM: Captures sequential context
- Reads features left-to-right and right-to-left
- Learns character dependencies and context
CTC Loss: Handles alignment between images and text
- No manual character segmentation required
- Automatically learns alignment during training
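The link between pooling and the feature sequence can be made concrete with a little arithmetic. The sketch below is an illustration only: the pooling factors, the 32x400 input size, and the function names are assumptions, not values taken from src/model.py. It shows how the image width becomes the number of time steps seen by the LSTM, and the minimum number of time steps CTC needs for a given label.

```python
from typing import List, Tuple

# Hypothetical pooling layout: (height_factor, width_factor) per conv block.
# The README only states "5 convolutional blocks" with max pooling; these
# particular factors are an assumption chosen for illustration.
POOLING: List[Tuple[int, int]] = [(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)]


def feature_map_size(height: int, width: int,
                     pooling: List[Tuple[int, int]] = POOLING) -> Tuple[int, int]:
    """Return (height, width) after applying every pooling stage (floor division)."""
    for pool_h, pool_w in pooling:
        height //= pool_h
        width //= pool_w
    return height, width


def min_ctc_steps(label: str) -> int:
    """Minimum time steps CTC needs for a label.

    One step per character, plus one blank between each pair of
    identical adjacent characters (otherwise the pair would be merged).
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


h, w = feature_map_size(32, 400)
print(h, w)                    # 1 100 -> 100 time steps for the LSTM
print(min_ctc_steps("hello"))  # 6
```

If the number of time steps falls below the minimum for a label, CTC has no valid alignment for that sample, so it is worth checking the preprocessing dimensions against the longest labels in the training set.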

About CRNN and CTC (Connectionist Temporal Classification)

CRNN:

- Handles variable-width line images naturally
- No need to segment individual characters
- Captures both local (CNN) and sequential (RNN) patterns

CTC:

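- Solves the alignment problem: image width ≠ text length
- Allows the model to output blanks and repeated characters
- Example: "hh_e_l_ll_oo" → "hello" (where _ = blank token); the blank between the two runs of "l" keeps the double letter from being merged into one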
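The CTC collapsing rule (merge consecutive repeats, then drop blanks) can be sketched in a few lines. This is a minimal illustration of greedy decoding on strings, not the code in src/inference.py; the function name and the use of "_" as the blank symbol are assumptions made for the example.

```python
from itertools import groupby

BLANK = "_"  # stand-in symbol for the CTC blank token (assumption for this demo)


def ctc_collapse(path: str, blank: str = BLANK) -> str:
    """Collapse a per-time-step CTC path into the final text.

    Step 1: merge runs of identical consecutive symbols.
    Step 2: remove every blank symbol.
    """
    merged = (symbol for symbol, _ in groupby(path))
    return "".join(symbol for symbol in merged if symbol != blank)


print(ctc_collapse("hh_e_l_ll_oo"))  # hello
print(ctc_collapse("h_e_ll_o"))      # helo (no blank between the l's, so they merge)
```

The second call shows why a double letter needs a blank between its two occurrences in the path.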

Whitespace handling: CRNN with CTC can handle whitespace in images because:

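- Training labels include the space character in the vocabulary, so a space is predicted like any other character
- The model learns the visual pattern of word spacing
- The CTC blank token is separate from the space character: blanks are removed during decoding, so a gap between words has to be emitted as a space, not as a blank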
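One way to keep the space character in the vocabulary while reserving a separate index for the CTC blank is sketched below. The helper names and the sample labels are hypothetical, and reserving index 0 for the blank is an assumption (it matches the default `blank=0` of PyTorch's `CTCLoss`); the actual mapping in src/utils.py may differ.

```python
from typing import Dict, Iterable, List, Tuple

BLANK_INDEX = 0  # reserved for the CTC blank; never assigned to a real character


def build_vocab(labels: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Build character <-> index maps from label strings, spaces included."""
    chars = sorted(set("".join(labels)))
    char_to_idx = {ch: i + 1 for i, ch in enumerate(chars)}  # indices start at 1
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return char_to_idx, idx_to_char


def encode(text: str, char_to_idx: Dict[str, int]) -> List[int]:
    """Turn a label string into the index sequence used as the CTC target."""
    return [char_to_idx[ch] for ch in text]


def decode(indices: Iterable[int], idx_to_char: Dict[int, str]) -> str:
    """Map already-collapsed indices back to text, skipping any blanks."""
    return "".join(idx_to_char[i] for i in indices if i != BLANK_INDEX)


# Hypothetical labels using two Deseret letters (U+10410, U+10428) and a space.
labels = ["\U00010410\U00010428", "\U00010428 \U00010410"]
char_to_idx, idx_to_char = build_vocab(labels)

target = encode(labels[1], char_to_idx)
print(target)                                   # [3, 1, 2]
print(decode(target, idx_to_char) == labels[1])  # True
```

With this layout the model's output layer needs `len(char_to_idx) + 1` classes: every character seen in the labels, including the space, plus the blank.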
