
🖼️ Image to LaTeX | 🧠🖼️ ➜ \mathrm{e}^{x}

📘 Introduction

This repository implements a deep learning model to solve the Image-to-LaTeX task: converting images of mathematical formulas into their corresponding LaTeX code. Inspired by the work of Guillaume Genthial (2017), this project explores various encoder-decoder architectures based on the Seq2Seq framework to improve the accuracy of LaTeX formula generation from images.

🧠 Motivation: Many students, researchers, and professionals encounter LaTeX-based documents but lack the ability to extract and reuse formulas quickly. This project aims to automate the conversion of math formula images into editable LaTeX.

[Image2Latex Diagram]


🏗️ Model Architecture

Our model follows an Encoder-Decoder with Attention structure:

  • Encoder: several CNN-based configurations, optionally combined with a Row Encoder (BiLSTM) or built on ResNet-18.
  • Decoder: a unidirectional LSTM network.
  • Attention: Luong attention is used to improve decoding accuracy (a PyTorch sketch follows the list of encoder variants below).

Supported Encoder Variants:

  • 🧱 Pure Convolution
  • 🧱 Convolution + Row Encoder (BiLSTM)
  • 🧱 Convolution + Batch Normalization
  • 🧱 ResNet-18
  • 🧱 ResNet-18 + Row Encoder (BiLSTM)
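
The sketch below illustrates how an LSTM decoder with Luong attention over encoder features can be wired up in PyTorch. It is a minimal illustration, not the repository's code: the class names, the embedding and hidden sizes, and the shape of the encoder features are assumptions.

# Minimal PyTorch sketch of an LSTM decoder with Luong ("general") attention
# over encoder features of shape (batch, length, hidden). Names and sizes are
# illustrative assumptions, not the repository's exact implementation.
import torch
import torch.nn as nn

class LuongAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (B, H); encoder_outputs: (B, L, H)
        scores = torch.bmm(self.W(encoder_outputs), decoder_state.unsqueeze(2))  # (B, L, 1)
        weights = torch.softmax(scores.squeeze(2), dim=1)                        # (B, L)
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)    # (B, H)
        return context, weights

class AttentionDecoder(nn.Module):
    def __init__(self, vocab_size: int, embed_size: int = 80, hidden_size: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTMCell(embed_size, hidden_size)
        self.attention = LuongAttention(hidden_size)
        self.out = nn.Linear(2 * hidden_size, vocab_size)

    def forward(self, token, state, encoder_outputs):
        # token: (B,) ids of the previously generated LaTeX tokens
        h, c = self.lstm(self.embedding(token), state)
        context, _ = self.attention(h, encoder_outputs)
        logits = self.out(torch.cat([context, h], dim=1))  # scores for the next token
        return logits, (h, c)

At each step the decoder attends over the flattened CNN (or row-encoded) feature map and predicts the next LaTeX token; a search strategy such as beam search (see the training example below) then picks the most likely full sequence.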

[Architecture Diagram]


📊 Dataset

  • The IM2LATEX benchmark data, commonly used for evaluating Image-to-LaTeX models (the results below were obtained on IM2LATEX-100k).
  • A preprocessed version is available: im2latex-sorted-by-size.
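
For illustration, here is a hedged sketch of how image/formula pairs can be assembled, assuming the classic IM2LATEX layout (an im2latex_formulas.lst file with one formula per line, plus split files that reference images in formula_images/); the preprocessed archive may organize its files differently.

# Hedged sketch, assuming the classic IM2LATEX file layout; the
# im2latex-sorted-by-size archive may use different file names/columns.
from pathlib import Path

def load_split(data_dir: str, split_file: str = "im2latex_train.lst"):
    root = Path(data_dir)
    # One LaTeX formula per line, referenced by line index in the split file.
    formulas = root.joinpath("im2latex_formulas.lst").read_text(
        encoding="utf-8", errors="replace"
    ).splitlines()

    pairs = []
    for line in root.joinpath(split_file).read_text().splitlines():
        formula_idx, image_name, *_ = line.split()
        image_path = root / "formula_images" / f"{image_name}.png"
        pairs.append((image_path, formulas[int(formula_idx)]))
    return pairs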

🚀 How to Run

Step 1: Install Requirements

Make sure you have the necessary packages installed.

pip install -r requirements.txt

Step 2: Set Up Weights & Biases (optional, for experiment logging)

wandb login <your-wandb-key>

Step 3: Training Example

Point --data-path and --img-path at your local dataset and formula-image folders (the Windows paths below are just an example). The trailing backslashes are Unix-shell line continuations; put the command on a single line if you run it in Windows cmd or PowerShell.

python main.py \
    --batch-size 2 \
    --data-path C:\Users\nvatu\OneDrive\Desktop\dataset5\dataset5 \
    --img-path C:\Users\nvatu\OneDrive\Desktop\dataset5\dataset5\formula_images \
    --dataset 170k \
    --val \
    --decode-type beamsearch
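
The --decode-type beamsearch flag selects beam-search decoding. As a hedged illustration only (not the repository's implementation), a generic beam search over a step-wise decoder such as the one sketched earlier could look like this:

# Hedged sketch of beam-search decoding. `step(token, state) -> (logits, state)`
# is assumed to be a closure around the decoder that captures the encoder
# features; beam width, length limit, and token ids are illustrative.
import torch

def beam_search(step, init_state, sos_id: int, eos_id: int,
                beam_width: int = 5, max_len: int = 150):
    # Each beam entry: (log-probability, token sequence, decoder state).
    beams = [(0.0, [sos_id], init_state)]
    for _ in range(max_len):
        candidates = []
        for logp, seq, state in beams:
            if seq[-1] == eos_id:               # finished beams are kept as-is
                candidates.append((logp, seq, state))
                continue
            logits, new_state = step(torch.tensor([seq[-1]]), state)
            log_probs = torch.log_softmax(logits, dim=-1).squeeze(0)
            top_logp, top_ids = log_probs.topk(beam_width)
            for lp, tok in zip(top_logp.tolist(), top_ids.tolist()):
                candidates.append((logp + lp, seq + [tok], new_state))
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
        if all(seq[-1] == eos_id for _, seq, _ in beams):
            break
    return beams[0][1]                          # best-scoring token sequence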

✅ Results

From experiments using the IM2LATEX-100k dataset, the best-performing architecture was:

  • Convolutional Feature Encoder + BiLSTM Row Encoder
  • Achieved a BLEU-4 score of 77%
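
The score above is a corpus-level BLEU-4 over tokenized LaTeX. As a hedged sketch (the repository's own tokenization and evaluation script may differ), such a score can be reproduced with NLTK:

# Hedged sketch: corpus-level BLEU-4 over tokenized LaTeX strings using NLTK.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def bleu4(references, hypotheses):
    # Whitespace tokenization; LaTeX is usually pre-tokenized (e.g. "\frac { a } { b }").
    refs = [[r.split()] for r in references]   # each hypothesis may have several references
    hyps = [h.split() for h in hypotheses]
    return corpus_bleu(
        refs, hyps,
        weights=(0.25, 0.25, 0.25, 0.25),      # BLEU-4: uniform 1- to 4-gram weights
        smoothing_function=SmoothingFunction().method1,
    )

print(bleu4([r"\frac { a } { b }"], [r"\frac { a } { b }"]))  # 1.0 for an exact match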

📌 Notebooks

Explore pre-trained model performance and evaluation in the accompanying Kaggle notebooks.


📈 Future Work

  • Integrating Transformer-based decoders
  • Exploring pretrained vision encoders like ViT
  • Improving performance on noisy or low-resolution images

📧 Author

Nguyễn Văn Anh Tuấn 📍 IUH - Industrial University of Ho Chi Minh City ✉️ [email protected]

Star History

[Star History Chart]