fix: set cross_entropy ignore_index to PAD token index (#305) by ljluestc · Pull Request #443 · lukas-blecher/LaTeX-OCR

ljluestc · 2026-02-22T06:09:01Z

fix: align cross_entropy `ignore_index` with PAD token id (0)

Fixes #305

Problem

AutoregressiveWrapper from x-transformers computes the training loss via F.cross_entropy(..., ignore_index=self.ignore_index). The ignore_index parameter tells PyTorch which target value to skip during loss and gradient computation.

In pix2tex/models/transformer.py, the get_decoder() function constructs CustomARWrapper with pad_value=args.pad_token (which is 0), but does not pass ignore_index. This causes ignore_index to default to -100 (PyTorch's default), which means:

The [PAD] token (index 0) is incorrectly included in the cross-entropy loss.
Padding positions generate gradients that pollute model updates during training.
The model learns to predict PAD tokens rather than ignoring them, which can degrade convergence and final accuracy.

Root Cause

In pix2tex/models/transformer.py (lines 55–58), the original code was:

def get_decoder(args):
    return CustomARWrapper(
        TransformerWrapper(...),
        pad_value=args.pad_token)      # ← ignore_index missing, defaults to -100

pad_value=0 correctly pads input sequences, but the loss function doesn't know to ignore index 0 — it only ignores -100, which never appears in the vocabulary.

Fix

Pass ignore_index=args.pad_token alongside pad_value so that F.cross_entropy skips positions where the target is the PAD token:

def get_decoder(args):
    return CustomARWrapper(
        TransformerWrapper(
            num_tokens=args.num_tokens,
            max_seq_len=args.max_seq_len,
            attn_layers=Decoder(
                dim=args.dim,
                depth=args.num_layers,
                heads=args.heads,
                **args.decoder_args
            )),
        pad_value=args.pad_token,
        ignore_index=args.pad_token)   # ← NEW: PAD positions now excluded from loss

Files Changed

pix2tex/models/transformer.py — Added ignore_index=args.pad_token to the CustomARWrapper constructor call (1 line).
tests/test_ignore_index.py — New test file with 3 tests covering the fix (83 lines).

How to Test

Prerequisites

# Clone and checkout the branch
git clone git@github.com:ljluestc/LaTeX-OCR.git
cd LaTeX-OCR
git checkout fix/cross-entropy-ignore-index-305

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate

# Install the package with test dependencies
pip install -e ".[train]"
pip install pytest munch

Run the unit tests

pytest tests/test_ignore_index.py -v

Expected output

tests/test_ignore_index.py::test_ignore_index_matches_pad_token PASSED
tests/test_ignore_index.py::test_ignore_index_with_custom_pad_token PASSED
tests/test_ignore_index.py::test_pad_tokens_ignored_in_loss PASSED

What each test verifies

test_ignore_index_matches_pad_token — Constructs a decoder with pad_token=0 and asserts decoder.ignore_index == 0 (not -100).
test_ignore_index_with_custom_pad_token — Repeats the check for multiple pad token values (0, 3, 5) to ensure ignore_index always follows pad_token.
test_pad_tokens_ignored_in_loss — Builds two sequences that are identical (same real tokens, same padding), runs them through the decoder, and asserts the losses are equal — confirming padding doesn't affect the loss.

Manual verification (optional)

from munch import Munch
from pix2tex.models.transformer import get_decoder

args = Munch(num_tokens=100, max_seq_len=64, dim=64,
             num_layers=2, heads=2, decoder_args={"cross_attend": True},
             pad_token=0)
decoder = get_decoder(args)

print(f"pad_value:    {decoder.pad_value}")      # Expected: 0
print(f"ignore_index: {decoder.ignore_index}")    # Expected: 0 (was -100 before fix)

Risk Assessment

Scope: Only affects training loss computation. Inference/generation is unchanged.
Backward compatibility: No API changes. No config changes needed.
Pinned dependency: x_transformers==0.15.0 (in setup.py) — the AutoregressiveWrapper API has not changed, so the fix is compatible.
Side effects: None. This strictly corrects an existing bug where PAD tokens were erroneously influencing training.

References

…#305) The AutoregressiveWrapper defaults ignore_index to -100, but the [PAD] token has index 0. This caused padding tokens to incorrectly contribute to the cross-entropy loss and gradients during training. Pass ignore_index=args.pad_token alongside pad_value so that PAD positions are properly excluded from the loss computation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: set cross_entropy ignore_index to PAD token index (#305)#443

fix: set cross_entropy ignore_index to PAD token index (#305)#443
ljluestc wants to merge 1 commit into
lukas-blecher:mainfrom
ljluestc:fix/cross-entropy-ignore-index-305

ljluestc commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ljluestc commented Feb 22, 2026

fix: align cross_entropy ignore_index with PAD token id (0)

Problem

Root Cause

Fix

Files Changed

How to Test

Prerequisites

Run the unit tests

Expected output

What each test verifies

Manual verification (optional)

Risk Assessment

References

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix: align cross_entropy `ignore_index` with PAD token id (0)