Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information

Introduction

This repository is the official implementation of Group Position Embedding (GPE), a novel and efficient technique that enhances the layout-understanding capabilities of LLMs without architectural changes or additional pre-training. GPE groups the attention heads and feeds each group a distinct positional embedding, thereby encoding layout information relevant to document comprehension. For more details, please refer to our paper.
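As a rough illustration of the grouping idea, the sketch below applies rotary position embedding (RoPE) to each attention head, but lets each group of heads use its own position value for a token. The choice of layout signals per group (here, reading order for one group and a horizontal coordinate for the other), the equal contiguous split of heads, and the names `rope_rotate`, `grouped_rope`, and `head_scores` are assumptions for illustration only, not the GPE implementation in this repository.

```python
import math
from typing import List, Sequence


def rope_rotate(vec: Sequence[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotate one head vector with RoPE at position `pos`.

    Uses the interleaved-pair form: (vec[2i], vec[2i+1]) is rotated by the
    angle pos * base ** (-2i / d). Some implementations (e.g. Hugging Face's
    Qwen2 code) pair dimension i with i + d/2 instead; both are rotations.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("head dimension must be even")
    out: List[float] = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def grouped_rope(heads: Sequence[Sequence[float]], group_positions: Sequence[float]) -> List[List[float]]:
    """Apply RoPE to one token's heads, giving each head group its own position.

    heads: the token's per-head query (or key) vectors.
    group_positions: one position per group (assumption: e.g. reading order, x).
    Heads are split into equal, contiguous groups (also an assumption).
    """
    num_groups = len(group_positions)
    if num_groups == 0 or len(heads) % num_groups:
        raise ValueError("num_heads must be a positive multiple of the number of groups")
    heads_per_group = len(heads) // num_groups
    return [rope_rotate(h, group_positions[i // heads_per_group]) for i, h in enumerate(heads)]


def head_scores(q_heads: Sequence[Sequence[float]], k_heads: Sequence[Sequence[float]]) -> List[float]:
    """Unscaled per-head attention logits (q . k) between one query and one key token."""
    return [sum(a * b for a, b in zip(q, k)) for q, k in zip(q_heads, k_heads)]


if __name__ == "__main__":
    # 4 heads of dimension 4, split into 2 groups:
    # heads 0-1 see reading order, heads 2-3 see a (hypothetical) x coordinate.
    q = [[1.0, 0.5, -0.3, 0.2]] * 4
    k = [[0.4, -0.1, 0.7, 0.9]] * 4
    base_scores = head_scores(grouped_rope(q, [10, 3]), grouped_rope(k, [2, 1]))
    # Shift both tokens' x coordinates by 5: the x-group scores stay the same.
    shifted = head_scores(grouped_rope(q, [10, 8]), grouped_rope(k, [2, 6]))
    print(base_scores)
    print(shifted)
```

Because RoPE makes a head's score depend only on the difference between the two tokens' positions, the heads in each group respond only to distance along that group's own axis; this is what lets one set of heads track reading order while another tracks horizontal layout.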

News

2025.02.06 We release the dataset.
2025.02.05 We release the paper.

Performance

Environment

Inference

python ./infer.py --model_path MODEL_PATH --image_path IMAGE_PATH --question "YOUR_QUESTION"
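To ask several questions from Python, the command above can be wrapped as in the sketch below. It uses only the three flags shown in the command; the helper names `build_infer_cmd` and `run_questions` are hypothetical, and the sketch does not assume anything about the script's output format.

```python
import shlex
import subprocess
import sys
from typing import Iterable, List


def build_infer_cmd(script: str, model_path: str, image_path: str, question: str) -> List[str]:
    """Build argv for one run of the inference script with the documented flags.

    A list (rather than a single shell string) keeps a question that contains
    spaces or quotes intact as one --question value, with no manual escaping.
    """
    return [
        sys.executable,  # the current interpreter, in place of a bare "python"
        script,
        "--model_path", model_path,
        "--image_path", image_path,
        "--question", question,
    ]


def run_questions(script: str, model_path: str, image_path: str, questions: Iterable[str]) -> None:
    """Hypothetical helper: ask several questions about one image, one process per question."""
    for question in questions:
        cmd = build_infer_cmd(script, model_path, image_path, question)
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Call `run_questions(SCRIPT, MODEL_PATH, IMAGE_PATH, [...])` with `SCRIPT` set to the path of the repository's inference script. Each question starts a fresh process and therefore reloads the model; this keeps the sketch independent of the script's internals at the cost of speed.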

Eval

Run eval.py, passing the directory that contains the prediction files:

python ./eval.py --input_dir EVAL_PATH

Dataset Descriptions

Forms is an English tabular dataset sourced from FeTaQA.

Websites is a Chinese tabular dataset obtained by web crawlers.

Slides is a Chinese tabular dataset sourced from PowerPoint slides. Cells may be irregularly formatted, for example containing line breaks or complex symbols.

SynthTables is our synthetic tabular dataset, built from entity names and values extracted from public entity-extraction datasets. To make the task harder, the tables are randomly rotated and parts of them are randomly discarded.

Newspapers is a Chinese dataset sourced from the newspaper pages of M6Doc. Its pages contain multiple columns of text, and the layouts are complex, mixing horizontal text, vertical text, and pictures.

SynthDocs is our multi-column document dataset, generated with SynthDoc from public machine reading comprehension (MRC) datasets. It disrupts the original typographic structure of the text so that semantically coherent passages are split across multi-column document layouts.
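Multi-column pages such as those in Newspapers and SynthDocs are hard for a model that sees only a one-dimensional token order: reading a two-column page line by line straight across the columns interleaves unrelated sentences. The toy sketch below illustrates this; it is not SynthDoc's generation procedure, and `layout_columns`, `column_order`, and `row_order` are hypothetical names.

```python
from typing import List


def layout_columns(words: List[str], num_columns: int, line_width: int) -> List[List[str]]:
    """Illustrative only: split a passage into lines, then fill columns left to right.

    Each line holds up to `line_width` words; column 0 receives the first
    ceil(n_lines / num_columns) lines, column 1 the next batch, and so on.
    """
    lines = [" ".join(words[i:i + line_width]) for i in range(0, len(words), line_width)]
    per_col = -(-len(lines) // num_columns)  # ceiling division
    return [lines[c * per_col:(c + 1) * per_col] for c in range(num_columns)]


def column_order(columns: List[List[str]]) -> List[str]:
    """Correct reading order: finish each column top to bottom before moving right."""
    return [line for col in columns for line in col]


def row_order(columns: List[List[str]]) -> List[str]:
    """Naive left-to-right, top-to-bottom order that ignores column boundaries."""
    depth = max(len(col) for col in columns)
    return [col[r] for r in range(depth) for col in columns if r < len(col)]


if __name__ == "__main__":
    words = "the cat sat on the mat and then it slept all afternoon".split()
    cols = layout_columns(words, num_columns=2, line_width=3)
    print(" / ".join(column_order(cols)))  # follows the original sentence
    print(" / ".join(row_order(cols)))     # interleaves the two columns
```

With two columns, the row-wise order reads "the cat sat / and then it / on the mat / ...", mixing lines from both columns; recovering the column-wise order requires the kind of layout information GPE is designed to supply.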

Citation

If you find GPE useful in your research, please cite the following paper:

@inproceedings{Zhu2025GroupPosition,
  title     = {Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information},
  author    = {Zhu, Yuke and Zhang, Yue and Liu, Dongdong and Xie, Chi and Xiong, Zihu and Zheng, Bo and Guo, Sheng},
  booktitle = {Proceedings of the 2025 International Conference on Learning Representations},
  year      = {2025},
}

Contact Us

For help or issues using GPE, please submit a GitHub issue.

For other communications related to GPE, please contact Yuke Zhu ([email protected]), Sheng Guo ([email protected]).
