UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
📖 Overview • 📊 Benchmark Results • ⚙️ Setup • 📊 Data Preparation • 🔧 Inference • 📃 Evaluation • 📝 Citation • 📧 Contact
UNIKIE-BENCH is a unified benchmark designed to rigorously evaluate the Key Information Extraction (KIE) capabilities of Large Multimodal Models (LMMs) across realistic and diverse application scenarios.
🔥 Constrained-Category
🔥 Open-Category
Install dependencies:

```shell
conda create -n unikie python=3.10.19
conda activate unikie
pip install -r requirements.txt
```

Note: For detailed dataset processing instructions, please refer to DATA_PROCESS.
The processed datasets will be saved in the `datasets/` directory, with each folder containing:

- `label.json`
- `qa.jsonl`
- `images/`
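For a quick sanity check of a processed dataset, the `qa.jsonl` file can be read line by line with the standard library. This is a minimal sketch; the exact field names inside each record are repository-specific, so inspect the keys rather than assuming them:

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read a JSONL file (one JSON object per line) into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                records.append(json.loads(line))
    return records

# Example: inspect the first QA record of a processed dataset.
qa_path = Path("datasets") / "Medical-Services" / "qa.jsonl"
if qa_path.exists():
    records = load_jsonl(qa_path)
    print(f"{len(records)} records; first record keys: {sorted(records[0])}")
```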
Run after processing:

```shell
./scripts/download_constrained_category.sh  # Constrained Category Dataset
```

For the Open Category dataset, we provide a Google Drive link for download.
This section covers how to run inference with various models through the OpenAI API.

Use the `scripts/run_openai_api.sh` script to run inference on datasets. You can also run inference directly with Python:

```shell
python src/request_openai.py [args]
```

args:

- `--dataset`: Dataset name (e.g. "Medical-Services")
- `--model`: Model name
- `--jsonl`: Path to the qa.jsonl file (default: `datasets/<dataset>/qa.jsonl`)
- `--output`: Output JSONL path (default: `results/<dataset>/result_<model_name>.jsonl`)
- `--api-key`: OpenAI API key
- `--api-base`: OpenAI API base URL
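Under the hood, an OpenAI-compatible endpoint expects each document image to arrive as part of a chat-completions request. The sketch below shows one plausible way to build such a payload with only the standard library; the model name, image bytes, and question here are placeholders, and the repository's `src/request_openai.py` may structure its requests differently:

```python
import base64
import json

def build_kie_payload(model, image_bytes, question):
    """Build an OpenAI-style chat-completions payload pairing one image with a KIE question."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image is inlined as a base64 data URL, per the OpenAI vision format.
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

payload = build_kie_payload("gpt-4o", b"\x89PNG...", "Extract the invoice number.")
print(json.dumps(payload)[:80])
```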
You can use vLLM to deploy local models, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model xxx \
    --served-model-name xxx \
    --dtype=auto \
    --tensor-parallel-size 2 \
    --trust_remote_code \
    --gpu-memory-utilization 0.8 \
    --api-key xxx
```

After running inference, evaluate the results using the evaluation script.
Use the `scripts/eval.sh` script to evaluate multiple models.

Specify the dataset names in the script (keep them consistent with the folder names under `datasets/`):

```shell
DATASETS=(xxx)
```

and run:

```shell
./scripts/eval.sh <MODEL_NAME>
```

You can also run evaluation directly with Python:

```shell
python src/evaluate_results.py [args]
```

args:

- `--pred`: Prediction result JSONL file path (output from `request_openai.py`)
- `--dataset`: Dataset name (e.g. "Medical-Services")
- `--output`: Evaluation result output JSON file path (optional, default: `<pred_file>_eval.json`)
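For intuition about what the evaluation measures, a common baseline metric for KIE is normalized exact match between predicted and gold field values. The sketch below illustrates that idea only; the actual scoring logic lives in `src/evaluate_results.py` and may use different normalization or metrics:

```python
def normalize(s):
    """Lowercase and collapse whitespace for a lenient string comparison."""
    return " ".join(str(s).lower().split())

def exact_match_accuracy(preds, golds):
    """Fraction of predictions that match the gold answer after normalization."""
    assert len(preds) == len(golds), "predictions and golds must align one-to-one"
    if not golds:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)

print(exact_match_accuracy(["INV-001", " inv-002 "], ["INV-001", "INV-003"]))  # 0.5
```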
If you find our work helpful to your research, please cite us in your publications or projects:
```bibtex
@article{unikie2026,
  title={UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents},
  author={Yifan Ji and Zhipeng Xu and Zhenghao Liu and Zulong Chen and Qian Zhang and Zhibo Yang and Junyang Lin and Yu Gu and Ge Yu and Maosong Sun},
  journal={arXiv preprint arXiv:2602.07038},
  year={2026}
}
```

This dataset is provided for academic research purposes only. The code in this repository is released under the MIT License. See the LICENSE file for details.
If you have suggestions about the UniKIE benchmark, please contact us:
bigtailwolf001@gmail.com


