UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
📖 Overview • 📊 Benchmark Results • ⚙️ Setup • 📊 Data Preparation • 🔧 Inference • 📃 Evaluation • 📝 Citation • 📧 Contact
UNIKIE-BENCH is a unified benchmark designed to rigorously evaluate the Key Information Extraction (KIE) capabilities of Large Multimodal Models (LMMs) across realistic and diverse application scenarios.
🔥 Constrained-Category
🔥 Open-Category
Install dependencies:

```shell
conda create -n unikie python=3.10.19
conda activate unikie
pip install -r requirements.txt
```

Note: For detailed dataset processing instructions, please refer to DATA_PROCESS.
The processed datasets will be saved in the `datasets/` directory, with each folder containing:

- `label.json`
- `qa.jsonl`
- `images/`
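For a quick sanity check of a processed dataset, the `qa.jsonl` file can be read line by line with the standard library. This is a minimal sketch; the exact field names inside each record are repository-specific, so inspect the keys rather than assuming them:

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read a JSONL file (one JSON object per line) into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                records.append(json.loads(line))
    return records

# Example: inspect the first QA record of a processed dataset.
qa_path = Path("datasets") / "Medical-Services" / "qa.jsonl"
if qa_path.exists():
    records = load_jsonl(qa_path)
    print(f"{len(records)} records; first record keys: {sorted(records[0])}")
```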
Run after processing:

```shell
./scripts/download_constrained_category.sh  # Constrained Category Dataset
```

For the Open Category dataset, we provide a Google Drive link for download.
This section covers how to run inference with various models through the OpenAI API.

Use the `scripts/run_openai_api.sh` script to run inference on datasets. You can also run inference directly with Python:

```shell
python src/request_openai.py [args]
```

args:

- `--dataset`: Dataset name (e.g. "Medical-Services")
- `--model`: Model name
- `--jsonl`: Path to the qa.jsonl file (default: `datasets/<dataset>/qa.jsonl`)
- `--output`: Output JSONL path (default: `results/<dataset>/result_<model_name>.jsonl`)
- `--api-key`: OpenAI API key
- `--api-base`: OpenAI API base URL
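Under the hood, an OpenAI-compatible endpoint expects each document image to arrive as part of a chat-completions request. The sketch below shows one plausible way to build such a payload with only the standard library; the model name, image bytes, and question here are placeholders, and the repository's `src/request_openai.py` may structure its requests differently:

```python
import base64
import json

def build_kie_payload(model, image_bytes, question):
    """Build an OpenAI-style chat-completions payload pairing one image with a KIE question."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image is inlined as a base64 data URL, per the OpenAI vision format.
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

payload = build_kie_payload("gpt-4o", b"\x89PNG...", "Extract the invoice number.")
print(json.dumps(payload)[:80])
```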
You can use vLLM to deploy local models, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model xxx \
    --served-model-name xxx \
    --dtype=auto \
    --tensor-parallel-size 2 \
    --trust_remote_code \
    --gpu-memory-utilization 0.8 \
    --api-key xxx
```

After running inference, evaluate the results using the evaluation script.
Use the `scripts/eval.sh` script to evaluate multiple models.

Specify the dataset names in the script (keep them consistent with the folder names under `datasets/`):

```shell
DATASETS=(xxx)
```

and run:

```shell
./scripts/eval.sh <MODEL_NAME>
```

You can also run evaluation directly with Python:

```shell
python src/evaluate_results.py [args]
```

args:

- `--pred`: Prediction result JSONL file path (output from `request_openai.py`)
- `--dataset`: Dataset name (e.g. "Medical-Services")
- `--output`: Evaluation result output JSON file path (optional, default: `<pred_file>_eval.json`)
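For intuition about what the evaluation measures, a common baseline metric for KIE is normalized exact match between predicted and gold field values. The sketch below illustrates that idea only; the actual scoring logic lives in `src/evaluate_results.py` and may use different normalization or metrics:

```python
def normalize(s):
    """Lowercase and collapse whitespace for a lenient string comparison."""
    return " ".join(str(s).lower().split())

def exact_match_accuracy(preds, golds):
    """Fraction of predictions that match the gold answer after normalization."""
    assert len(preds) == len(golds), "predictions and golds must align one-to-one"
    if not golds:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)

print(exact_match_accuracy(["INV-001", " inv-002 "], ["INV-001", "INV-003"]))  # 0.5
```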
If you find our work helpful to your research, please cite us in your publications or projects:
```bibtex
@article{unikie2026,
  title={UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents},
  author={Yifan Ji and Zhipeng Xu and Zhenghao Liu and Zulong Chen and Qian Zhang and Zhibo Yang and Junyang Lin and Yu Gu and Ge Yu and Maosong Sun},
  journal={arXiv preprint arXiv:2602.07038},
  year={2026}
}
```

This dataset is provided for academic research purposes only. The code in this repository is released under the MIT License. See the LICENSE file for details.
If you have suggestions about the UniKIE benchmark, please contact us:
bigtailwolf001@gmail.com


