# UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
📖 Overview • 📊 Benchmark Results • ⚙️ Setup • 📊 Data Preparation • 🔧 Inference • 📃 Evaluation • 📝 Citation • 📧 Contact
## 📖 Overview

UNIKIE-BENCH is a unified benchmark designed to rigorously evaluate the Key Information Extraction (KIE) capabilities of Large Multimodal Models (LMMs) across realistic and diverse application scenarios.
## 📊 Benchmark Results

- 🔥 Constrained-Category (with Mindee API results)
- 🔥 Open-Category
## ⚙️ Setup

Install dependencies:

```bash
conda create -n unikie python=3.10.19
conda activate unikie
pip install -r requirements.txt
```

## 📊 Data Preparation

> **Note:** For detailed dataset processing instructions, please refer to DATA_PROCESS.
The processed datasets will be saved in the `datasets/` directory, with each folder containing:

- `label.json`
- `qa.jsonl`
- `images/`
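As a sketch of how these files might be consumed downstream, the snippet below parses a couple of hypothetical `qa.jsonl` records (the field names `image`, `question`, and `answer` are assumptions for illustration; check the files produced by your own run):

```python
import json

# Hypothetical qa.jsonl contents; the real schema may use different keys.
qa_jsonl = """\
{"image": "images/0001.png", "question": "What is the invoice number?", "answer": "INV-001"}
{"image": "images/0002.png", "question": "What is the total amount?", "answer": "$42.00"}
"""

# JSONL = one JSON object per line; skip any blank lines.
records = [json.loads(line) for line in qa_jsonl.splitlines() if line.strip()]
for r in records:
    print(f'{r["image"]}: {r["question"]} -> {r["answer"]}')
```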
Run after processing:

```bash
./scripts/download_constrained_category.sh  # Constrained Category Dataset
```

For the Open Category, we provide a Google Drive link for download.
## 🔧 Inference

This section covers how to run inference with various models through the OpenAI API.

Use the `scripts/run_openai_api.sh` script to run inference on datasets, or run inference directly with Python:

```bash
python src/request_openai.py [args]
```

Arguments:

- `--dataset`: Dataset name (e.g. `Medical-Services`)
- `--model`: Model name
- `--jsonl`: Path to the `qa.jsonl` file (default: `datasets/<dataset>/qa.jsonl`)
- `--output`: Output JSONL path (default: `results/<dataset>/result_<model_name>.jsonl`)
- `--api-key`: OpenAI API key
- `--api-base`: OpenAI API base URL
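For reference, a request to an OpenAI-compatible vision endpoint is typically shaped as below. This is a minimal sketch, not the repository's exact implementation; the prompt text and the `build_kie_request` helper are illustrative:

```python
import base64
import json

def build_kie_request(image_bytes: bytes, question: str, model: str) -> dict:
    """Build a chat-completions payload with an inline base64-encoded image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
    }

# Dummy bytes stand in for a real page image.
payload = build_kie_request(b"\x89PNG...", "What is the invoice number?", "gpt-4o")
print(json.dumps(payload)[:80])
```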
You can use vLLM to deploy local models, for example:

```bash
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model xxx --served-model-name xxx --dtype=auto \
    --tensor-parallel-size 2 --trust_remote_code \
    --gpu-memory-utilization 0.8 --api-key xxx
```

After running inference, evaluate the results using the evaluation script.
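To use a locally deployed vLLM model with the inference script, point `--api-base` at the local server. The dataset name, served model name, port, and key below are placeholders, not values from the repository:

```shell
# vLLM serves an OpenAI-compatible API on port 8000 by default;
# --model must match the --served-model-name used when starting the server.
python src/request_openai.py \
    --dataset Medical-Services \
    --model my-served-model \
    --api-base http://localhost:8000/v1 \
    --api-key local-key
```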
## 📃 Evaluation

Use the `scripts/eval.sh` script to evaluate multiple models. Specify the dataset names in the script (keep them consistent with the folder names under `datasets/`):

```bash
DATASETS=(xxx)
```

and run:

```bash
./scripts/eval.sh <MODEL_NAME>
```

You can also run evaluation directly with Python:

```bash
python src/evaluate_results.py [args]
```

Arguments:

- `--pred`: Prediction result JSONL file path (output from `request_openai.py`)
- `--dataset`: Dataset name (e.g. `Medical-Services`)
- `--output`: Evaluation result output JSON file path (optional; default: `<pred_file>_eval.json`)
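Conceptually, KIE evaluation compares predicted key-value pairs against the gold labels. The sketch below shows a strict field-level exact-match score; the repository's actual metric may differ (e.g. normalizing whitespace or using fuzzy string matching):

```python
def field_exact_match(pred: dict, gold: dict) -> float:
    """Fraction of gold fields whose predicted value matches exactly."""
    if not gold:
        return 0.0
    hits = sum(1 for key, value in gold.items() if pred.get(key) == value)
    return hits / len(gold)

# One field matches, one differs -> score 0.5 (field names are illustrative).
gold = {"invoice_number": "INV-001", "total": "$42.00"}
pred = {"invoice_number": "INV-001", "total": "$42.50"}
print(field_exact_match(pred, gold))  # 0.5
```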
- Follow all repository set-up and data preparation steps.
- The `Commercial` dataset is split by pages; reconstruct the PDFs so that the Mindee API can process each file as a whole:

  ```bash
  python src/mindee/convert_commercial_dataset.py
  ```

  The other datasets don't need any additional processing.
- Generate a Mindee-compatible `dataschema.json` file for each file. This is the KIE schema that will be applied by the API:

  ```bash
  python src/mindee/generate_dataschemas.py
  ```

- Run the Mindee API on all the datasets. You need a valid Mindee API key (get one here) and a model ID. Take any model of your organization; the dataschema defining the fields to extract will be overridden at each API call with the generated dataschemas (see the step above):

  ```bash
  python src/mindee/run_inference.py --api-key=$MINDEE_API_KEY --model-id=ANY-MODEL-ID
  ```

- Finally, run the provided evaluation script on each dataset:

  ```bash
  python src/evaluate_results.py --pred out/Administrative/base/predictions.jsonl --dataset Administrative
  ```

  Alternatively, you can use `scripts/eval.sh` to evaluate all datasets at once.
## 📝 Citation

If you find our work valuable and helpful to your research, please acknowledge our contributions by citing us in your publications or projects:

```bibtex
@article{unikie2026,
  title={UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents},
  author={Yifan Ji and Zhipeng Xu and Zhenghao Liu and Zulong Chen and Qian Zhang and Zhibo Yang and Junyang Lin and Yu Gu and Ge Yu and Maosong Sun},
  journal={arXiv preprint arXiv:2602.07038},
  year={2026}
}
```

This dataset is provided for academic research purposes only. The code in this repository is released under the MIT License; see the LICENSE file for details.
## 📧 Contact

If you have suggestions about the UniKIE benchmark, please contact us: bigtailwolf001@gmail.com


