
UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents

arXiv GitHub

📖 Overview · 📊 Benchmark Results · ⚙️ Setup · 📊 Data Preparation · 🔧 Inference · 📃 Evaluation · 📝 Citation · 📧 Contact

📖 Overview

UNIKIE-BENCH is a unified benchmark designed to rigorously evaluate the Key Information Extraction (KIE) capabilities of Large Multimodal Models (LMMs) across realistic and diverse application scenarios.

UniKIE Benchmark

📊 Benchmark Results

🔥 Constrained-Category

Constrained Category Results

🔥 Open-Category

Open Category Results

⚙️ Setup

Install dependencies:

conda create -n unikie python=3.10.19
conda activate unikie
pip install -r requirements.txt

📊 Data Preparation

Note: For detailed dataset processing instructions, please refer to DATA_PROCESS

The processed datasets will be saved in the datasets/ directory, with each folder containing:

  • label.json
  • qa.jsonl
  • images/
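
Since each dataset ships its questions as a `qa.jsonl` file, a small helper for iterating over it is handy. The sketch below assumes only the standard JSONL convention (one JSON object per line); the field names inside each record are not assumed here:

```python
import json

def load_jsonl(path):
    """Yield one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)
```

For example, `list(load_jsonl("datasets/Medical-Services/qa.jsonl"))` would load every QA record for that dataset into memory.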

Run after processing:

./scripts/download_constrained_category.sh # Constrained Category Dataset 

For the Open Category dataset, we provide a Google Drive link for download.

🔧 Inference

This section covers how to run inference with various models through the OpenAI API.

Running Inference with OpenAI API

Use the scripts/run_openai_api.sh script to run inference on a dataset, or run inference directly with Python:

python src/request_openai.py [args]

args:

  • --dataset: Dataset name (e.g. "Medical-Services")
  • --model: Model name
  • --jsonl: Path to qa.jsonl file (default: datasets/<dataset>/qa.jsonl)
  • --output: Output JSONL path (default: results/<dataset>/result_<model_name>.jsonl)
  • --api-key: OpenAI API key
  • --api-base: OpenAI API base URL
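
Under an OpenAI-compatible API, a multimodal KIE query pairs a text question with a document image in a single chat message. The sketch below shows the shape of such a request payload; the function name and the exact fields used by `src/request_openai.py` are assumptions for illustration, not the repository's actual implementation:

```python
import base64

def build_kie_request(model, question, image_path):
    """Build an OpenAI-style chat payload with an inline base64 image."""
    # Encode the document image as a base64 data URL.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    # OpenAI-style multimodal chat message: text question plus one image.
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

The same payload works against a local vLLM endpoint, since vLLM exposes an OpenAI-compatible chat-completions server.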

You can use vLLM to deploy local models, for example:

CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server --model xxx --served-model-name xxx --dtype=auto --tensor-parallel-size 2 --trust_remote_code --gpu-memory-utilization 0.8 --api-key xxx

📃 Evaluation

After running inference, evaluate the results using the evaluation script.

Using the Evaluation Script

Use the scripts/eval.sh script to evaluate multiple models.

Specify the dataset names in the script (keep them consistent with the folder names under datasets/):

DATASETS=(xxx)

and run:

./scripts/eval.sh <MODEL_NAME>

You can also run evaluation directly using Python:

python src/evaluate_results.py [args]

args:

  • --pred: Prediction result JSONL file path (output from request_openai.py)
  • --dataset: Dataset name (e.g. "Medical-Services")
  • --output: Evaluation result output JSON file path (optional, default: <pred_file>_eval.json)
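
For intuition, a KIE evaluation typically compares each predicted value against the gold answer after light normalization. The sketch below shows a generic exact-match accuracy; it is an illustration only, not necessarily the benchmark's official metric as implemented in `src/evaluate_results.py`:

```python
def exact_match_accuracy(preds, golds):
    """Fraction of predictions equal to the gold answer after
    case-folding and whitespace normalization. Generic illustration,
    not necessarily the benchmark's official metric."""
    assert len(preds) == len(golds)

    def norm(s):
        # Lowercase and collapse runs of whitespace.
        return " ".join(s.lower().split())

    hits = sum(norm(p) == norm(g) for p, g in zip(preds, golds))
    return hits / len(golds) if golds else 0.0
```

For example, `exact_match_accuracy(["A  b", "c"], ["a b", "d"])` scores the first pair as a match and the second as a miss, giving 0.5.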

📝 Citation

If you find our work valuable for your research, please acknowledge our contributions by citing us in your publications or projects:

@article{unikie2026,
  title={UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents},
  author={Yifan Ji and Zhipeng Xu and Zhenghao Liu and Zulong Chen and Qian Zhang and Zhibo Yang and Junyang Lin and Yu Gu and Ge Yu and Maosong Sun},
  journal={arXiv preprint arXiv:2602.07038},
  year={2026}
}

📄 License

This dataset is provided for academic research purposes only. The code in this repository is released under the MIT License. See the LICENSE file for details.

📧 Contact

If you have suggestions about the UniKIE benchmark, please contact us:

bigtailwolf001@gmail.com

About

[ACL '26] Source code and datasets for our paper "UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents"
