Code and model for the paper: "WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts" [ACL 2025 - Findings]
- The MCQs can be found in Annotations/wikimixQA_MCQs.json.
- Question metadata (images, tables, etc.) is available on Hugging Face: WikiMixQA.
- The file Annotations/qid_to_path.tsv maps each question ID (QID) to its corresponding metadata folder in the Hugging Face dataset.
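As a minimal sketch of how the two annotation files might be read together: the snippet below assumes that qid_to_path.tsv has two tab-separated columns (QID, then folder path) and no header row, and it makes no assumption about the MCQ schema beyond the file being valid JSON. The function names are illustrative and are not part of the repository.

```python
import csv
import io
import json


def load_qid_to_path(tsv_file):
    """Map each QID to its metadata folder.

    Assumption: two tab-separated columns (QID, path), no header row.
    """
    mapping = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        qid, path = row[0], row[1]
        mapping[qid] = path
    return mapping


def load_mcqs(json_file):
    """Load the MCQ annotations without assuming a particular schema."""
    return json.load(json_file)


if __name__ == "__main__":
    # In-memory stand-in for the real TSV file (hypothetical contents).
    demo_tsv = io.StringIO("q1\tfolder_a\nq2\tfolder_b\n")
    print(load_qid_to_path(demo_tsv))
```

With the real files, the same functions can be called on `open("Annotations/qid_to_path.tsv", newline="", encoding="utf-8")` and `open("Annotations/wikimixQA_MCQs.json", encoding="utf-8")`.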
python scripts/evaluate_gpt.py \
--model-name gpt-4o \
--qa-type blind \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results
python scripts/evaluate_gemini.py \
--model-name gpt-4o \
--qa-type wikidoc \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results
python scripts/evaluate_gemini.py \
--model-name gpt-4o \
--qa-type oracle \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results
Install dependencies
pip install vllm
python scripts/evaluate_vllm.py \
--model-name "OpenGVLab/InternVL2-2B" \
--qa-type oracle \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results_1001 \
--qid-to-path Annotations/qid_to_path.tsv \
--num-gpus 8
Note that we limit the number of screenshots to the first 15 images because of VRAM constraints. We could only run this setting for the 2B model.
python scripts/evaluate_vllm.py \
--model-name "OpenGVLab/InternVL2-2B" \
--qa-type wikidoc \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results \
--qid-to-path Annotations/qid_to_path.tsv
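The screenshot cap described in the note above can be illustrated with a short sketch. It assumes the screenshots arrive as a list of paths already in page order; the helper name and the way the evaluation script actually applies the cap are assumptions, not taken from the repository.

```python
MAX_SCREENSHOTS = 15  # cap stated in the note above


def select_screenshots(image_paths, limit=MAX_SCREENSHOTS):
    """Keep only the first `limit` screenshots.

    Assumption: `image_paths` is already ordered from the top of the
    page to the bottom, so slicing keeps the earliest screenshots.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return list(image_paths)[:limit]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    pages = [f"screenshot_{i}.png" for i in range(20)]
    kept = select_screenshots(pages)
    print(len(kept), kept[0], kept[-1])
```

Truncating by position keeps memory use bounded, at the cost of dropping any table or chart that appears late in a long article.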
python scripts/evaluate_vllm.py \
--model-name "OpenGVLab/InternVL2-2B" \
--qa-type blind \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results \
--qid-to-path Annotations/qid_to_path.tsv
python scripts/evaluate_vllm.py \
--model-name "Qwen/Qwen2-VL-7B-Instruct" \
--qa-type oracle \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results \
--qid-to-path Annotations/qid_to_path.tsv \
--num-gpus 4
python scripts/evaluate_vllm.py \
--model-name "Qwen/Qwen2-VL-72B-Instruct" \
--qa-type oracle \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results \
--qid-to-path Annotations/qid_to_path.tsv \
--num-gpus 8
python scripts/evaluate_vllm.py \
--model-name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
--qa-type oracle \
--input-file Annotations/wikimixQA_MCQs.json \
--output-dir results \
--qid-to-path Annotations/qid_to_path.tsv \
--num-gpus 8
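Since the vLLM commands above differ only in model name, QA type, and GPU count, they can be generated rather than typed by hand. The sketch below only builds and prints the command lines; it uses the flags shown above, while the helper name `build_vllm_command` is hypothetical.

```python
import shlex


def build_vllm_command(model_name, qa_type, num_gpus=None, output_dir="results"):
    """Build the argument list for scripts/evaluate_vllm.py.

    Only flags that appear in the commands above are used.
    """
    cmd = [
        "python", "scripts/evaluate_vllm.py",
        "--model-name", model_name,
        "--qa-type", qa_type,
        "--input-file", "Annotations/wikimixQA_MCQs.json",
        "--output-dir", output_dir,
        "--qid-to-path", "Annotations/qid_to_path.tsv",
    ]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    return cmd


if __name__ == "__main__":
    # Model names and GPU counts copied from the commands above.
    models = {
        "Qwen/Qwen2-VL-7B-Instruct": 4,
        "Qwen/Qwen2-VL-72B-Instruct": 8,
        "meta-llama/Llama-3.2-11B-Vision-Instruct": 8,
    }
    for model, gpus in models.items():
        print(shlex.join(build_vllm_command(model, "oracle", gpus)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute a command; printing first makes it easy to inspect the arguments before launching a long evaluation.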
- Wikipedia HTML dump
- Language
File format: JSON Lines. Each line is a JSON object with the following fields:
{
title: Wikipedia title
wikidata: Wikidata ID
url: URL of the Wikipedia page
index: index of the table in the Wikipedia page
html: HTML content of the table
caption: table caption
aspects: hierarchy of the Wikipedia sections containing the table
}
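A minimal sketch of reading such a dump, assuming it is a bz2-compressed file with one JSON object per line (as the {LANGUAGE}.jsonl.bz2 naming suggests). The field names come from the list above; the function name and the demo record are illustrative only.

```python
import bz2
import json
import os
import tempfile


def iter_tables(dump_path):
    """Yield one table record (a dict) per non-empty line of a .jsonl.bz2 dump."""
    with bz2.open(dump_path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical record used only to exercise the reader.
    record = {
        "title": "Example page",
        "wikidata": "Q0",
        "url": "https://example.org/wiki/Example_page",
        "index": 0,
        "html": "<table><tr><td>1</td></tr></table>",
        "caption": "Example table",
        "aspects": ["Section", "Subsection"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.jsonl.bz2")
        with bz2.open(path, "wt", encoding="utf-8") as out:
            out.write(json.dumps(record) + "\n")
        for table in iter_tables(path):
            print(table["title"], table["index"], table["caption"])
```

Reading the file as a stream avoids loading the whole dump into memory, which matters for the larger language editions.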
python wtabhtml.py dump -l cr
python wtabhtml.py gen-images -l cr -n 3
Note: Users can download our preprocessed dumps and then copy all {LANGUAGE}.jsonl.bz2 files (the wikitables dump in PubTabNet format) to wtabhtml/data/models/wikitables_html_pubtabnet to generate the table images faster.
To re-run the whole pipeline, the tool downloads the Wikipedia HTML dump, extracts the wikitables, and writes them to wtabhtml/data/models/wikitables_html_pubtabnet/{LANGUAGE}.jsonl.bz2, using the following steps.
# Download dump
python wtabhtml.py download -l cr
# Parse dump and save json file
python wtabhtml.py parse -l cr
# Read dump
python wtabhtml.py read -l 1 -i ./data/models/cr.jsonl.bz2
# Generate images
python wtabhtml.py gen-images -l cr -n 3
python scripts/download_images.py
Convert SVG to PNG
python scripts/convert_svg.py
python scripts/get_category.py
python scripts/merge_files.py
Get raw HTML pages for each article:
topic="Wikimedia"
subtopic="Person"
mkdir -p html/$topic/$subtopic
python scripts/get_wiki_html.py --input-file "data/${topic}_${subtopic}.json" --output-dir "html/$topic/$subtopic/"
Generate the images from the HTML pages:
python scripts/html2image.py --input-file "data/${topic}_${subtopic}.json" --output-dir "html/$topic/$subtopic/"
Predict whether the images are likely to be charts based on their filenames:
python scripts/predict_chart_or_not.py --input-file "data/${topic}_${subtopic}.json" --output-file "data/${topic}_${subtopic}_chart.json"
Compute statistics about the images
python scripts/category_image_stats.py
python scripts/extract_wiki_text.py
python scripts/move_images_to_folder.py
python scripts/move_tables_to_folder.py
python scripts/create_final_dataset.py
python scripts/split_large_images.py --input-dir final
python scripts/extract_tables_from_html.py --input-dir final
First, install geckodriver:
wget https://github.com/mozilla/geckodriver/releases/download/v0.34.0/geckodriver-v0.34.0-linux64.tar.gz
tar -xvzf geckodriver-v0.34.0-linux64.tar.gz
Then, install Firefox from APT:
sudo snap remove firefox
sudo add-apt-repository ppa:mozillateam/ppa
echo '
Package: *
Pin: release o=LP-PPA-mozillateam
Pin-Priority: 1001
' | sudo tee /etc/apt/preferences.d/mozilla-firefox
echo 'Unattended-Upgrade::Allowed-Origins:: "LP-PPA-mozillateam:${distro_codename}";' | sudo tee /etc/apt/apt.conf.d/51unattended-upgrades-firefox
sudo apt install firefox
Finally, generate the images
# add path to geckodriver
export PATH=$PATH:.
python scripts/gen_images_from_html_tables.py --input-dir final
Getting details for each article:
topic="Economy"
subtopic="Stock market"
python scripts/predict_chart_information.py \
--data-dir "data" \
--output-dir "final" \
--topic "$topic" \
--subtopic "$subtopic"
Getting details only for articles with charts:
python scripts/predict_chart_information.py \
--data-dir "data" \
--output-dir "final" \
--topic "$topic" \
--subtopic "$subtopic" \
--chart
python scripts/table-to-text.py \
--input-dir "final" \
--checkpoint-dir "/mnt/datastore/models/meta-llama/Meta-Llama-3-8B-Instruct" \
--dtype fp16 \
--device "cuda:2"
python scripts/compute_embeddings.py \
--input-dir "final" \
--model-name "BAAI/bge-reranker-v2-m3" \
--use-fp16 \
--device "cuda:2"
If you use this code for your research, please cite our paper:
@inproceedings{foroutan-etal-2025-wikimixqa,
title={WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts},
author={Negar Foroutan and Angelika Romanou and Matin Ansaripour and Julian Martin Eisenschlos and Karl Aberer and Rémi Lebret},
year={2025},
url={https://arxiv.org/abs/2506.15594},
}