🌐 Project Page | 📖 arXiv | 🤗 VisCode-Multi-679K | 🤗 VisPlotBench | 🤗 VisCoder2
- 🔥 [2025-10-25] VisCode-Multi-679K, VisPlotBench, and the VisCoder2 models are now publicly released! Check out our paper and Hugging Face collections.
VisCoder2 is an open-source family of multi-language visualization coding agents capable of iteratively generating, executing, rendering, and self-debugging visualization code.
This work addresses core challenges where existing models fail:
- Limited language coverage
- Unreliable code execution
- Lack of iterative correction mechanisms
Unlike general code generation, visualization requires grounding across natural language, code, and rendered visual outputs.
To enable this, we introduce three complementary resources:
- VisCode-Multi-679K:
A large-scale supervised dataset with 679K executable visualization samples and multi-turn correction dialogues across 12 programming languages, including Python, Vega-Lite, LaTeX, Mermaid, LilyPond, and more.
- VisPlotBench:
A new benchmark spanning 8 languages and 13 visual categories, designed to systematically evaluate both initial code generation and multi-round self-debug capabilities.
- VisCoder2:
A family of multi-language visualization models trained on VisCode-Multi-679K.
We evaluate VisCoder2 on VisPlotBench across all 8 of its languages.
The primary metric is Execution Pass Rate: the fraction of generations that run without error and produce a valid rendered visual.

With iterative self-debug, VisCoder2-32B achieves an 82.4% overall execution pass rate, matching the performance of GPT-4.1 and significantly outperforming all open-source baselines.
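As a rough illustration, the pass criterion (clean execution plus a valid rendered output) can be sketched as below. The file-based render check and the `plot.png` output name are assumptions for illustration only, not the benchmark's actual harness.

```python
import os
import subprocess
import sys
import tempfile

def execution_pass_rate(samples, timeout=60):
    """Sketch of Execution Pass Rate: a sample passes if its script
    exits cleanly AND leaves a non-empty rendered image behind
    (assumed here to be written to plot.png in the working dir)."""
    passed = 0
    for code in samples:
        with tempfile.TemporaryDirectory() as tmp:
            script = os.path.join(tmp, "plot.py")
            with open(script, "w") as f:
                f.write(code)
            try:
                result = subprocess.run(
                    [sys.executable, script],
                    cwd=tmp, capture_output=True, timeout=timeout,
                )
                rendered = os.path.join(tmp, "plot.png")
                if (result.returncode == 0
                        and os.path.exists(rendered)
                        and os.path.getsize(rendered) > 0):
                    passed += 1
            except subprocess.TimeoutExpired:
                pass  # hung scripts count as failures
    return passed / len(samples)
```

Requiring both a zero exit code and a non-empty output file is what separates "runs without error" from "produces a valid visual" in the metric's definition.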
We provide both the training dataset and evaluation benchmark for VisCoder2.
- 📦 Training is performed using the ms-swift framework with full-parameter supervised fine-tuning on our new VisCode-Multi-679K dataset.
- 📊 Evaluation is based on VisPlotBench, using a standardized execute–render–score pipeline that assesses models across 8 languages.
This includes a self-debug evaluation mode that allows models to revise failed generations over multiple rounds.
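The self-debug mode can be pictured as a small generate–execute–repair loop. The model interface, the `exec`-based executor, and the round limit below are illustrative assumptions, not the VisPlotBench implementation.

```python
def execute(code):
    """Toy executor: run the candidate code and report (success, error)."""
    try:
        exec(code, {})
        return True, ""
    except Exception as e:
        return False, repr(e)

def self_debug(model, task, max_rounds=3):
    """Sketch of a self-debug loop: generate code, execute it, and on
    failure feed the error trace back to the model for a revised attempt."""
    code = model.generate(task)
    for _ in range(max_rounds):
        ok, error = execute(code)
        if ok:
            return code, True
        # Ask the model to repair its own output given the error message.
        code = model.generate(task, previous_code=code, error=error)
    return code, False
```

The key design point is that each revision round conditions on both the failed code and its error trace, rather than regenerating from the task description alone.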
See the following folders for details:
- train/: Training scripts and configurations based on ms-swift
- VisPlotBench/: Evaluation framework for VisPlotBench
- Yuansheng Ni: [email protected]
- Wenhu Chen: [email protected]
BibTeX:
@article{ni2025viscoder2,
title={VisCoder2: Building Multi-Language Visualization Coding Agents},
author={Ni, Yuansheng and Cai, Songcheng and Chen, Xiangchao and Liang, Jiarong and Lyu, Zhiheng and Deng, Jiaqi and Zou, Kai and Nie, Ping and Yuan, Fei and Yue, Xiang and others},
journal={arXiv preprint arXiv:2510.23642},
year={2025}
}
@article{ni2025viscoder,
title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
author={Ni, Yuansheng and Nie, Ping and Zou, Kai and Yue, Xiang and Chen, Wenhu},
journal={arXiv preprint arXiv:2506.03930},
year={2025}
}