🌐 Project Page | 📖 arXiv | 🤗 VisCode-Multi-679K | 🤗 VisPlotBench | 🤗 VisCoder2
- 🔥 [2025-10-25] VisCode-Multi-679K, VisPlotBench, and the VisCoder2 models are now publicly released! Check out our paper and Hugging Face collections.
VisCoder2 is an open-source family of multi-language visualization coding agents capable of iteratively generating, executing, rendering, and self-debugging visualization code.
This work addresses core challenges where existing models fail:
- Limited language coverage
- Unreliable code execution
- Lack of iterative correction mechanisms
Unlike general code generation, visualization requires grounding across natural language, code, and rendered visual outputs.
To enable this, we introduce three complementary resources:
- VisCode-Multi-679K:
A large-scale supervised dataset with 679K executable visualization samples and multi-turn correction dialogues across 12 programming languages, including Python, Vega-Lite, LaTeX, Mermaid, LilyPond, and more.
- VisPlotBench:
A new benchmark spanning 8 languages and 13 visual categories, designed to systematically evaluate both initial code generation and multi-round self-debug capabilities.
- VisCoder2:
A family of multi-language visualization models trained on VisCode-Multi-679K.
We evaluate VisCoder2 on VisPlotBench across all 8 of its languages.
The primary metric is Execution Pass Rate: the fraction of generations that run without error and produce a valid rendered visual.

With iterative self-debug, VisCoder2-32B achieves an 82.4% overall execution pass rate, matching the performance of GPT-4.1 and significantly outperforming all open-source baselines.
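As a rough illustration, the pass criterion (clean execution plus a valid rendered output) can be sketched as below. The file-based render check and the `plot.png` output name are assumptions for illustration only, not the benchmark's actual harness.

```python
import os
import subprocess
import sys
import tempfile

def execution_pass_rate(samples, timeout=60):
    """Sketch of Execution Pass Rate: a sample passes if its script
    exits cleanly AND leaves a non-empty rendered image behind
    (assumed here to be written to plot.png in the working dir)."""
    passed = 0
    for code in samples:
        with tempfile.TemporaryDirectory() as tmp:
            script = os.path.join(tmp, "plot.py")
            with open(script, "w") as f:
                f.write(code)
            try:
                result = subprocess.run(
                    [sys.executable, script],
                    cwd=tmp, capture_output=True, timeout=timeout,
                )
                rendered = os.path.join(tmp, "plot.png")
                if (result.returncode == 0
                        and os.path.exists(rendered)
                        and os.path.getsize(rendered) > 0):
                    passed += 1
            except subprocess.TimeoutExpired:
                pass  # hung scripts count as failures
    return passed / len(samples)
```

Requiring both a zero exit code and a non-empty output file is what separates "runs without error" from "produces a valid visual" in the metric's definition.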
We provide both the training dataset and evaluation benchmark for VisCoder2.
- 📦 Training is performed using the ms-swift framework with full-parameter supervised fine-tuning on our new VisCode-Multi-679K dataset.
- 📊 Evaluation is based on VisPlotBench, using a standardized execute–render–score pipeline that assesses models across 8 languages.
This includes a self-debug evaluation mode that allows models to revise failed generations over multiple rounds.
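The self-debug mode can be pictured as a small generate–execute–repair loop. The model interface, the `exec`-based executor, and the round limit below are illustrative assumptions, not the VisPlotBench implementation.

```python
def execute(code):
    """Toy executor: run the candidate code and report (success, error)."""
    try:
        exec(code, {})
        return True, ""
    except Exception as e:
        return False, repr(e)

def self_debug(model, task, max_rounds=3):
    """Sketch of a self-debug loop: generate code, execute it, and on
    failure feed the error trace back to the model for a revised attempt."""
    code = model.generate(task)
    for _ in range(max_rounds):
        ok, error = execute(code)
        if ok:
            return code, True
        # Ask the model to repair its own output given the error message.
        code = model.generate(task, previous_code=code, error=error)
    return code, False
```

The key design point is that each revision round conditions on both the failed code and its error trace, rather than regenerating from the task description alone.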
See the following folders for details:
- train/: Training scripts and configurations based on ms-swift
- VisPlotBench/: Evaluation framework for VisPlotBench
- Yuansheng Ni: [email protected]
- Wenhu Chen: [email protected]
BibTeX:
@article{ni2025viscoder2,
title={VisCoder2: Building Multi-Language Visualization Coding Agents},
author={Ni, Yuansheng and Cai, Songcheng and Chen, Xiangchao and Liang, Jiarong and Lyu, Zhiheng and Deng, Jiaqi and Zou, Kai and Nie, Ping and Yuan, Fei and Yue, Xiang and others},
journal={arXiv preprint arXiv:2510.23642},
year={2025}
}
@article{ni2025viscoder,
title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
author={Ni, Yuansheng and Nie, Ping and Zou, Kai and Yue, Xiang and Chen, Wenhu},
journal={arXiv preprint arXiv:2506.03930},
year={2025}
}