LoRA fine-tuned Qwen 2.5 7B vision-language model for chart understanding tasks, evaluated against the base model. The adapter is fine-tuned to give pinpoint answers to questions from the ChartQA benchmark, achieving a +8.5% accuracy improvement. The model is trained on the ChartQA dataset to better understand and answer questions about charts and graphs.
Author: Prakash Chandra Chhipa
Portfolio: prakashchhipa.github.io
GitHub: AskAnythingInCharts-Qwen2.5-7B
- 🎨 Interactive Demo: HuggingFace Spaces
- 🤗 Model Card: HuggingFace Hub
| Model | ChartQA Accuracy | Improvement |
|---|---|---|
| Qwen 2.5 7B | 57.5% | - |
| Qwen 2.5 7B + LoRA SFT | 66.0% | +8.5% |
```bash
# 1. Clone the repository
git clone https://github.com/prakashchhipa/AskAnythingInCharts-Qwen2.5-7B.git
cd AskAnythingInCharts-Qwen2.5-7B

# 2. Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers datasets accelerate peft trl deepspeed
pip install pillow wandb gradio

# 3. Download ChartQA dataset
# You need to download the ChartQA dataset manually:
# - data/chartqa_train.json (training data)
# - data/images/ (chart images directory)
# Place these in the project root directory

# 4. Run the Gradio demo
python src/app_gradio.py
```

Open your browser at http://localhost:7860.
```bash
# 1. Set up environment (same as above)
# 2. Download ChartQA dataset (same as above)

# 3. Run training - cache will be built automatically
bash scripts/run_train_best_r64.sh
```

Note: The training script will automatically build the cache directory (`cache/sft_chartqa_textvqa/`) on first run. This may take some time but will speed up subsequent training runs.
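For orientation, the shell script presumably drives a standard HuggingFace training loop; a rough sketch of the `TrainingArguments` implied by the configuration listed later in this README (batch size and several other settings are assumptions; the authoritative values live in `configs/sft_config_rank64.json`):

```python
from transformers import TrainingArguments

# Epochs, learning rate, and file paths come from this README; batch size
# and accumulation are assumptions -- configs/sft_config_rank64.json holds
# the real values.
training_args = TrainingArguments(
    output_dir="outputs/qwen2_5_vl_7b_lora_rank64_e6",
    num_train_epochs=6,
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # assumed
    gradient_accumulation_steps=8,   # assumed
    bf16=True,
    deepspeed="configs/ds_zero3.json",
    logging_steps=10,
    save_strategy="epoch",
    report_to="wandb",
)
```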
```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from PIL import Image

# Load the base model and merge the LoRA adapter into it
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA")
model = model.merge_and_unload()
processor = AutoProcessor.from_pretrained("prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA")

# Inference
image = Image.open("chart.png")
question = "What is the highest value in the chart?"
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": question},
        {"type": "image", "image": image},
    ]}
]

# Build the prompt, pack the image inputs, and generate a short answer
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=64)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

This repository does NOT include the training data due to size constraints. You need to download the ChartQA dataset separately:
- Download ChartQA Dataset:
  - Visit the ChartQA GitHub repository
  - Download the training data and images
  - Place them in your project directory:

    ```
    data/
    ├── chartqa_train.json    # Training annotations
    └── images/               # Chart images directory
    ```

- Dataset Structure (a loading sanity check follows below):

  ```
  data/
  ├── chartqa_train.json    # ~50K training examples
  └── images/               # Chart images (PNG files)
      ├── train/
      │   ├── 0000.png
      │   ├── 0001.png
      │   └── ...
      └── val/
          ├── 0000.png
          └── ...
  ```
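To sanity-check the download, it may help to load one record and its image; a minimal sketch, assuming the annotation fields are named `imgname`, `query`, and `label` and that images sit under `train/` (verify against your copy of the files):

```python
import json
from PIL import Image

# Sanity-check the downloaded annotations. Field names ("imgname",
# "query", "label") and the train/ subfolder are assumptions -- verify
# them against your copy of chartqa_train.json.
with open("data/chartqa_train.json") as f:
    records = json.load(f)

print(f"{len(records)} training examples")
sample = records[0]
image = Image.open(f"data/images/train/{sample['imgname']}")
print(sample["query"], "->", sample["label"])
```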
The training script uses a smart caching system to speed up subsequent runs (a build-or-load sketch follows this list):
- First Run: Automatically builds cache in `cache/sft_chartqa_textvqa/`
- Subsequent Runs: Uses preprocessed cache for faster training
- Cache Contents: Preprocessed datasets, tokenized data, image features
- Cache Size: ~2-5GB (excluded from git repository)
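The underlying pattern is build-once, load-thereafter; a minimal sketch of that idea using the `datasets` library, where the preprocessing function is a stand-in rather than the repo's actual pipeline:

```python
import os
from datasets import Dataset, load_from_disk

CACHE_DIR = "cache/sft_chartqa_textvqa"

def build_dataset() -> Dataset:
    # Stand-in for the real preprocessing (tokenization, image features, ...)
    return Dataset.from_list([{"question": "q", "answer": "a"}])

# First run: build and persist the cache; subsequent runs: load it
if os.path.isdir(CACHE_DIR):
    dataset = load_from_disk(CACHE_DIR)
else:
    dataset = build_dataset()
    dataset.save_to_disk(CACHE_DIR)
```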
Cache Management:

```bash
# Rebuild cache (if you want fresh preprocessing)
bash scripts/run_train_best_r64.sh --rebuild_cache

# Clear cache (if you want to start fresh)
rm -rf cache/
```

For distributed training, set these environment variables:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NCCL_DEBUG=WARN
export NCCL_NVLS_ENABLE=0
export NCCL_IB_DISABLE=1
export NCCL_P2P_DISABLE=1
```

The fine-tuned model shows better performance in:
| Category | Description |
|---|---|
| ✅ Concise Answers | Returns exact values without verbose explanations |
| ✅ Label Recognition | Better at reading text labels from charts |
| ✅ Color Identification | More accurate at identifying chart colors |
| ✅ Statistical Calculations | Improved at medians, ratios, differences |
| ✅ Counting | Better accuracy in counting chart elements |
| ✅ Region Comparison | Accurate comparisons across chart regions |
| ✅ Yes/No Questions | More reliable binary responses |
- Base Model: Qwen/Qwen2.5-VL-7B-Instruct (7B parameters)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Rank: 64
- Alpha: 16
- Target modules: Vision and language attention layers
- Dataset: ChartQA (chart understanding benchmark)
- Training Samples: ~50-200 samples per epoch
- Epochs: 6
- Learning Rate: 4e-5
- LoRA Rank: 64
- LoRA Alpha: 16 (see the `LoraConfig` sketch below)
- Hardware: GPU with 16GB+ VRAM
- Framework: HuggingFace Transformers + PEFT + DeepSpeed
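The hyperparameters above map onto a PEFT `LoraConfig` roughly as follows; a minimal sketch, where the exact `target_modules` list and the dropout value are assumptions (the repo only states that vision and language attention layers are targeted):

```python
from peft import LoraConfig, get_peft_model
from transformers import Qwen2_5_VLForConditionalGeneration

# Rank and alpha match the values above; target_modules is an assumed
# attention-projection list, not the repo's verified configuration.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,   # assumed; not stated in the repo
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="bfloat16", device_map="auto"
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```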
- Test Set: ChartQA validation set (500 examples)
- Metric: Exact Match (with normalization and numeric tolerance; a sketch follows this list)
- Filtering: Only genuine improvements (excluded verbose-but-correct cases)
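The metric itself is simple to sketch; a minimal version of normalized exact match with relative numeric tolerance, where the 5% tolerance is an assumption (check `evaluations/eval_chartqa.py` for the value actually used):

```python
def exact_match(pred: str, gold: str, rel_tol: float = 0.05) -> bool:
    """Normalized exact match with numeric tolerance (tolerance value assumed)."""
    pred = pred.strip().lower().rstrip(".")
    gold = gold.strip().lower().rstrip(".")
    try:
        # Numeric answers: accept predictions within a relative tolerance
        p = float(pred.replace("%", "").replace(",", ""))
        g = float(gold.replace("%", "").replace(",", ""))
        return abs(p - g) <= rel_tol * max(abs(g), 1e-9)
    except ValueError:
        # Non-numeric answers: compare the normalized strings
        return pred == gold

print(exact_match("66.0%", "66"))   # True
print(exact_match("Yes", "yes."))   # True
```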
```
AskAnythingInCharts-Qwen2.5-7B/
├── src/ # Source code
│ ├── train_vlm_sft.py # Main training script
│ ├── datasets_build.py # Dataset building utilities
│ ├── app_gradio.py # Gradio demo interface
│ ├── agent_infer.py # Inference agent
│ ├── infer_cli.py # CLI inference tool
│ ├── export_merge_lora.py # LoRA export utilities
│ ├── prepare_data.py # Data preparation
│ └── ocr_tool.py # OCR utilities
├── scripts/
│ └── run_train_best_r64.sh # Training script with best config
├── configs/
│ ├── sft_config_rank64.json # Training configuration
│ └── ds_zero3.json # DeepSpeed configuration
├── evaluations/
│ └── eval_chartqa.py # Evaluation script
├── find_improved_examples.py # Find improved examples
├── filter_genuine_improvements.py # Filter genuine improvements
├── data/ # Dataset (not included in repo)
│ ├── chartqa_train.json # Training annotations
│ └── images/ # Chart images
├── cache/ # Preprocessed cache (not in repo)
│ └── sft_chartqa_textvqa/ # Cached datasets
├── outputs/ # Model outputs (not in repo)
│ └── qwen2_5_vl_7b_lora_rank64_e6/ # Trained model weights
└── README.md # This file
```
Note: data/, cache/, and outputs/ directories are excluded from the repository due to size constraints. They will be created automatically when you run the training script.
```bash
python evaluations/eval_chartqa.py \
  --base_model Qwen/Qwen2.5-VL-7B-Instruct \
  --adapter prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA \
  --limit 500 \
  --compare_both
```

```bash
python find_improved_examples.py \
  --base_model Qwen/Qwen2.5-VL-7B-Instruct \
  --adapter prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA \
  --limit 500 \
  --output_dir demo_chatqa
```

```bash
python filter_genuine_improvements.py \
  --input_dir demo_chatqa/improved \
  --output_dir demo_genuine
```

```bash
# Create a new directory for HF Space
mkdir chartqa-demo
cd chartqa-demo

# Copy necessary files
cp app.py requirements_demo.txt README_DEMO.md .
cp -r demo_genuine/ .

# Upload your adapter to HuggingFace Hub first, then update app.py:
# ADAPTER_PATH = "your-username/your-adapter-name"
```

- Go to HuggingFace Spaces
- Click "Create new Space"
- Choose "Gradio" as SDK
- Upload your files
- Add `.env` file with model paths (if needed)
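If you are writing the Space's `app.py` from scratch rather than copying the repo's `src/app_gradio.py`, a minimal sketch could look like this (the interface wiring is an assumption and the actual demo may differ):

```python
import gradio as gr
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel
from qwen_vl_utils import process_vision_info

ADAPTER_PATH = "prakashchhipa/Qwen2.5-VL-7B-ChartQA-LoRA"

# Load the base model once at startup and merge the adapter
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="bfloat16", device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_PATH).merge_and_unload()
processor = AutoProcessor.from_pretrained(ADAPTER_PATH)

def answer_question(image, question):
    messages = [{"role": "user", "content": [
        {"type": "text", "text": question},
        {"type": "image", "image": image},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                       padding=True, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]

demo = gr.Interface(
    fn=answer_question,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="Chart Understanding with Fine-tuned Qwen2.5-VL",
)
demo.launch()
```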
Create README.md in your Space with:
```yaml
---
title: Chart Understanding with Fine-tuned Qwen2.5-VL
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
---
```

This model is useful for:
- 📈 Business Analytics: Extract insights from business charts
- 📊 Data Analysis: Answer questions about data visualizations
- 📑 Report Processing: Understand charts in documents
- 🔬 Research: Analyze scientific plots and graphs
- 📱 Accessibility: Make charts accessible through Q&A
- 🤖 Automation: Automate chart data extraction
```
gradio>=4.0.0
torch>=2.0.0
transformers>=4.45.0
peft>=0.12.0
Pillow>=10.0.0
qwen-vl-utils>=0.0.8
accelerate>=0.20.0
```

Hardware:
- Inference: GPU with 16GB+ VRAM (or CPU with patience)
- Training: GPU with 24GB+ VRAM recommended
Contributions are welcome! Areas for improvement:
- Add more chart types (scatter plots, heatmaps, etc.)
- Improve training data diversity
- Optimize inference speed
- Add multi-turn conversation support
- Create mobile-friendly interface
This project is released under the MIT License. The base Qwen2.5-VL model is subject to its own license terms.
- Base Model: Qwen Team for Qwen2.5-VL-7B
- Dataset: ChartQA benchmark
- Framework: HuggingFace Transformers and PEFT
- UI: Gradio for the interactive interface
If you use this model or code in your research, please cite:
```bibtex
@misc{chartqa-finetuned-qwen,
  title={Fine-tuned Qwen2.5-VL-7B for Chart Understanding},
  author={Prakash Chandra Chhipa},
  year={2025},
  url={https://github.com/prakashchhipa/AskAnythingInCharts-Qwen2.5-7B}
}
```

For questions or feedback:
- Open an issue on GitHub
Built with ❤️ using Qwen2.5-VL and HuggingFace Transformers
