📚 Granite Vision Paper | 📊 ChartNet CVPR 2026 Paper | 🤗 HuggingFace Collection | 💬 Discussions Page
Granite Vision is a family of multimodal vision‑language models designed to support enterprise‑grade document understanding tasks, including charts, tables, key‑value extraction, and structured image‑to‑text generation.
This repository provides documentation, examples, and pointers to available model releases and datasets.
Granite‑4.0‑3B‑Vision is a vision‑language model tailored for enterprise document data extraction, delivered as a LoRA adapter on top of Granite‑4.0‑Micro.
It supports:
- Chart extraction — Chart‑to‑CSV, Chart‑to‑Summary, Chart‑to‑Code
- Table extraction — JSON, HTML, and OTSL
- Semantic KVP extraction — Schema‑guided extraction across diverse document layouts
- Image‑to‑text — Natural‑language descriptions of images
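To make "schema‑guided" KVP extraction more concrete, here is a minimal sketch of how a caller might check extracted fields against a schema. It is not the model's API. The schema format, the field names, the `validate_kvp` helper, and the sample string are all hypothetical, and the real prompt and output formats should be taken from the model card.

```python
import json

# Hypothetical schema: field name -> expected Python type.
# These field names are illustrative, not part of any Granite Vision API.
INVOICE_SCHEMA = {"invoice_number": str, "total": float, "currency": str}


def validate_kvp(raw_output: str, schema: dict) -> dict:
    """Parse a JSON string and keep only schema fields that have the expected type."""
    data = json.loads(raw_output)
    result = {}
    for key, expected_type in schema.items():
        value = data.get(key)
        # JSON does not distinguish ints from floats, so accept ints for float fields.
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        # Missing or wrongly typed fields become None instead of raising.
        result[key] = value if isinstance(value, expected_type) else None
    return result


# Stand-in for text a model might return; this is not real model output.
sample_output = '{"invoice_number": "INV-001", "total": 120, "currency": "USD", "extra": "ignored"}'
print(validate_kvp(sample_output, INVOICE_SCHEMA))
```

Fields outside the schema are dropped, and missing or mistyped fields come back as `None`, so downstream code can decide whether to retry or flag the document for review.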
Granite‑4.0‑3B‑Vision preserves and extends Granite Vision 3.3 capabilities while providing more specialized extraction workflows.
ChartNet is a million‑scale multimodal dataset created to support robust chart understanding tasks:
➡️ https://huggingface.co/datasets/ibm-granite/ChartNet
It includes:
- 1.7M synthetic charts with aligned images, code, tables, summaries, and reasoning
- 94,643 human‑verified charts
- 2,000 human‑verified test samples
- 24 chart types across 6 plotting libraries
ChartNet uses a code‑guided synthesis pipeline, producing tightly aligned visual, numerical, and textual components.
It was used during training for Granite‑4.0‑3B‑Vision.
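The idea behind code‑guided synthesis can be illustrated with a toy sketch in which one data specification produces the plotting code, the table, and the summary, so the components cannot drift apart. This is an assumption‑laden illustration, not the ChartNet pipeline. The `synthesize_record` function and its record fields are hypothetical, and the generated matplotlib code is only emitted as text here, not executed.

```python
import csv
import io


def synthesize_record(title: str, labels: list[str], values: list[float]) -> dict:
    """Derive plotting code, a CSV table, and a summary from one data spec."""
    # Plotting code as text; running it (not done here) would render the image.
    code = (
        "import matplotlib.pyplot as plt\n"
        f"plt.bar({labels!r}, {values!r})\n"
        f"plt.title({title!r})\n"
        "plt.savefig('chart.png')\n"
    )
    # The table is written from the same labels and values used in the code.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "value"])
    writer.writerows(zip(labels, values))
    # The summary is computed from the same values, so it matches the table.
    top_value, top_label = max(zip(values, labels))
    summary = f"{title}: {top_label} has the highest value ({top_value})."
    return {"code": code, "table": buffer.getvalue(), "summary": summary}


record = synthesize_record("Quarterly revenue", ["Q1", "Q2", "Q3"], [10.0, 14.5, 12.0])
print(record["table"])
print(record["summary"])
```

Because every field is derived from the same inputs, the code, table, and summary describe identical data by construction.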
Older Granite Vision models remain available for users who rely on earlier releases:
- Granite Vision 3.3 (2B): https://huggingface.co/ibm-granite/granite-vision-3.3-2b
- Granite Vision 3.1 (2B Preview): https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview
- Granite Vision 3.3 (GGUF‑converted): https://huggingface.co/ibm-granite/granite-vision-3.3-2b-GGUF
All Granite Vision models are distributed under the Apache 2.0 license.
Please share your comments about our family of vision‑language models by visiting our Hugging Face model collection:
https://huggingface.co/collections/ibm-granite/granite-vision-models-67b3bd4ff90c915ba4cd2800
Select the model repository you would like to provide feedback about, go to the Community tab, and click New discussion.
Alternatively, you may also post questions or comments on our GitHub discussions page:
https://github.com/orgs/ibm-granite/discussions
The use of large vision and language models involves important risks, including bias, fairness concerns, misinformation, and challenges around autonomous decision‑making. Granite Vision models are no exception.
Although alignment processes incorporate safety considerations, the models may sometimes produce inaccurate, biased, or unsafe responses. Smaller models in particular may be more susceptible to hallucination, which remains an active area of research.
We urge the community to deploy Granite Vision models responsibly and recommend using them primarily for document‑understanding tasks, since more general vision tasks may carry higher risks of harmful or biased outputs.
To enhance safety, we recommend using Granite Vision models alongside Granite Guardian, a fine‑tuned model designed to detect and flag risks across dimensions from the IBM AI Risk Atlas.
Issues and pull requests are welcome.
Please open a GitHub issue to report bugs or suggest enhancements.