ibm-granite/granite-vision-models

📚 Granite Vision Paper  | 📊 ChartNet CVPR 2026 Paper   | 🤗 HuggingFace Collection  | 💬 Discussions Page 

Granite Vision Models

Granite Vision is a family of multimodal vision‑language models designed to support enterprise‑grade document understanding tasks, including charts, tables, key‑value extraction, and structured image‑to‑text generation.
This repository provides documentation, examples, and pointers to available model releases and datasets.


🚀 Latest Release: Granite‑4.0‑3B‑Vision

Granite‑4.0‑3B‑Vision is a vision‑language model tailored for enterprise document data extraction, delivered as a LoRA adapter on top of Granite‑4.0‑Micro.

It supports:

  • Chart extraction — Chart‑to‑CSV, Chart‑to‑Summary, Chart‑to‑Code
  • Table extraction — JSON, HTML, and OTSL
  • Semantic KVP extraction — Schema‑guided extraction across diverse document layouts
  • Image‑to‑text — Natural‑language descriptions of images

Granite‑4.0‑3B‑Vision preserves and extends Granite Vision 3.3 capabilities while providing more specialized extraction workflows.
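As a minimal sketch of how Chart‑to‑CSV output might be consumed downstream, the snippet below parses CSV text into row dictionaries using only the Python standard library. It assumes the model's response is plain CSV with a header row; the `parse_chart_csv` helper and the sample string are hypothetical and are not part of any Granite API.

```python
import csv
import io


def parse_chart_csv(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dicts keyed by the header row.

    Assumption: the text is plain CSV whose first line is the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    return [dict(row) for row in reader]


# Hypothetical stand-in for a Chart-to-CSV response; not real model output.
sample_output = "Quarter,Revenue\nQ1,120\nQ2,135\n"
rows = parse_chart_csv(sample_output)
```

All values come back as strings, so numeric columns would still need explicit conversion before analysis.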


📊 ChartNet Dataset

ChartNet is a million‑scale multimodal dataset created to support robust chart understanding tasks:
➡️ https://huggingface.co/datasets/ibm-granite/ChartNet

It includes:

  • 1.7M synthetic charts with aligned images, code, tables, summaries, and reasoning
  • 94,643 human‑verified charts
  • 2,000 human‑verified test samples
  • 24 chart types, across 6 plotting libraries

ChartNet uses a code‑guided synthesis pipeline, producing tightly aligned visual, numerical, and textual components.
It was used during training for Granite‑4.0‑3B‑Vision.
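The idea of code‑guided synthesis can be illustrated with a toy sketch: the plotting code, the data table, and the summary are all derived from one shared data source, so they stay aligned by construction. This is not ChartNet's actual pipeline; the `ChartSample` record, the `synthesize_bar_chart` function, and the choice of matplotlib as the emitted plotting target are assumptions made for illustration only.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ChartSample:
    code: str     # plotting code that would render the chart image
    table: str    # CSV table of the plotted values
    summary: str  # short textual description of the chart


def synthesize_bar_chart(title: str, data: dict[str, float]) -> ChartSample:
    """Derive code, table, and summary from the same data (hypothetical helper)."""
    labels = list(data)
    values = list(data.values())

    # The plotting code is emitted as text only; nothing is rendered here.
    code = (
        "import matplotlib.pyplot as plt\n"
        f"plt.bar({labels!r}, {values!r})\n"
        f"plt.title({title!r})\n"
        "plt.savefig('chart.png')\n"
    )

    # The table is written from the same labels and values as the code.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["category", "value"])
    writer.writerows(zip(labels, values))

    # The summary is computed from the data, not from the rendered image.
    top = max(data, key=data.get)
    summary = f"{title}: {top} has the highest value ({data[top]})."
    return ChartSample(code=code, table=buf.getvalue(), summary=summary)


sample = synthesize_bar_chart("Units sold", {"A": 3, "B": 7, "C": 5})
```

Because every component is generated from `data`, a change to the source values propagates to the code, table, and summary together.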


📚 Legacy Granite Vision Models

Older Granite Vision models remain available for users who rely on earlier releases:


License

All Granite Vision models are distributed under the Apache 2.0 license.


Would you like to provide feedback?

Please let us know your comments about our family of language models by visiting our
Hugging Face model collection:
https://huggingface.co/collections/ibm-granite/granite-vision-models-67b3bd4ff90c915ba4cd2800

Select the model repository you would like to provide feedback about, go to the Community tab, and click New discussion.

Alternatively, you may also post questions or comments on our GitHub discussions page:
https://github.com/orgs/ibm-granite/discussions


Ethical Considerations and Limitations

The use of Large Vision and Language Models involves important risks, including bias, fairness concerns, misinformation, and challenges around autonomous decision‑making. Granite Vision models are no exception.

Although alignment processes incorporate safety considerations, the models may sometimes produce inaccurate, biased, or unsafe responses. Smaller models in particular may be more susceptible to hallucination, which remains an active area of research.

We urge the community to deploy Granite Vision models responsibly, especially for document‑understanding tasks. More general vision tasks may carry higher risks of harmful or biased outputs.

To enhance safety, we recommend using Granite Vision models alongside Granite Guardian, a fine‑tuned model designed to detect and flag risks across dimensions from the IBM AI Risk Atlas.


Contributing

Issues and pull requests are welcome.
Please open a GitHub issue to report bugs or suggest enhancements.
