Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models

Please give us a star ⭐ if you find this work useful

News

2025.2.08 🚀 Released the paper on arXiv.
2025.2.04 🚀 Released the codebase.

🐳 Model Zoo

Vision-Language Models (VLMs) 🖼️

Model Series	Model Name	Parameters	Architecture
LLaVA	LLaVA-v1.6-34B	34B	Vision-Language
	LLaVA-v1.6-13B	13B	Vision-Language
	LLaVA-v1.6-7B	7B	Vision-Language
Lightweight	MoE-LLaVA-Phi2	2.7B	Vision-Language
	MobileVLM-v2	7B	Vision-Language
Other VLMs	mPLUG-Owl2	7B	Vision-Language
	Qwen-VL-Chat	7B	Vision-Language
	Yi-VL	6B	Vision-Language
	CogAgent-VQA	7B	Vision-Language

Large Language Models (LLMs) 📚

Model Series	Model Name	Parameters	Architecture
Yi	Yi-34B	34B	Language
Qwen	Qwen-14B	14B	Language
	Qwen-7B	7B	Language
Llama-2	Llama-2-13B	13B	Language
	Llama-2-7B	7B	Language

📊 Evaluation

Key Improvements Over Baselines 🚀

Hallucination Detection: Up to 22.19% improvement in AUROC
Uncertainty Estimation: 21.17% boost in uncertainty-guided selective generation (AUARC)
Calibration: 70-85% reduction in calibration error
Coverage: Consistently meets 90% coverage target while reducing prediction set size
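The coverage claim above can be illustrated with a minimal split conformal prediction sketch. This is not the repository's implementation: the function names `conformal_threshold` and `prediction_sets`, the choice of nonconformity score (one minus the probability of the true option), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold from a held-out calibration set (hypothetical helper).

    cal_probs:  (n, K) array of per-option probabilities.
    cal_labels: (n,) array of true option indices.
    alpha:      target miscoverage, e.g. 0.1 for a 90% coverage target.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true option.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample corrected quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points: fall back to including every option.
        return 1.0
    return float(np.sort(scores)[k - 1])

def prediction_sets(test_probs, qhat):
    """Boolean mask: option j is in the set when 1 - p_j <= qhat."""
    return (1.0 - test_probs) <= qhat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for model outputs: 4 answer options per question.
    probs = rng.dirichlet(np.ones(4), size=2000)
    labels = np.array([rng.choice(4, p=p) for p in probs])
    qhat = conformal_threshold(probs[:1000], labels[:1000], alpha=0.1)
    sets = prediction_sets(probs[1000:], qhat)
    coverage = sets[np.arange(1000), labels[1000:]].mean()
    print(f"coverage={coverage:.3f}, mean set size={sets.sum(axis=1).mean():.2f}")
```

Empirical coverage is the fraction of test questions whose true option falls inside the prediction set; smaller sets at the same coverage indicate a more informative model.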

Benchmarks 🔖

For detailed results, please refer to the paper.

🤖 Getting started

Six groups of models can be launched from one environment: LLaVA, CogVLM, Yi-VL, Qwen-VL, internlm-xcomposer, and MoE-LLaVA. Create this environment with the following commands:

python3 -m venv venv
source venv/bin/activate
pip install git+https://github.com/haotian-liu/LLaVA.git 
pip install git+https://github.com/PKU-YuanGroup/MoE-LLaVA.git --no-deps
pip install deepspeed==0.9.5
pip install -r requirements.txt
pip install xformers==0.0.23 --no-deps

The mPLUG-Owl model can be launched from a separate environment:

python3 -m venv venv_mplug
source venv_mplug/bin/activate
git clone https://github.com/X-PLUG/mPLUG-Owl.git
cd mPLUG-Owl/mPLUG-Owl2
git checkout 74f6be9f0b8d42f4c0ff9142a405481e0f859e5c
pip install -e .
pip install git+https://github.com/haotian-liu/LLaVA.git --no-deps
cd ../../
pip install -r requirements.txt

Monkey models can be launched from the following environment:

python3 -m venv venv_monkey
source venv_monkey/bin/activate
git clone https://github.com/Yuliang-Liu/Monkey.git
cd ./Monkey
pip install -r requirements.txt
pip install git+https://github.com/haotian-liu/LLaVA.git --no-deps
cd ../
pip install -r requirements.txt

To check all models, run scripts/test_model_logits.sh.

To work with Yi-VL:

apt-get install git-lfs
cd ../
git clone https://huggingface.co/01-ai/Yi-VL-6B

Model logits

To get model logits on the four benchmarks, run the commands in scripts/run.sh.
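As a rough illustration of how saved logits might be consumed downstream, the sketch below converts per-option logits into probabilities with a numerically stable softmax. It assumes the logits are stored as one score per answer option; the actual output format of scripts/run.sh is not described here, and `option_probs` is a hypothetical helper name.

```python
import numpy as np

def option_probs(logits):
    """Softmax over the last axis (hypothetical helper, not from the repo).

    logits: array-like of shape (n_questions, n_options).
    """
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum so np.exp cannot overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    # Two made-up questions with four answer options each.
    demo = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.0, 0.0, 0.0]]
    print(option_probs(demo))
```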

To train the abstention model with RL

bash scripts/train_all_models.sh

To evaluate the abstention model + uncertainty quantification benchmark

bash scripts/evaluate_policies.sh
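For readers unfamiliar with AUARC (area under the accuracy-rejection curve), here is a minimal sketch of one common discrete approximation. It is an assumption that this matches the exact definition used by the evaluation script; the function name `auarc` and the toy inputs are illustrative only.

```python
import numpy as np

def auarc(confidence, correct):
    """Mean accuracy over all retention levels (hypothetical helper).

    confidence: (n,) scores, higher means the model is more certain.
    correct:    (n,) 0/1 flags, 1 when the answer was right.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Keep the most confident answers first and reject the rest.
    order = np.argsort(-confidence, kind="stable")
    kept_correct = np.cumsum(correct[order])
    # Accuracy when only the top-k most confident answers are kept.
    acc_at_k = kept_correct / np.arange(1, len(correct) + 1)
    return float(acc_at_k.mean())

if __name__ == "__main__":
    # Toy example: the single wrong answer has the lowest confidence.
    print(auarc([0.9, 0.8, 0.1], [1, 1, 0]))
```

A confidence score that ranks wrong answers last yields a higher value, because accuracy stays high as low-confidence answers are rejected.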
