Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
- 2025-05-15: ⭐ Our paper "Value-Spectrum" has been accepted to ACL 2025 main!
- 2024-12-19: 📄 Our paper "Value-Spectrum" is now available as a preprint on arXiv! Read it here!
Stay tuned! We're working on the following:
- Upload dataset to Hugging Face
- Add project page
- Add evaluation code
- Add code for embedding VLM agents in social media platforms
- Add code for the ablation study with human annotations
We introduce Value-Spectrum, a benchmark designed to systematically evaluate preference traits in VLMs through visual content from social media, grounded in Schwartz's ten basic human values (see the sketch after this list):
- 🤝 Benevolence — caring for and helping others
- 🌍 Universalism — understanding, appreciation, and protection of all people and nature
- 🧭 Self-Direction — independent thought and action
- 🏆 Achievement — personal success through demonstrating competence
- 🎢 Stimulation — excitement, novelty, and challenge in life
- 🍰 Hedonism — pleasure and sensuous gratification
- 🛡️ Security — safety, harmony, and stability of society and relationships
- 📏 Conformity — restraint of actions that might upset others or violate social norms
- 🧧 Tradition — respect, commitment, and acceptance of cultural or religious customs
- 👑 Power — social status, prestige, and control over people and resources
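For reference, here is a minimal sketch of how these ten values might be organized in code, e.g., to seed value-based retrieval queries. The dictionary and the `retrieval_query` helper are illustrative assumptions, not the released dataset schema or pipeline:

```python
# Illustrative only: the ten Schwartz values with short glosses, as they
# might seed value-based retrieval or scoring prompts.
SCHWARTZ_VALUES = {
    "Benevolence":    "caring for and helping others",
    "Universalism":   "understanding, appreciation, and protection of all people and nature",
    "Self-Direction": "independent thought and action",
    "Achievement":    "personal success through demonstrating competence",
    "Stimulation":    "excitement, novelty, and challenge in life",
    "Hedonism":       "pleasure and sensuous gratification",
    "Security":       "safety, harmony, and stability of society and relationships",
    "Conformity":     "restraint of actions that might upset others or violate social norms",
    "Tradition":      "respect, commitment, and acceptance of cultural or religious customs",
    "Power":          "social status, prestige, and control over people and resources",
}

def retrieval_query(value: str) -> str:
    """Build a hypothetical text query for value-based image retrieval."""
    return f"short-video screenshot expressing {value.lower()}: {SCHWARTZ_VALUES[value]}"
```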
Schwartz value-based image retrieval pipeline
Value-Spectrum uses VLM agents embedded within social media platforms (e.g., TikTok and YouTube) to collect a dataset of 50,191 unique short-video screenshots spanning a wide range of topics, including lifestyle, technology, health, and more.
VLM agents pipeline for social media video screenshot collection and interaction
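Below is a minimal sketch of what such a collection loop could look like. The `take_screenshot`, `vlm_decide`, and `scroll_to_next` methods are hypothetical stand-ins for platform automation and the VLM call, and the hash-based deduplication is our assumption, not necessarily the released pipeline:

```python
import hashlib

def collect_screenshots(agent, target: int = 1000) -> list[bytes]:
    """Browse short videos, keeping unique screenshots the VLM flags as relevant."""
    seen: set[str] = set()
    kept: list[bytes] = []
    while len(kept) < target:
        image = agent.take_screenshot()        # hypothetical: capture the current video frame
        digest = hashlib.sha256(image).hexdigest()
        if digest not in seen:                 # deduplicate frames across the crawl
            seen.add(digest)
            if agent.vlm_decide(image):        # hypothetical: VLM judges topical relevance
                kept.append(image)
        agent.scroll_to_next()                 # hypothetical: advance to the next video
    return kept
```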
Distribution of short-video screenshots in the Value-Spectrum dataset
Our study also shows that VLMs can effectively adopt specific personas and align their preferences with predefined roles, demonstrating their potential for role-playing tasks in social media environments. We validate two prompting strategies (Simple and ISQ), with ISQ significantly improving persona steerability and model adaptability.
Exploring Value-Driven Role-Playing in Vision-Language Models
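As a rough illustration of persona-conditioned prompting: the paper defines the exact Simple and ISQ templates, so both template bodies below are invented for illustration only and show just the general shape of the two strategies:

```python
# Illustrative persona prompting, NOT the paper's exact templates.

def simple_prompt(persona: str, question: str) -> str:
    # "Simple" strategy: a one-line persona instruction prepended to the query.
    return f"You are {persona}. {question}"

def isq_prompt(persona: str, question: str) -> str:
    # "ISQ"-style strategy (shape assumed): reinforce the persona with an
    # introspective setup step before the actual query.
    return (
        f"You are {persona}.\n"
        f"First, briefly state what {persona} cares about and how they would "
        f"react to online content.\n"
        f"Then, staying strictly in character, answer: {question}"
    )
```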
We further contrast VLM outputs with those of text-only LLMs given textual image descriptions, offering insight into how modality influences value preferences and model behavior, i.e., whether visual cues meaningfully shift personality-like inclinations.
Value Distribution Comparison between VLMs and corresponding LLMs
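A minimal sketch of how such a comparison could be scored, assuming per-item value choices have already been collected from the VLM (given images) and from the paired LLM (given image descriptions); the helper names are ours:

```python
from collections import Counter

def value_distribution(choices: list[str]) -> dict[str, float]:
    """Normalize a list of per-item value choices into a preference distribution."""
    counts = Counter(choices)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def compare_modalities(vlm_choices: list[str], llm_choices: list[str]) -> dict[str, float]:
    """Per-value gap between the VLM (images) and its paired LLM (descriptions)."""
    vlm, llm = value_distribution(vlm_choices), value_distribution(llm_choices)
    return {v: vlm.get(v, 0.0) - llm.get(v, 0.0) for v in set(vlm) | set(llm)}
```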
If the paper, code, or dataset inspires you, please cite us:
@inproceedings{Li2024ValueSpectrumQP,
  title={Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts},
  author={Jingxuan Li and Yuning Yang and Shengqi Yang and Linfan Zhang and Ying Nian Wu},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2025}
}