output_hidden_states is not working correctly with SiglipModel class #42759

@hamzaakyildiz

System Info

  • transformers version: 4.57.3
  • Platform: Linux-6.6.105+-x86_64-with-glibc2.35
  • Python version: 3.12.12
  • Huggingface_hub version: 0.36.0
  • Safetensors version: 0.7.0
  • Accelerate version: 1.12.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.9.0+cu126 (CUDA)
  • Tensorflow version (GPU?): 2.19.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.10.7 (gpu)
  • Jax version: 0.7.2
  • JaxLib version: 0.7.2
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: No
  • GPU type: Tesla T4

Who can help?

@zucchini-nlp

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Problem

Even though I set output_hidden_states=True on the top-level config and on both the vision and text sub-configs, the SiglipModel class does not seem to propagate this setting to the vision and text models.

Reproduction Code

import torch
from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel, AutoConfig

model_path = "google/siglip-base-patch16-224"

config = AutoConfig.from_pretrained(model_path)
config.output_hidden_states = True
config.vision_config.output_hidden_states = True
config.text_config.output_hidden_states = True

model = AutoModel.from_pretrained(model_path, config=config)
processor = AutoProcessor.from_pretrained(model_path)


images = [Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)]
texts = ["a photo of 2 cats"]
inputs = processor(text=texts, images=images, padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Actual output: hidden_states is missing from the keys.
print(outputs.text_model_output.keys()) # odict_keys(['last_hidden_state', 'pooler_output'])

Expected behavior

Since output_hidden_states=True is already set in the config, the outputs of the SiglipModel class should also include hidden_states.
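
The expectation can be written as a small check. This is only a sketch: `missing_hidden_states` is a hypothetical helper, not part of transformers, and the objects below are stand-ins built with `SimpleNamespace` rather than real model outputs. The `vision_model_output` stand-in is an assumption, since the reproduction only prints the text side.

```python
from types import SimpleNamespace


def missing_hidden_states(outputs, names=("text_model_output", "vision_model_output")):
    """Return the names of sub-outputs whose hidden_states is absent or None."""
    missing = []
    for name in names:
        sub = getattr(outputs, name, None)
        if getattr(sub, "hidden_states", None) is None:
            missing.append(name)
    return missing


# Stand-in mimicking the reported behaviour: no hidden_states on the sub-outputs.
reported = SimpleNamespace(
    text_model_output=SimpleNamespace(last_hidden_state="...", pooler_output="..."),
    vision_model_output=SimpleNamespace(last_hidden_state="...", pooler_output="..."),
)

# Stand-in mimicking the expected behaviour: hidden_states present on both.
expected = SimpleNamespace(
    text_model_output=SimpleNamespace(hidden_states=("layer0", "layer1")),
    vision_model_output=SimpleNamespace(hidden_states=("layer0", "layer1")),
)

print(missing_hidden_states(reported))
print(missing_hidden_states(expected))
```

With the reproduction code, the same helper could be called on the real `outputs` object. An empty list would mean the config setting reached both sub-models.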
