
Conversation

@yonigozlan (Member) commented Nov 3, 2025

What does this PR do?

Fixes #41955.
Fixes an issue raised in #41954. Instead of setting attributes from kwargs after instantiating the image processor in from_pretrained, we update the image processor dict with the kwargs before instantiating the object. This allows custom logic in the init to take into account the custom kwargs passed to from_pretrained.

In the linked PR, the issue was that max_pixels is supposed to overwrite size["longest_edge"] when passed to the init; in from_pretrained, however, max_pixels was never passed to the init and was only set as an attribute after the image processor had been instantiated.
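
To make the before/after concrete, here is a minimal, self-contained sketch; FakeProcessor and the numbers are made up for illustration and are not the real Qwen2VL classes or defaults:

    class FakeProcessor:
        def __init__(self, size=None, max_pixels=None, **kwargs):
            self.size = dict(size) if size is not None else {"longest_edge": 12_845_056}
            # init-time logic: max_pixels is meant to override size["longest_edge"]
            if max_pixels is not None:
                self.size["longest_edge"] = max_pixels
            self.max_pixels = max_pixels

    config = {"size": {"longest_edge": 12_845_056}}
    kwargs = {"max_pixels": 200_000}

    # old behavior: kwargs set as attributes *after* __init__, so the override never runs
    old = FakeProcessor(**config)
    for key, value in kwargs.items():
        if hasattr(old, key):
            setattr(old, key, value)
    print(old.size)  # {'longest_edge': 12845056} -> max_pixels was silently ignored

    # new behavior: kwargs merged into the dict *before* __init__, so the override runs
    new = FakeProcessor(**{**config, **kwargs})
    print(new.size)  # {'longest_edge': 200000}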

@ReinforcedKnowledge commented Nov 3, 2025

I have a quick question about this fix:

        image_processor_dict.update(kwargs)
        image_processor = cls(**image_processor_dict)

Because you are adding user-given kwargs to image_processor_dict, which is then used to build the image processor, and maybe the class's init doesn't support those kwargs? (Not trying to criticize, just trying to learn more about transformers.)

I think the only way to check whether the kwargs are accepted by the init function is to inspect its signature, but I don't think that's clean and it doesn't solve the underlying pattern: processors can do whatever they want in their init, even use env vars for all they care, and from_pretrained can't solve those issues for them.
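
Just to illustrate what I mean by inspecting the signature (filter_kwargs_for_init is a hypothetical helper, not something in transformers):

    import inspect

    def filter_kwargs_for_init(cls, kwargs):
        """Keep only the kwargs that cls.__init__ can accept."""
        params = inspect.signature(cls.__init__).parameters
        # if __init__ already takes **kwargs, everything passes through anyway
        if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
            return dict(kwargs)
        return {k: v for k, v in kwargs.items() if k in params}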

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan (Member, Author)

I have a quick question about this fix:

    image_processor_dict.update(kwargs)
    image_processor = cls(**image_processor_dict)

Because you are adding user-given kwargs to image_processor_dict, which is then used to build the image processor, and maybe the class's init doesn't support those kwargs? (Not trying to criticize, just trying to learn more about transformers.)

No problem, feel free to ask questions! The idea is that all the init logic (setting attributes and checking kwargs) should live in the class init, to avoid getting different results when loading models in different ways (from_pretrained, from_dict, direct instantiation, etc.).
Fast image processors are becoming the default in the library, and they all check kwargs in their init so that only the kwargs they should accept are kept. So this fix should work without issues: all fast image processors accept **kwargs in their init, then filter them and set as attributes only the kwargs that are in "Model"ImageProcessorKwargs.
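
Schematically, the pattern is something like this (a paraphrased sketch with made-up class names, not the actual base-class code):

    from typing import TypedDict

    class FakeImageProcessorKwargs(TypedDict, total=False):
        # the kwargs this (made-up) image processor is allowed to accept
        max_pixels: int
        min_pixels: int

    class FakeImageProcessorFast:
        valid_kwargs = FakeImageProcessorKwargs

        def __init__(self, **kwargs):
            # only keys declared in `valid_kwargs` become instance attributes;
            # anything else is ignored
            for key in self.valid_kwargs.__annotations__:
                if key in kwargs:
                    setattr(self, key, kwargs[key])

    proc = FakeImageProcessorFast(max_pixels=200_000, not_a_real_kwarg=1)
    print(proc.max_pixels)                     # 200000
    print(hasattr(proc, "not_a_real_kwarg"))   # False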

@ReinforcedKnowledge

I agree, if you can enforce some kind of contract where the __init__ works with any **kwargs the user passes, then it's good. But I don't think that's easily doable. Also, the user might pass kwargs that are not used in the __init__ but are used downstream in some other utility function of the processor, one that checks whether an attribute exists and branches on that, while the __init__ itself might not care about it. I don't know if my point is clear 🤔

And thank you for your openness!

@yonigozlan (Member, Author)

Added logic to more clearly use cls.valid_kwargs to update the image processor dict ;).

Also, the user might pass kwargs that are not used in the __init__ but are used downstream in some other utility function of the processor, one that checks whether an attribute exists and branches on that, while the __init__ itself might not care about it. I don't know if my point is clear 🤔

In that case, if the kwarg is in the valid_kwargs attribute of the processor class, it will still be set as an instance attribute (see the logic here in the base fast image processor class).
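
In other words, the from_dict flow is now roughly this (paraphrased sketch, not the exact implementation):

    def from_dict_sketch(cls, image_processor_dict, **kwargs):
        # merge only the kwargs declared in cls.valid_kwargs into the dict,
        # so __init__ (and any later logic reading these attributes) sees them
        valid_keys = set(cls.valid_kwargs.__annotations__)
        image_processor_dict = {
            **image_processor_dict,
            **{k: v for k, v in kwargs.items() if k in valid_keys},
        }
        return cls(**image_processor_dict)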

@ReinforcedKnowledge

Oh, perfect, thank you @yonigozlan! I didn't know about valid_kwargs; that's exactly what I wanted to do initially. Nice chatting with you!

@molbap (Contributor) left a comment

Looks fine to me, good use of valid_kwargs but imo we should also get rid of the mutated kwargs!

Comment on lines +368 to +371
    +        # Remove kwargs that are used to initialize the image processor attributes
    +        for key in list(kwargs):
                 if hasattr(image_processor, key):
    -                setattr(image_processor, key, value)
    -                to_remove.append(key)
    -        for key in to_remove:
    -            kwargs.pop(key, None)
    +                kwargs.pop(key)
Contributor

While we're at it, I'd prefer not to mutate the kwargs. Precedence will change after the first call, which can lead to weird situations (same class instance, two from_dict calls: the kwargs belong to the same object but are changed between the two calls).

Member Author (@yonigozlan)

Are you sure about this? I can't reproduce this issue on this branch. For example:

    from transformers import Qwen2VLImageProcessorFast

    kwargs = {"max_pixels": 200_000}
    processor = Qwen2VLImageProcessorFast.from_dict({}, **kwargs)
    print(kwargs)

prints

    {'max_pixels': 200000}

even though "max_pixels" is popped.

@github-actions bot commented Nov 3, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: pix2struct


Successfully merging this pull request may close the following issue:

max_pixels parameter ignored when loading Qwen2VL/Qwen3VL image processors via from_pretrained()