cp: fix: transformers v5.5.0 validation (2010) into r0.4.0 #2013
Merged
* catch StrictDataclassClassValidationError

  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* test: limit nightly to stepfun recipe for stepfun CI repro

  Temporarily prune nightly_recipes.yml to the single stepfun/step_3.5_flash_hellaswag_pp.yaml recipe to iterate on the StrictDataclassClassValidationError fix without paying for the full nightly matrix. Not intended for merge.

  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix: retry NeMoAutoTokenizer load when config trips layer_types validator

  AutoTokenizer.from_pretrained internally calls AutoConfig.from_pretrained to resolve the tokenizer class. For checkpoints whose config has layer_types longer than num_hidden_layers (e.g. stepfun-ai/Step-3.5-Flash), newer transformers rejects the config, and huggingface_hub wraps the ValueError in StrictDataclassClassValidationError (which is not a ValueError subclass). The previous get_hf_config fix only covered the model-load path; the tokenizer path hit the same failure independently.

  On that specific validator failure, preload a config via get_hf_config (which truncates layer_types) and retry the tokenizer load with an explicit config=, bypassing the internal AutoConfig call.

  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* refactor: relax validate_layer_type globally instead of preloading config

  The previous tokenizer retry preloaded a fixed config via get_hf_config and re-entered AutoTokenizer.from_pretrained with an explicit config=. That round-trip is brittle (it reconstructs a config the tokenizer does not use) and only fixes the tokenizer call site.

  Replace it with relax_layer_types_validator(): a one-shot monkey-patch that swaps PretrainedConfig.validate_layer_type for a no-op and rewrites the already-frozen validator entries in every live subclass's __class_validators__ list. After that, any downstream call that instantiates a config with mismatched layer_types/num_hidden_layers skips the check.

  The tokenizer retry now just applies the patch and re-invokes super().from_pretrained(...) with the original kwargs.

  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix: retry VLM AutoProcessor load on layer_types validation failure

  AutoProcessor.from_pretrained internally loads AutoConfig, so configs whose layer_types length differs from num_hidden_layers trip validate_layer_type through the processor path too. Previously the VLM build_dataloader caught the error under a broad except and silently set processor=None, producing a cryptic downstream failure.

  On the specific validator signature, call relax_layer_types_validator() and retry AutoProcessor.from_pretrained once. Unrelated exceptions keep the original fall-through to processor=None with a warning. The LLM tokenizer path is already covered via NeMoAutoTokenizer.

  Also pass --force-exclude to the ruff pre-commit hooks so the tests/ exclusion already declared in pyproject.toml takes effect when pre-commit passes files explicitly.

  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert

  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert

  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
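For the ruff change, the commit describes adding --force-exclude so that exclusions declared in pyproject.toml still apply when pre-commit passes filenames explicitly (pre-commit always passes explicit paths, which ruff would otherwise check regardless of excludes). A hypothetical .pre-commit-config.yaml fragment illustrating the idea; the exact hook ids in the repo may differ:

```yaml
# Illustrative fragment, not the repo's actual config.
- repo: https://github.com/astral-sh/ruff-pre-commit
  hooks:
    - id: ruff
      args: [--force-exclude]   # honor pyproject.toml excludes (e.g. tests/)
    - id: ruff-format
      args: [--force-exclude]
```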
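The relax_layer_types_validator() approach described above can be sketched in a library-agnostic way. This is a minimal illustration of the pattern only, assuming a base class whose subclasses snapshot validators into a `__class_validators__` list at class-creation time (as huggingface_hub's strict dataclasses do); the class names here are stand-ins, not transformers' real `PretrainedConfig`:

```python
# Hedged sketch of a one-shot "relax validator" monkey-patch.
# BaseConfig/StepConfig are hypothetical stand-ins for PretrainedConfig
# and its subclasses; only the patching pattern mirrors the PR.

class BaseConfig:
    def validate_layer_type(self):
        # Strict check: layer_types length must match num_hidden_layers.
        if len(self.layer_types) != self.num_hidden_layers:
            raise ValueError("layer_types length mismatch")


class StepConfig(BaseConfig):
    # Validator entry frozen at class-creation time, as the PR describes.
    __class_validators__ = [BaseConfig.validate_layer_type]

    def __init__(self, layer_types, num_hidden_layers):
        self.layer_types = layer_types
        self.num_hidden_layers = num_hidden_layers
        for validator in type(self).__class_validators__:
            validator(self)


def relax_layer_types_validator():
    """Swap the strict check for a no-op, and rewrite the already-frozen
    validator entries in every live subclass's __class_validators__ list."""
    def _noop(self):
        pass

    original = BaseConfig.validate_layer_type
    BaseConfig.validate_layer_type = _noop
    for cls in BaseConfig.__subclasses__():
        if hasattr(cls, "__class_validators__"):
            cls.__class_validators__ = [
                _noop if v is original else v for v in cls.__class_validators__
            ]
```

The key detail motivating the rewrite loop: patching the method on the base class alone is not enough, because subclasses hold direct references to the original function in their frozen validator lists.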
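Both the tokenizer and the AutoProcessor fixes follow the same retry shape: catch only the specific validator failure, apply the relaxation, and retry the load exactly once, while letting unrelated exceptions fall through untouched. A minimal sketch of that pattern, using a stand-in exception class since huggingface_hub's StrictDataclassClassValidationError (deliberately not a ValueError subclass, per the PR) may not be importable here:

```python
# Hedged sketch of the single-retry pattern from the PR. The exception
# class is a local stand-in; load_with_one_retry is an illustrative
# helper name, not NeMo's actual API.

class StrictDataclassClassValidationError(Exception):
    """Stand-in for huggingface_hub's validator error."""


def load_with_one_retry(load_fn, relax_fn):
    try:
        return load_fn()
    except StrictDataclassClassValidationError:
        relax_fn()        # e.g. relax_layer_types_validator() in the PR
        return load_fn()  # single retry with the original arguments
```

Note that unrelated exceptions propagate from the first `load_fn()` call unchanged, preserving the original error-handling behavior (e.g. the VLM path's fall-through to processor=None with a warning).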
Contributor
Author
/ok to test c748500
akoumpa approved these changes on Apr 23, 2026
beep boop [🤖]: Hi @akoumpa 👋,