Model card - https://huggingface.co/blog/welcome-openai-gpt-oss
The transformers version needs to be updated to support this model. Testing with transformers upgraded to 4.55.0, the following issue arises when the model is loaded in FP32 or BF16:
```
Traceback (most recent call last):
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 316, in _wrapper
    return func(*args, **kwargs)
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4879, in from_pretrained
    hf_quantizer.validate_environment(
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/quantizers/quantizer_mxfp4.py", line 60, in validate_environment
    raise RuntimeError("Using MXFP4 quantized models requires a GPU")
RuntimeError: Using MXFP4 quantized models requires a GPU
```
Looks like the checkpoint ships with MXFP4 quantization, which requires a GPU to run. But since we're planning to use the model in BF16 or FP32 and don't actually need quantization, we're checking whether we can just remove the quantization_config while loading the model. That should let us load it without triggering the MXFP4 GPU checks, as in the sketch below.
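A minimal sketch of what that workaround could look like, not yet verified against the 4.55.0 internals: load the config on its own, delete its quantization_config attribute, and pass the edited config to from_pretrained so the MXFP4 quantizer is never instantiated. The model id here is an assumption for illustration.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "openai/gpt-oss-20b"  # assumed checkpoint; gpt-oss-120b would be loaded the same way

# Load the config separately and strip the MXFP4 quantization metadata,
# so from_pretrained never builds the quantizer or runs validate_environment().
config = AutoConfig.from_pretrained(model_id)
if hasattr(config, "quantization_config"):
    del config.quantization_config

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.bfloat16,  # or torch.float32
)
```

This assumes from_pretrained honors the edited config rather than re-reading quantization_config from the checkpoint's config.json; if it does not, stripping the key from a local copy of config.json would be the fallback.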