[Model Bringup] - openai/gpt-oss-20b #2744

@meenakshiramanathan1

Description

Model card - https://huggingface.co/blog/welcome-openai-gpt-oss
The transformers version needs to be updated to support this model. When testing with transformers upgraded to 4.55.0 and loading the model in fp32 or bf16, the following issue arises:

Traceback (most recent call last):
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 316, in _wrapper
    return func(*args, **kwargs)
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4879, in from_pretrained
    hf_quantizer.validate_environment(
  File "/opt/ttforge-toolchain/venv/lib/python3.10/site-packages/transformers/quantizers/quantizer_mxfp4.py", line 60, in validate_environment
    raise RuntimeError("Using MXFP4 quantized models requires a GPU")
RuntimeError: Using MXFP4 quantized models requires a GPU
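
For reference, a minimal repro along the lines of what the traceback suggests (a hypothetical sketch, assuming the model is loaded through the transformers `pipeline` API as in the stack trace):

```python
# Hypothetical repro sketch: loading gpt-oss-20b in bf16 via the
# transformers pipeline API. from_pretrained still runs the MXFP4
# quantizer's validate_environment(), which raises on CPU-only hosts.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,  # fp32 hits the same check
)
```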

It looks like the model ships with MXFP4 quantization, which requires a GPU to run. Since we plan to use the model in BF16 or FP32 and don't actually need quantization, we're checking whether we can simply remove the quantization_config when loading the model; that should let us load it without triggering the MXFP4 GPU check. A sketch of the idea is below.
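
A minimal sketch of that workaround (not yet verified): strip `quantization_config` from the loaded config and pass the cleaned config to `from_pretrained`, so the MXFP4 quantizer is never instantiated. Whether transformers fully honors the stripped config here still needs to be confirmed.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("openai/gpt-oss-20b")
# Drop the MXFP4 quantization_config so from_pretrained does not pick up
# the MXFP4 quantizer (whose environment check hard-requires a GPU).
if hasattr(config, "quantization_config"):
    del config.quantization_config

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    config=config,
    torch_dtype=torch.bfloat16,  # or torch.float32
)
```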
