[Bug]: openai/gpt-oss-120b freeze during init_cache pass #12006

@tcherckez-nvidia

System Info

H100

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python3 examples/auto_deploy/build_and_run_ad.py \
    --model openai/gpt-oss-120b \
    --args.yaml-extra examples/auto_deploy/model_registry/configs/dashboard_default.yaml \
    --args.yaml-extra examples/auto_deploy/model_registry/configs/world_size_8.yaml \
    --args.yaml-extra examples/auto_deploy/model_registry/configs/num_hidden_layers_5.yaml

Expected behavior

The run should complete successfully.

Actual behavior

The run freezes for more than 30 minutes during the init_cache pass.
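
To make a freeze like this easier to confirm and triage, the reproduction command can be wrapped in a watchdog that reports a hang instead of blocking indefinitely. The following is a minimal sketch, not part of the original report: `run_with_timeout`, `REPRO_CMD`, and `TIMEOUT_S` are hypothetical names, the 30-minute limit simply mirrors the observed freeze duration, and the process-group handling assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys

# Reproduction command from this issue, split into argv form.
REPRO_CMD = [
    "python3", "examples/auto_deploy/build_and_run_ad.py",
    "--model", "openai/gpt-oss-120b",
    "--args.yaml-extra", "examples/auto_deploy/model_registry/configs/dashboard_default.yaml",
    "--args.yaml-extra", "examples/auto_deploy/model_registry/configs/world_size_8.yaml",
    "--args.yaml-extra", "examples/auto_deploy/model_registry/configs/num_hidden_layers_5.yaml",
]

# Assumed limit: the issue reports a freeze of more than 30 minutes.
TIMEOUT_S = 30 * 60


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, returncode).

    status is "ok" (exit code 0), "failed" (non-zero exit code),
    or "hung" (no exit within timeout_s; returncode is then None).
    """
    # start_new_session=True gives the child its own process group (POSIX),
    # so any worker processes it spawns can be killed together with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is the group leader, so its pid is also the group id.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the killed process
        return "hung", None
    return ("ok" if rc == 0 else "failed"), rc


# Example call (run from the repository root):
#   status, rc = run_with_timeout(REPRO_CMD, TIMEOUT_S)
#   print(status, rc)
```

A "hung" result distinguishes the reported freeze from an ordinary crash, which would come back as "failed" with its exit code. Killing the whole process group matters here because the world_size_8 configuration presumably launches several worker processes.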

Additional notes

N/A

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

No one assigned

    Labels

    AutoDeploy/Dashboard<NV>: AutoDeploy Backend: Dashboard related
    Inference runtime<NV>: General operational aspects of TRTLLM execution not in other categories.
    bug: Something isn't working

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests