
CUDA OOM when running step 3 with 7B BLOOM on 80 T4 (16 GB) GPUs with HybridEngine on, but OK with HybridEngine off #559

Open
@wang990099

Description


Setup: 10 machines with 80 T4 (16 GB) GPUs in total. The actor (7B BLOOM) and the critic (1.3B BLOOM) both use ZeRO-3, and gradient checkpointing is enabled for both.
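For scale, here is a rough back-of-the-envelope sketch of the fp16 weight footprint for this setup. It assumes nominal parameter counts (7e9 and 1.3e9), 2 bytes per parameter, and an even ZeRO-3 split across all 80 GPUs. The helper name `fp16_weight_gib` is hypothetical. It counts weights only and ignores gradients, optimizer state, activations, and generation buffers. The unsharded figure is included only for comparison with the 16 GB card size; how HybridEngine actually lays out weights during generation is not established here.

```python
GIB = 1024 ** 3  # bytes per GiB


def fp16_weight_gib(num_params: float, shards: int = 1) -> float:
    """Approximate fp16 weight memory per GPU in GiB.

    Assumes 2 bytes per parameter and an even split across `shards` GPUs.
    """
    return num_params * 2 / shards / GIB


ACTOR_PARAMS = 7e9     # nominal size of the 7B BLOOM actor (assumption)
CRITIC_PARAMS = 1.3e9  # nominal size of the 1.3B BLOOM critic (assumption)
NUM_GPUS = 80

# Weights sharded by ZeRO-3 across all GPUs
actor_sharded = fp16_weight_gib(ACTOR_PARAMS, NUM_GPUS)
critic_sharded = fp16_weight_gib(CRITIC_PARAMS, NUM_GPUS)

# Full, unsharded actor weights on a single GPU, for comparison only
actor_full = fp16_weight_gib(ACTOR_PARAMS)

print(f"actor, sharded over {NUM_GPUS} GPUs: {actor_sharded:.2f} GiB per GPU")
print(f"critic, sharded over {NUM_GPUS} GPUs: {critic_sharded:.2f} GiB per GPU")
print(f"actor, unsharded: {actor_full:.2f} GiB")
```

Under these assumptions the sharded weights are a small fraction of a GiB per GPU, while a full unsharded fp16 copy of the actor alone would be about 13 GiB.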

Why does HybridEngine use more GPU memory?

Thank you in advance.
