We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
10 machines, 80 T4(16G)s, actor(7b bloom) with Zero3, critic(1.3b bloom) with Zero3, open both gradient_checkpointing
why HybridEngine using more gpu memory?
thank you in advance