This was not enough, but even after setting

fp32_attention = False
train_batch_size = 4  # was 32
eval_batch_size = 4
gradient_accumulation_steps = 8

GPU usage remains at around 80%.

Thanks!
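
For context, the gradient accumulation setting is what keeps the effective batch size at the original 32 here (4 × 8 = 32), trading memory for extra steps. Below is a minimal PyTorch sketch of that pattern; the model, data, and loop are placeholders, not the repo's actual training script:

```python
import torch
import torch.nn as nn

# Illustrative only: a dummy model and random data show how
# train_batch_size = 4 with gradient_accumulation_steps = 8
# reproduces the gradient of a single batch of 32.
train_batch_size = 4
gradient_accumulation_steps = 8  # effective batch: 4 * 8 = 32

model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

optimizer.zero_grad()
for step in range(64):  # stand-in for iterating a real dataloader
    x = torch.randn(train_batch_size, 16)
    y = torch.randn(train_batch_size, 1)
    # Scale the loss so the accumulated gradient averages over the
    # effective batch rather than summing over micro-batches.
    loss = loss_fn(model(x), y) / gradient_accumulation_steps
    loss.backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```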
