Some questions about Auto Mixed Precision Training /expected scalar type Float but found Half #241
xxw11 started this conversation in Community | General
- Hi, we are trying to reproduce your problem. Could you tell us the exact version of your …
-
1. When I use the following configuration, the "expected scalar type Float but found Half" error occurs:
fp16 = dict(mode=AMP_TYPE.NAIVE)
However, if I switch the fp16 mode to AMP_TYPE.TORCH, it runs normally:
fp16 = dict(mode=AMP_TYPE.TORCH)
What could be the cause of this?
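My guess (I may well be wrong) is that NAIVE mode casts the model weights to fp16 and therefore expects fp16 inputs, while TORCH mode uses torch.cuda.amp.autocast, which casts per op and accepts fp32 inputs. A plain-PyTorch sketch of the two behaviours (illustrative only, not Colossal-AI internals):

```python
import torch
import torch.nn as nn

x = torch.randn(4, 16, device='cuda')        # fp32 input

# "Naive"-style AMP: the whole model is cast to half, so an fp32
# input hitting fp16 weights raises a dtype-mismatch RuntimeError
# unless the input is also cast to half.
half_model = nn.Linear(16, 16).cuda().half()
# half_model(x)               # RuntimeError: mismatched Float/Half dtypes
out = half_model(x.half())    # works once the input is fp16

# Torch-style AMP: weights stay fp32 and autocast casts per op,
# so the fp32 input is accepted without manual casting.
fp32_model = nn.Linear(16, 16).cuda()
with torch.cuda.amp.autocast():
    out = fp32_model(x)       # works
```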
2. It seems that fp16 and zero cannot be used together.

I get the following error:
It is not allowed to set fp16 and zero configuration in your config file at the same time
But when I comment out the fp16 settings, a different error appears. Below is part of my configuration:
```python
BATCH_SIZE = 8
SEQ_LEN = 2048
NUM_EPOCHS = 50
TENSOR_PARALLEL = 4

zero = dict(
    level=2,
    dynamic_loss_scale=True,
    overlap_comm=True,
    clip_grad=1.0,
    cpu_offload=False,
)

gradient_accumulation = 5

optimizer = dict(
    type=SGD,
    lr=0.00015,
    weight_decay=1e-2,
)

loss = dict(
    type=GPTLMLoss,
)

model = dict(
    type=gpt2_Y,
    checkpoint=True,
)

parallel = dict(
    pipeline=1,
    tensor=dict(size=TENSOR_PARALLEL, mode='2d'),
)
```
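In case it helps with reproducing, this is roughly how I run it: a minimal sketch assuming the standard colossalai.launch_from_torch / colossalai.initialize entry points. The config path and the toy model, loss, and data below are placeholders, not my real gpt2_Y / GPTLMLoss setup:

```python
import colossalai
import torch
from torch.utils.data import DataLoader, TensorDataset

# Launch with the config file shown above (path is a placeholder).
colossalai.launch_from_torch(config='./config.py')

# Tiny stand-ins just to show where the config takes effect.
model = torch.nn.Linear(128, 128)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.00015, weight_decay=1e-2)
dataset = TensorDataset(torch.randn(64, 128), torch.randn(64, 128))
train_dataloader = DataLoader(dataset, batch_size=8)

# initialize() wraps the model/optimizer according to the zero (or
# fp16) settings in the config; with zero enabled the fp16 block
# has to be removed.
engine, train_dataloader, _, _ = colossalai.initialize(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    train_dataloader=train_dataloader,
)
```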