Some questions about Auto Mixed Precision Training /expected scalar type Float but found Half #241
xxw11 started this conversation in Community | General
- Hi, we are trying to reproduce your problem. Could you tell us the exact version of your …
-
1. When I use the following configuration, the "expected scalar type Float but found Half" error occurs:
fp16 = dict(mode=AMP_TYPE.NAIVE)
However, if I switch the fp16 mode to AMP_TYPE.TORCH, it runs normally:
fp16 = dict(mode=AMP_TYPE.TORCH)
What could be the cause of this?
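My guess (I may well be wrong) is that NAIVE mode casts the model weights to fp16 and therefore expects fp16 inputs, while TORCH mode uses torch.cuda.amp.autocast, which casts per op and accepts fp32 inputs. A plain-PyTorch sketch of the two behaviours (illustrative only, not Colossal-AI internals):

```python
import torch
import torch.nn as nn

x = torch.randn(4, 16, device='cuda')        # fp32 input

# "Naive"-style AMP: the whole model is cast to half, so an fp32
# input hitting fp16 weights raises a dtype-mismatch RuntimeError
# unless the input is also cast to half.
half_model = nn.Linear(16, 16).cuda().half()
# half_model(x)               # RuntimeError: mismatched Float/Half dtypes
out = half_model(x.half())    # works once the input is fp16

# Torch-style AMP: weights stay fp32 and autocast casts per op,
# so the fp32 input is accepted without manual casting.
fp32_model = nn.Linear(16, 16).cuda()
with torch.cuda.amp.autocast():
    out = fp32_model(x)       # works
```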
2. It seems that fp16 and zero cannot be used together.

I get the following error:
It is not allowed to set fp16 and zero configuration in your config file at the same time
But when I comment out the fp16 settings, a different error appears. Below is part of my configuration:
```python
BATCH_SIZE = 8
SEQ_LEN = 2048
NUM_EPOCHS = 50
TENSOR_PARALLEL = 4

zero = dict(
    level=2,
    dynamic_loss_scale=True,
    overlap_comm=True,
    clip_grad=1.0,
    cpu_offload=False,
)

gradient_accumulation = 5

optimizer = dict(
    type=SGD,
    lr=0.00015,
    weight_decay=1e-2,
)

loss = dict(
    type=GPTLMLoss,
)

model = dict(
    type=gpt2_Y,
    checkpoint=True,
)

parallel = dict(
    pipeline=1,
    tensor=dict(size=TENSOR_PARALLEL, mode='2d'),
)
```
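In case it helps with reproducing, this is roughly how I run it: a minimal sketch assuming the standard colossalai.launch_from_torch / colossalai.initialize entry points. The config path and the toy model, loss, and data below are placeholders, not my real gpt2_Y / GPTLMLoss setup:

```python
import colossalai
import torch
from torch.utils.data import DataLoader, TensorDataset

# Launch with the config file shown above (path is a placeholder).
colossalai.launch_from_torch(config='./config.py')

# Tiny stand-ins just to show where the config takes effect.
model = torch.nn.Linear(128, 128)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.00015, weight_decay=1e-2)
dataset = TensorDataset(torch.randn(64, 128), torch.randn(64, 128))
train_dataloader = DataLoader(dataset, batch_size=8)

# initialize() wraps the model/optimizer according to the zero (or
# fp16) settings in the config; with zero enabled the fp16 block
# has to be removed.
engine, train_dataloader, _, _ = colossalai.initialize(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    train_dataloader=train_dataloader,
)
```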