Description
Hello,
I noticed that DeepSpeed-Chat claims CodeGen is supported up to 16B, but from a previous issue, deepspeedai/DeepSpeed#3106,
and from an earlier discussion in MII, deepspeedai/DeepSpeed-MII#133,
I got a similar message that DeepSpeed does not yet support tensor parallelism for CodeGen. For example:
"@Emerald01 The reason you are not seeing memory savings is because DeepSpeed-inference does not support automatic kernel injection with Codegen models at this time. Without the DeepSpeed kernels, we do not shard the model across GPUs. If you were to test with a model where we do support automatic injection (e.g., gpt2), you would see the memory per GPU is reduced."
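For context, this is roughly how I understand the kernel-injection / tensor-parallel path is requested at inference time (a minimal sketch; the model name, `mp_size`, and dtype are just examples, not a confirmed working setup for CodeGen):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Load CodeGen from Hugging Face (16B variant used here only as an example).
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-16B-mono",
    torch_dtype=torch.float16,
)

# replace_with_kernel_inject=True asks DeepSpeed-Inference to swap in its fused
# kernels and shard the weights across mp_size GPUs. Per the reply quoted above,
# this automatic injection is not available for CodeGen, so the model would not
# actually be sharded on this path.
engine = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```

(Launched with something like `deepspeed --num_gpus 2 script.py`.)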
So it seems that DeepSpeed does not support tensor parallelism for CodeGen, and therefore ZeRO stage 3 cannot be used to split the model across the node so far. Am I right?
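For reference, the ZeRO stage 3 setup I had in mind is roughly the following (a minimal sketch; the config values and the tiny stand-in model are placeholders, not what DeepSpeed-Chat actually uses):

```python
import torch
import deepspeed

# Illustrative ZeRO stage 3 training config: stage 3 partitions parameters,
# gradients, and optimizer states across the data-parallel ranks.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
}

# Tiny stand-in model just to show the call; in practice this would be CodeGen.
model = torch.nn.Linear(1024, 1024)

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

My question is whether this ZeRO-3 sharding works for CodeGen even though the inference kernel injection / tensor parallelism does not.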