
Caesar1993 commented Sep 12, 2023

Added parallel inference code for ChatGLM-6B. Because the model has relatively few parameters, parallel inference is not faster than loading it on a single card, but the approach can serve as a reference for inference on GLM models with larger parameter counts.

  1. Split the fused QKV weights of the HuggingFace ChatGLM checkpoint into per-head slices, take out each head's Q, K, and V, and concatenate them back into a whole QKV tensor (see the first sketch after this list).
  2. Move the ChatGLM layer definitions into `__init__`, and rebuild the forward function on top of ColossalAI's basic parallel layers (see the second sketch after this list).
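
The per-head re-layout in step 1 might look roughly like the following. This is a minimal sketch, not code from this PR: the helper name `split_fused_qkv` is hypothetical, and it assumes ChatGLM-6B's fused `query_key_value` weight stores each head's Q, K, and V rows contiguously (an effective `[num_heads, 3, head_dim, hidden]` layout).

```python
import torch

def split_fused_qkv(qkv_weight: torch.Tensor, num_heads: int, head_dim: int) -> torch.Tensor:
    """Reorder a per-head-interleaved fused QKV weight into one whose Q, K
    and V blocks are each contiguous (hypothetical helper; layout assumed)."""
    hidden = qkv_weight.shape[-1]
    # View as one (Q, K, V) triple per head: (num_heads, 3, head_dim, hidden).
    w = qkv_weight.view(num_heads, 3, head_dim, hidden)
    # Take out the Q, K and V slices of every head ...
    q, k, v = w[:, 0], w[:, 1], w[:, 2]  # each (num_heads, head_dim, hidden)
    # ... and concatenate them back into a whole QKV in which each of the
    # Q, K and V blocks is contiguous and can be sharded evenly across ranks.
    return torch.cat(
        [x.reshape(num_heads * head_dim, hidden) for x in (q, k, v)], dim=0
    )
```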
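
Step 2 could be sketched as below, assuming ColossalAI's 1D tensor-parallel `Linear1D_Col` and `Linear1D_Row` layers. The class `ParallelGLMSelfAttention`, the import path, and the simplifications (rotary embeddings, masking, and KV caching omitted) are illustrative assumptions, not the PR's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from colossalai.nn import Linear1D_Col, Linear1D_Row  # import path assumed

class ParallelGLMSelfAttention(nn.Module):
    """Hypothetical self-attention block: layers are declared in __init__,
    and the forward pass is rebuilt on ColossalAI's basic parallel layers."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.head_dim = hidden_size // num_heads
        # Column-parallel fused QKV projection; assumes the re-laid-out
        # checkpoint weight is sharded so that each rank's local output
        # comes out ordered [q_local, k_local, v_local].
        self.query_key_value = Linear1D_Col(hidden_size, 3 * hidden_size)
        # Row-parallel output projection; partial results are all-reduced.
        self.dense = Linear1D_Row(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden_size)
        bsz, seq_len, _ = hidden_states.shape
        qkv = self.query_key_value(hidden_states)  # (batch, seq, 3 * local_hidden)
        q, k, v = qkv.chunk(3, dim=-1)

        def to_heads(x: torch.Tensor) -> torch.Tensor:
            # (batch, seq, local_hidden) -> (batch, local_heads, seq, head_dim)
            return x.view(bsz, seq_len, -1, self.head_dim).transpose(1, 2)

        # Attention over this rank's local heads (rotary embeddings and
        # attention masking omitted for brevity).
        ctx = F.scaled_dot_product_attention(to_heads(q), to_heads(k), to_heads(v))
        ctx = ctx.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.dense(ctx)
```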
