Tensor Parallelism with the current qk norm. #877

@wimh966

❓ The question

Suppose I need to mid-train the 7B model. How can tensor parallelism be enabled with the current QK norm? At the moment the norm computes its mean and standard deviation over the entire hidden dimension, which tensor parallelism would split across devices.
Thank you.
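
To make the concern concrete, here is a minimal sketch in plain Python, not code from the repository. It assumes the norm is a layer-norm-style operation (mean and variance over the full hidden dimension) with the learned weight and bias omitted. The function names `full_norm` and `sharded_norm` are hypothetical. The sketch simulates each tensor-parallel rank holding one slice of the hidden dimension and shows that the full-dimension statistics can be recovered from per-shard partial sums, which is the quantity an all-reduce would provide in a real distributed setup.

```python
import math

def full_norm(x, eps=1e-5):
    """Normalize one vector with mean/variance over the whole hidden dimension."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sharded_norm(shards, eps=1e-5):
    """Same normalization when the hidden dimension is split across ranks.

    Each inner list is the slice one (simulated) rank would hold. The two
    outer sums stand in for an all-reduce(sum) across ranks; this is an
    assumption for illustration, not an actual distributed call.
    """
    n = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)                       # all-reduce of sum(x)
    total_sq = sum(sum(v * v for v in s) for s in shards)     # all-reduce of sum(x^2)
    mean = total / n
    var = total_sq / n - mean ** 2                            # E[x^2] - E[x]^2
    scale = 1.0 / math.sqrt(var + eps)
    # Each rank normalizes only its own slice using the shared statistics.
    return [[(v - mean) * scale for v in s] for s in shards]

hidden = [0.5, -1.25, 2.0, 0.75, -0.5, 1.5, -2.0, 0.25]
shards = [hidden[i:i + 2] for i in range(0, len(hidden), 2)]  # 4 simulated ranks

reference = full_norm(hidden)
gathered = [v for shard in sharded_norm(shards) for v in shard]
max_err = max(abs(a - b) for a, b in zip(reference, gathered))
```

Under these assumptions the sharded result matches the unsharded one up to floating-point error, at the cost of one extra reduction of two scalars per normalized vector. Whether the library's tensor-parallel path can do this, or instead requires a norm applied per head, is exactly what this issue is asking.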

Labels: type/question (an issue that's a question)
