[QUESTION] Setting num-attention-heads=0 for Mamba #1123

@zixianwang2022

Your question

Hi, I hit many assertion errors when trying to train a pure Mamba2 model (no attention layers) with NUM_ATTENTION_HEADS=0.

Can I instead pass

--hybrid-attention-ratio 0 \
--hybrid-mlp-ratio 0 \

and set NUM_ATTENTION_HEADS to an arbitrary nonzero value to avoid triggering the assertions?

With that setup, I no longer see the assertion errors.
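
For concreteness, the workaround described above can be sketched as follows. This is only an illustration: the value 8 is an arbitrary placeholder, MAMBA_ARGS is a hypothetical variable name, and the --num-attention-heads flag is taken from the issue title. Whether the head count is really ignored when both ratios are 0 is the open question here, not something this sketch confirms.

```shell
#!/bin/sh
# Arbitrary nonzero placeholder (assumption): chosen only so that the
# head-count assertions are not triggered.
NUM_ATTENTION_HEADS=8

# Hypothetical variable collecting the arguments discussed in this issue.
# Both hybrid ratios are 0, i.e. no attention and no MLP layers requested.
MAMBA_ARGS="--num-attention-heads ${NUM_ATTENTION_HEADS} --hybrid-attention-ratio 0 --hybrid-mlp-ratio 0"

# Print the arguments so they can be checked before launching training.
echo "${MAMBA_ARGS}"
```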
