Question about Flash_atten #86

@ayane11

I don’t have an Ampere architecture GPU, so I cannot use the FlashAttention module and have disabled it in my setup. I would like to ask:

1. Can I directly use the provided stage 1 pretrained weights with FlashAttention disabled?

2. Or do I need to retrain the model from scratch without FlashAttention? If so, can I just use the default SparseDrive code to train the stage 1 model?

Thanks for your guidance!
