Adaptive Token Pruning explain #2

@chnk58hoang

Hi, can you explain how your ATP dynamically scales the transformer's width, i.e., the number of tokens, as stated in your paper? From your source code in core/model/transformer, it looks to me like it can only scale the number of transformer layers, while the number of tokens in each sequence stays the same.
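To make the distinction in the question concrete, here is a minimal, generic sketch contrasting depth scaling (fewer layers, same sequence length) with width scaling (tokens dropped between layers, so the sequence gets shorter). It is not the ATP implementation from this repository. The importance score (a token's L2 norm), `keep_ratio`, and every function name below are hypothetical, chosen only for illustration.

```python
import math
from typing import List

Token = List[float]  # one token = one feature vector


def importance(token: Token) -> float:
    # Hypothetical importance score: the token's L2 norm.
    # Real token-pruning methods often use attention weights instead.
    return math.sqrt(sum(x * x for x in token))


def prune_tokens(tokens: List[Token], keep_ratio: float) -> List[Token]:
    """Width scaling: drop the least important tokens, keeping original order."""
    k = max(1, int(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: importance(tokens[i]), reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def layer(tokens: List[Token]) -> List[Token]:
    # Stand-in for a transformer block: maps a sequence to a sequence of the same length.
    return [[x * 0.9 for x in t] for t in tokens]


def forward_depth_scaled(tokens: List[Token], num_layers: int) -> List[Token]:
    """Depth scaling: fewer layers may run, but the sequence length never changes."""
    for _ in range(num_layers):
        tokens = layer(tokens)
    return tokens


def forward_width_scaled(tokens: List[Token], num_layers: int, keep_ratio: float) -> List[Token]:
    """Width scaling: the sequence shrinks after every layer."""
    for _ in range(num_layers):
        tokens = layer(tokens)
        tokens = prune_tokens(tokens, keep_ratio)
    return tokens


seq = [[float(i), 1.0] for i in range(16)]
print(len(forward_depth_scaled(seq, 2)))       # 16
print(len(forward_width_scaled(seq, 2, 0.5)))  # 4
```

If the code only ever changes `num_layers`, every layer still sees all 16 tokens, which is the behavior the question describes; width scaling in this sense requires something like `prune_tokens` to shorten the sequence inside the forward pass.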