[Question] What is MODEL1? #155

@gary-wjc

The term "MODEL1" appears several times in this repository, for example:

// MODEL1: RoPE is the last 64 dims within the full 512 dim, which couples with the last 64 dim from the NoPE part when performing dual GEMM. i.e.

// The following fields are only valid for MODEL1

I have some questions about this novel term. @interestingLSY Could you please take a look?

  1. Are there any publicly available models that belong to the "MODEL1" category?

  2. What are the major differences in the attention mechanism between "MODEL1" and V3.2?

Update: my preliminary guess for the second question is the following:

  1. The topk length is dynamic in MODEL1, whereas it is constant (2048) in V3.2.
  2. MODEL1 introduces additional KV states that are not generated from history tokens in the current sequence.
  3. The number of features in the KV cache without position encoding is 448 in MODEL1, reduced from 512 in V3.2.
  4. Position encoding is also included when recovering the V cache.
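
To make guess 3 concrete, here is a minimal sketch of the layout I am assuming from the quoted comment: a 512-dim cached vector whose last 64 dims carry RoPE, which leaves 448 dims without position encoding. The names `HEAD_DIM`, `ROPE_DIM`, `NOPE_DIM`, and `split_kv_entry` are hypothetical and do not come from the repository; this only illustrates the arithmetic and is not the actual kernel logic.

```python
# Assumed layout, based only on the quoted comment (not confirmed by the maintainers):
# the full cached vector has 512 dims and RoPE occupies the last 64 of them.
HEAD_DIM = 512                   # full dim mentioned in the comment
ROPE_DIM = 64                    # "RoPE is the last 64 dims"
NOPE_DIM = HEAD_DIM - ROPE_DIM   # 448 dims without position encoding


def split_kv_entry(entry):
    """Split one cached vector into its (NoPE, RoPE) parts under the assumed layout."""
    if len(entry) != HEAD_DIM:
        raise ValueError(f"expected {HEAD_DIM} dims, got {len(entry)}")
    return entry[:NOPE_DIM], entry[NOPE_DIM:]


# Usage: a dummy vector whose values equal their own indices.
nope_part, rope_part = split_kv_entry(list(range(HEAD_DIM)))
print(len(nope_part), len(rope_part))
```

If this reading is right, the RoPE part starts at index 448, which would match the 448 figure in guess 3.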
