Keep kv cache as list of tensors maybe better than one tensor #1562

Open
@ghost

Description

Describe the bug
If we keep the KV cache as a list of tensors, there is no need to concatenate the KV caches of the decoder blocks into one tensor (https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/gemma/gemma_causal_lm.py#L225). Avoiding that copy would help model performance.
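A minimal numpy sketch of the difference, assuming an illustrative cache layout of `(batch, 2, seq_len, num_heads, head_dim)` per layer (the exact shapes and the `axis=1` stack mirror what the linked Gemma code does, but the numbers here are made up):

```python
import numpy as np

num_layers, batch, heads, seq_len, head_dim = 2, 1, 4, 8, 16

# One KV cache tensor per decoder block; key and value are stacked on
# axis 1 of each per-layer cache (illustrative layout, not the exact one).
layer_caches = [
    np.zeros((batch, 2, seq_len, heads, head_dim)) for _ in range(num_layers)
]

# Current approach: stack the per-layer caches into one big tensor.
# This copies every layer's cache into new memory.
stacked = np.stack(layer_caches, axis=1)
# -> shape (batch, num_layers, 2, seq_len, heads, head_dim)

# Proposed approach: keep the list as-is. Updating one layer's cache
# touches only that layer's tensor; no stack/concatenate is needed.
layer_caches[0][:, 0, 3] = 1.0  # write layer 0's key cache at position 3

# The stacked copy is stale: it does not see the in-place update,
# so the stacked layout forces a fresh copy on every decode step.
print(stacked[0, 0, 0, 3].sum())  # still zeros in the copied tensor
```

This is only a sketch of the memory-traffic argument; whether the list form fits keras_nlp's cache-passing API (e.g. through `call_with_cache`) is a separate design question.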

Expected behavior
Remove the unnecessary concatenation to improve performance.
