Data and parameter efficiency have always been a concern for small LLMs, but recent LLM progress has started to hit the limits of available data and computational resources even at the largest scale. One potential issue to address when trying to improve LLM performance despite these constraints is the noisy distribution of attention weights. Recent works have shown potential solutions, but they require training from scratch and a large number of additional parameters. In this work we propose a low-rank attention-denoising adapter that can be applied to a pretrained model at the SFT stage. Our approach shows a significant improvement over the unmodified model in terms of cross-entropy on the training data.
intsystems/LowRankAttentionDenoising
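
Below is a minimal sketch of what such a low-rank attention-denoising adapter might look like, assuming it adds a rank-limited learned correction to the pre-softmax attention scores of a frozen pretrained model. The module name `LowRankAttentionDenoiser`, the `rank` parameter, and the tensor shapes are illustrative assumptions, not taken from the repository.

```python
import torch
import torch.nn as nn


class LowRankAttentionDenoiser(nn.Module):
    """Hypothetical low-rank adapter that adds a learned, rank-limited
    correction to the pre-softmax attention scores (illustrative sketch)."""

    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        # Low-rank factors acting on the hidden states that feed attention.
        self.q_down = nn.Linear(d_model, rank, bias=False)
        self.k_down = nn.Linear(d_model, rank, bias=False)
        # Zero-init one factor so the correction starts at zero and the
        # pretrained attention is unchanged before SFT begins.
        nn.init.zeros_(self.k_down.weight)

    def forward(self, hidden: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) -- layer input hidden states
        # scores: (batch, heads, seq, seq) -- pre-softmax attention scores
        q = self.q_down(hidden)                # (batch, seq, rank)
        k = self.k_down(hidden)                # (batch, seq, rank)
        correction = q @ k.transpose(-2, -1)   # (batch, seq, seq), rank-limited
        # Broadcast the same correction across all attention heads.
        return scores + correction.unsqueeze(1)


# During SFT the base model would typically stay frozen and only the
# adapter parameters would be trained, e.g.:
#
#   for p in base_model.parameters():
#       p.requires_grad = False
#   for p in adapter.parameters():
#       p.requires_grad = True
```

Zero-initializing one of the two factors is a common adapter trick (as in LoRA): the fine-tuned model starts out identical to the pretrained one, and the denoising correction is learned gradually during SFT.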