⭐ A curated list of research papers and resources related to Activation Engineering for foundation models, especially Large Language Models (LLMs).
Note
Activation engineering (with LLMs) refers to the process of modifying or controlling the internal activations, i.e., the intermediate outputs of neurons, to analyze or influence model behavior. It is an emerging research field related to model interpretability, neural network transparency, and controlled generation, and it aims to better understand the internal workings of foundation models, particularly with respect to high-level concepts that align with human cognition.
Important
In line with the objectives of activation engineering, analyzing and steering model behavior with respect to arbitrary concepts, this repo delves into the key related areas of concept representation and extraction, concept activation detection, and concept activation steering. The goal is to investigate how concepts are represented within models, how these concepts can be activated or detected during inference, and how activation vectors can be steered for more targeted control over model behavior.
This repo serves as a resource for researchers and developers interested in the inner workings of neural networks and LLMs, offering methods and experimental findings for advancing the field of activation engineering, which aims to understand and manipulate model activations toward building more transparent, controllable, and capable intelligent systems.
💭 This repo is updated on an ongoing basis; if some related papers are missing, please let me know via a pull request :)
🤗 Please also feel free to point out any mistakes or suggest a better categorization. Thanks!
Papers and resources that explore how concepts are represented in model hidden states.
- [arXiv 2025] - ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features - Alec Helbling, Tuna Han Salih Meral, Ben Hoover, Pinar Yanardag, Duen Horng Chau. [Paper]
- [ICLR 2025] - Not All Language Model Features Are Linear - Joshua Engels, Eric J Michaud, Isaac Liao, Wes Gurnee, Max Tegmark. [Paper][Code]
- [ICLR 2025] - The Geometry of Categorical and Hierarchical Concepts in Large Language Models - Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch. [Paper][Code]
- [NeurIPS 2024] - Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers - Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam. [Paper][Code]
- [NeurIPS 2024] - From Causal to Concept-Based Representation Learning - Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, Pradeep Kumar Ravikumar. [Paper] [Code]
- [ICML 2024] - The Linear Representation Hypothesis and the Geometry of Large Language Models - Kiho Park, Yo Joong Choe, Victor Veitch. [Paper] [Code]
- [ICLR 2024] - Demystifying Embedding Spaces using Large Language Models - Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier. [Paper]
- [ICLR 2024] - Identifying Representations for Intervention Extrapolation - Sorawit Saengkyongam, Elan Rosenfeld, Pradeep Ravikumar, Niklas Pfister, Jonas Peters. [Paper]
- [COLM 2024] - The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets - Samuel Marks, Max Tegmark. [Paper]
- [ACL 2024] - Language Models Linearly Represent Sentiment - Curt Tigges, Oskar J. Hollinsworth, Atticus Geiger, Neel Nanda. [Paper]
- [ACL 2015] - Linguistic Regularities in Continuous Space Word Representations - Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig. [Paper]
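Many of the papers above build on the linear representation hypothesis: a concept corresponds, at least approximately, to a direction in the model's hidden states. Below is a minimal, illustrative sketch of extracting such a direction as a difference of mean activations over contrastive prompts; the model, layer index, and prompt sets are assumptions for illustration, not taken from any specific paper above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # illustrative; any HF causal LM that exposes hidden states works
LAYER = 6        # illustrative layer index
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True).eval()

def mean_activation(prompts, layer=LAYER):
    """Mean hidden state at the last token position, averaged over prompts."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        # hidden_states: tuple of (num_layers + 1) tensors of shape [batch, seq, d_model]
        acts.append(out.hidden_states[layer][0, -1])
    return torch.stack(acts).mean(dim=0)

# Contrastive prompt sets for a toy "positive sentiment" concept (illustrative).
pos_prompts = ["The movie was wonderful and", "I absolutely loved the"]
neg_prompts = ["The movie was terrible and", "I absolutely hated the"]

# Difference-of-means direction for the concept at the chosen layer.
concept_direction = mean_activation(pos_prompts) - mean_activation(neg_prompts)
concept_direction = concept_direction / concept_direction.norm()
```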
Research on methods to detect and identify specific concepts or features in activations.
- [arXiv 2025] - Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts - Shruti Joshi, Andrea Dittadi, Sébastien Lachapelle, Dhanya Sridhar. [Paper]
- [arXiv 2025] - Are Sparse Autoencoders Useful? A Case Study in Sparse Probing - Subhash Kantamneni, Joshua Engels, Senthooran Rajamanoharan, Max Tegmark, Neel Nanda. [Paper]
- [PAKDD 2024] - Interpreting Pretrained Language Models via Concept Bottlenecks - Zhen Tan, Lu Cheng, Song Wang, Bo Yuan, Jundong Li, Huan Liu. [Paper]
- [NeurIPS 2024] - LG-CAV: Train Any Concept Activation Vector with Language Guidance - Qihan Huang, Jie Song, Mengqi Xue, Haofei Zhang, Bingde Hu, Huiqiong Wang, Hao Jiang, Xingen Wang, Mingli Song. [Paper] [Code]
- [NeurIPS 2024] - Uncovering Safety Risks in Open-source LLMs through Concept Activation Vector - Zhihao Xu, Ruixuan Huang, Xiting Wang, Fangzhao Wu, Jing Yao, Xing Xie. [Paper][Code]
- [MICCAI 2024] - TextCAVs: Debugging vision models using text - Angus Nicolson, Yarin Gal, J. Alison Noble. [Paper]
- [arXiv 2024] - Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations - Maximilian Fleissner, Maedeh Zarvandi, Debarghya Ghoshdastidar. [Paper]
- [arXiv 2024] - KTCR: Improving Implicit Hate Detection with Knowledge Transfer driven Concept Refinement - Samarth Garg, Vivek Hruday Kavuri, Gargi Shroff, Rahul Mishra. [Paper]
- [arXiv 2024] - Explaining Explainability: Understanding Concept Activation Vectors - Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal. [Paper]
- [ICLR 2023] - Concept Gradient: Concept-based Interpretation Without Linear Assumption - Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh. [Paper][Code]
- [NeurIPS 2022] - Probing Classifiers are Unreliable for Concept Removal and Detection - Abhinav Kumar, Chenhao Tan, Amit Sharma. [Paper]
- [NeurIPS 2022] - Concept Activation Regions: A Generalized Framework For Concept-Based Explanations - Jonathan Crabbé, Mihaela van der Schaar. [Paper][Code]
- [NeurIPS 2020] - On Completeness-aware Concept-Based Explanations in Deep Neural Networks - Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar. [Paper][Code]
- [ICML 2018] - Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) - Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres. [Paper][Code]
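A recurring recipe in the detection papers above (e.g., TCAV-style concept activation vectors and linear probes) is to fit a linear classifier on activations of concept vs. non-concept examples and use its weight vector as the concept direction. Below is a minimal sketch with scikit-learn; the placeholder arrays and dimensions are assumptions for illustration, and real activations would be collected with a helper like the one in the previous snippet.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assume these were collected beforehand, one activation vector per example:
# concept_acts: [n_concept, d_model] activations for examples containing the concept
# random_acts:  [n_random, d_model]  activations for random / non-concept examples
concept_acts = np.random.randn(64, 768)   # placeholder data for illustration only
random_acts = np.random.randn(64, 768)

X = np.concatenate([concept_acts, random_acts])
y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])

probe = LogisticRegression(max_iter=1000).fit(X, y)

# The probe's weight vector acts as a concept activation vector (CAV);
# projecting a new activation onto it scores how strongly the concept is present.
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
new_activation = np.random.randn(768)     # placeholder for a freshly collected activation
concept_score = float(new_activation @ cav)
print(f"concept score: {concept_score:.3f}")
```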
Methods for steering or manipulating activations to influence model behavior or outputs.
- [arXiv 2025] - Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models - Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz. [Paper]
- [arXiv 2025] - Activation Space Interventions Can Be Transferred Between Large Language Models - Narmeen Oozeer, Dhruv Nathawani, Nirmalendu Prakash, Michael Lan, Abir Harrasse, Amirali Abdullah. [Paper]
- [arXiv 2025] - SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models - Zirui He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Mengnan Du. [Paper]
- [arXiv 2025] - Uncovering Latent Chain of Thought Vectors in Language Models - Jason Zhang, Scott Viteri. [Paper]
- [arXiv 2025] - AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders - Zhengxuan Wu, Aryaman Arora, Atticus Geiger, Zheng Wang, Jing Huang, Dan Jurafsky, Christopher D. Manning, Christopher Potts. [Paper][Code]
- [AAAI 2025] - Tuning-Free Accountable Intervention for LLM Deployment - A Metacognitive Approach - Zhen Tan, Jie Peng, Song Wang, Lijie Hu, Tianlong Chen, Huan Liu. [Paper]
- [ICLR 2025] - Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution - Haiyan Zhao, Heng Zhao, Bo Shen, Ali Payani, Fan Yang, Mengnan Du. [Paper]
- [ICLR 2025] - Programming Refusal with Conditional Activation Steering - Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar. [Paper][Code]
- [ICLR 2025] - Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors - Weixuan Wang, Jingyuan Yang, Wei Peng. [Paper] [Code]
- [ICLR 2025] - Improving Instruction-Following in Language Models through Activation Steering - Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi, Eric Horvitz, Besmira Nushi. [Paper][Code]
- [ICLR 2025 workshop] - Editable Concept Bottleneck Models - Lijie Hu, Chenyang Ren, Zhengyu Hu, Hongbin Lin, Cheng-Long Wang, Zhen Tan, Weimin Lyu, Jingfeng Zhang, Hui Xiong, Di Wang. [Paper]
- [NeurIPS 2024] - Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization - Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen. [Paper][Code]
- [NeurIPS 2024] - Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control - Yuxin Xiao, Chaoqun Wan, Yonggang Zhang, Wenxiao Wang, Binbin Lin, Xiaofei He, Xu Shen, Jieping Ye. [Paper]
- [NeurIPS 2024] - Refusal in Language Models Is Mediated by a Single Direction - Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda. [Paper][Code]
- [NeurIPS 2024] - Analyzing the Generalization and Reliability of Steering Vectors - Daniel Tan, David Chanin, Aengus Lynch, Dimitrios Kanoulas, Brooks Paige, Adria Garriga-Alonso, Robert Kirk. [Paper]
- [NeurIPS 2024] - Who's asking? User personas and the mechanics of latent misalignment - Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, Lucas Dixon. [Paper][Code]
- [NeurIPS 2024 workshop] - Towards Reliable Evaluation of Behavior Steering Interventions in LLMs - Itamar Pres, Laura Ruis, Ekdeep Singh Lubana, David Krueger. [Paper]
- [NeurIPS 2024 workshop] - Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering - Joris Postmus, Steven Abreu. [Paper][Code]
- [NeurIPS 2024 workshop] - Relational Composition in Neural Networks: A Survey and Call to Action - Martin Wattenberg, Fernanda B. Viégas. [Paper]
- [NeurIPS 2024 workshop] - Can sparse autoencoders be used to decompose and interpret steering vectors? - Harry Mayne, Yushi Yang, Adam Mahdi. [Paper] [Code]
- [NeurIPS 2024 workshop] - Extracting Unlearned Information from LLMs with Activation Steering - Atakan Seyitoğlu, Aleksei Kuvshinov, Leo Schwinn, Stephan Günnemann. [Paper]
- [ICML 2024] - In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering - Sheng Liu, Haotian Ye, Lei Xing, James Zou. [Paper][Code]
- [ICML 2024 workshop] - Controlling Large Language Model Agents with Entropic Activation Steering - Nate Rahn, Pierluca D'Oro, Marc G. Bellemare. [Paper]
- [AAAI 2024] - Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention - Zhen Tan, Tianlong Chen, Zhenyu Zhang, Huan Liu. [Paper]
- [ICLR 2024] - ReFT: Representation Finetuning for Language Models - Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts. [Paper][Code]
- [ICLR 2024] - Function Vectors in Large Language Models - Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau. [Paper][Code]
- [EMNLP 2024] - Activation Scaling for Steering and Interpreting Language Models - Niklas Stoehr, Kevin Du, Vésteinn Snæbjarnarson, Robert West, Ryan Cotterell, Aaron Schein. [Paper] [Code]
- [EMNLP 2024] - Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with Direction-Magnitude Perspective - Van-Cuong Pham, Thien Huu Nguyen. [Paper]
- [ACL 2024] - Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models - Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao. [Paper] [Code]
- [ACL 2024] - InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance - Pengyu Wang, Dong Zhang, Linyang Li, Chenkun Tan, Xinghao Wang, Ke Ren, Botian Jiang, Xipeng Qiu. [Paper] [Code]
- [ACL 2024] - Steering Llama 2 via Contrastive Activation Addition - Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner. [Paper] [Code]
- [CIKM 2024] - Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment - Haoran Wang, Kai Shu. [Paper] [Code]
- [arXiv 2024] - Representation Engineering: A Top-Down Approach to AI Transparency - Andy Zou et al. [Paper][Code]
- [arXiv 2024] - Improving Steering Vectors by Targeting Sparse Autoencoder Features - Sviatoslav Chalnev, Matthew Siu, Arthur Conmy. [Paper] [Code]
- [arXiv 2024] - Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs - Sara Price, Arjun Panickssery, Sam Bowman, Asa Cooper Stickland. [Paper]
- [arXiv 2024] - Steering Without Side Effects: Improving Post-Deployment Control of Language Models - Asa Cooper Stickland, Alexander Lyzhov, Jacob Pfau, Salsabila Mahdi, Samuel R. Bowman. [Paper] [Code]
- [arXiv 2024] - Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories - Tianlong Wang, Xianfeng Jiao, Yifan He, Zhongzhi Chen, Yinghao Zhu, Xu Chu, Junyi Gao, Yasha Wang, Liantao Ma. [Paper]
- [arXiv 2024] - Activation Steering for Robust Type Prediction in CodeLLMs - Francesca Lucchetti, Arjun Guha. [Paper]
- [arXiv 2024] - Extending Activation Steering to Broad Skills and Multiple Behaviours - Teun van der Weij, Massimo Poesio, Nandi Schoots. [Paper] [Code]
- [arXiv 2024] - MiMiC: Minimally Modified Counterfactuals in the Representation Space - Shashwat Singh, Shauli Ravfogel, Jonathan Herzig, Roee Aharoni, Ryan Cotterell, Ponnurangam Kumaraguru. [Paper]
- [arXiv 2024] - Investigating Bias Representations in Llama 2 Chat via Activation Steering - Dawn Lu, Nina Rimsky. [Paper]
- [arXiv 2023] - Improving Activation Steering in Language Models with Mean-Centring - Ole Jorgensen, Dylan Cope, Nandi Schoots, Murray Shanahan. [Paper]
- [EMNLP 2023] - In-Context Learning Creates Task Vectors - Roee Hendel, Mor Geva, Amir Globerson. [Paper]
- [arXiv 2023] - Activation Addition: Steering Language Models Without Optimization - Alexander Matt Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid. [Paper] [Code]
- [ACL 2022] - Extracting Latent Steering Vectors from Pretrained Language Models - Nishant Subramani, Nivedita Suresh, Matthew E. Peters. [Paper] [Code]
- [arXiv 2025] - LatentQA: Teaching LLMs to Decode Activations Into Natural Language - Alexander Pan, Lijie Chen, Jacob Steinhardt. [Paper][Code]
- [ICLR 2025] - Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models - Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, Neel Nanda. [Paper]
- [NeurIPS 2024] - Concept Algebra for (Score-Based) Text-Controlled Generative Models - Zihao Wang, Lin Gui, Jeffrey Negrea, Victor Veitch. [Paper][Code]
- [NeurIPS 2023] - Inference-Time Intervention: Eliciting Truthful Answers from a Language Model - Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg. [Paper][Code]
- [ICLR 2020] - On the "steerability" of generative adversarial networks - Ali Jahanian, Lucy Chai, Phillip Isola. [Paper][Code]
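Many of the steering papers above (e.g., activation addition and contrastive activation addition) add a steering vector to the residual stream during the forward pass. Below is a minimal sketch using a PyTorch forward hook on a Hugging Face GPT-2 block; the layer index, coefficient, and the random placeholder steering vector (in practice, e.g., a difference-of-means direction as in the earlier extraction snippet) are assumptions for illustration, not any specific paper's method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL, LAYER, COEFF = "gpt2", 6, 8.0      # illustrative choices
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

# A unit-norm steering vector in the model's hidden size; a random placeholder here,
# in practice e.g. the concept_direction extracted from contrastive prompts.
steering_vector = torch.randn(model.config.hidden_size)
steering_vector = steering_vector / steering_vector.norm()

def steering_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states
    # [batch, seq, d_model]; add the scaled steering vector at every position.
    hidden = output[0] + COEFF * steering_vector.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
try:
    ids = tok("The weather today is", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=20, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()    # always detach the hook so later calls run unsteered
```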
- Activation Engineering - LessWrong
- Anthropic Transformer Circuits Thread
- TransformerLens
- Neel Nanda's Blog
- Awesome Representation Engineering
This project is licensed under the MIT License.
Disclaimer: This repository is for research purposes only. The papers and resources listed here are the property of their respective authors.