LLM-Knowledge-Representation

This study investigates how Large Language Models (LLMs) represent and recall Interwoven Structured Knowledge across transformer layers.

Project Structure

The project consists of three main parts, along with preliminary setup and data preparation in the Pre folder.

Pre (Preliminary Work)

The Pre folder includes essential preparation steps:

  • Dataset generation
  • Activation extraction
  • Attention map analysis
  • t-SNE visualization of activation distributions (see the sketch below)
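
As a rough illustration of the activation-extraction and t-SNE steps, the sketch below pulls per-layer hidden states from a Hugging Face model and projects the last-token activations into 2D. The model name, prompts, and layer index are placeholders, not the exact configuration of the scripts in the Pre folder.

    # Minimal sketch: extract per-layer hidden states and project them with t-SNE.
    # Model name, prompts, and layer index are illustrative placeholders.
    import numpy as np
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    # In practice this would be the generated dataset of attribute prompts.
    prompts = ["The atomic number of oxygen is", "The atomic number of iron is"]

    layer = 16  # an intermediate layer, chosen for illustration
    features = []
    with torch.no_grad():
        for prompt in prompts:
            inputs = tokenizer(prompt, return_tensors="pt")
            outputs = model(**inputs)
            # hidden_states is a tuple with one [batch, seq_len, hidden_dim] tensor
            # per layer (plus the embeddings); keep the last-token activation.
            features.append(outputs.hidden_states[layer][0, -1].float().numpy())

    X = np.stack(features)
    embedded = TSNE(n_components=2, perplexity=min(30, len(X) - 1)).fit_transform(X)
    plt.scatter(embedded[:, 0], embedded[:, 1])
    plt.title(f"t-SNE of layer {layer} activations")
    plt.savefig("tsne_layer_activations.png")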

1. Intermediate Layers Encode Knowledge, Later Layers Shape Language (Language_vs_factual folder)

This section explores how intermediate layers store factual knowledge, while later layers refine language outputs.

  • Linear probing: Training Support Vector Regression (SVR) models for each layer (a minimal per-layer probing sketch follows this list).
  • Non-matching linear probing
  • Probability of the target token across layers: Probabilities are calculated by iteratively re-running the model with the next token appended to the prompt.
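
The sketch below shows one way to implement the per-layer linear probing step: train an SVR probe on each layer's activations to predict a numeric attribute and compare the scores across layers. The array shapes, placeholder data, and train/test split are illustrative; the actual scripts load the activations extracted in the Pre step.

    # Minimal sketch: one SVR probe per layer, predicting a numeric attribute
    # (e.g. an element's atomic number) from that layer's activations.
    # The placeholder arrays stand in for activations extracted in the Pre step.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    num_layers, num_samples, hidden_dim = 33, 200, 4096  # illustrative sizes
    activations = np.random.randn(num_layers, num_samples, hidden_dim)
    targets = np.random.rand(num_samples)  # the attribute value to predict

    scores = []
    for layer in range(num_layers):
        X_train, X_test, y_train, y_test = train_test_split(
            activations[layer], targets, test_size=0.2, random_state=0
        )
        probe = SVR(kernel="linear")  # linear kernel, matching the "linear probing" setup
        probe.fit(X_train, y_train)
        scores.append(r2_score(y_test, probe.predict(X_test)))

    best_layer = int(np.argmax(scores))
    print(f"Best probing R^2 = {scores[best_layer]:.3f} at layer {best_layer}")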

2. Recall Peaks at Intermediate Layers (Intervention folder)

This section investigates whether related attributes are interconnected by analyzing the recall ability of LLMs, that is, their capacity to retrieve attributes that are related to, but not explicitly mentioned in, the prompt. We also explore the geometric mechanisms behind this recall process through intervention experiments.
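
As a concrete example of such an intervention, the sketch below adds a steering direction to the residual stream at one decoder layer via a forward hook and inspects how the generation changes. The hook path (model.model.layers) assumes a Llama-style architecture, and the model name, layer index, and steering vector are illustrative assumptions rather than the repository's exact procedure.

    # Minimal sketch: intervene on the residual stream at one layer and
    # compare the model's generation. Hook path assumes a Llama-style model.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    layer_idx = 16                                      # intermediate layer, illustrative
    direction = torch.randn(model.config.hidden_size)   # stand-in for an attribute direction
    direction = direction / direction.norm()
    alpha = 5.0                                         # intervention strength

    def steer(module, inputs, output):
        # Decoder layers return a tuple whose first element is the hidden states;
        # add the (scaled) direction and pass the rest of the tuple through.
        hidden = output[0] + alpha * direction.to(output[0])
        return (hidden,) + output[1:]

    handle = model.model.layers[layer_idx].register_forward_hook(steer)
    try:
        inputs = tokenizer("The element with atomic number 26 is", return_tensors="pt")
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=10)
        print(tokenizer.decode(out[0], skip_special_tokens=True))
    finally:
        handle.remove()  # detach the hook so later runs are unaffected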

3. Relationship in Attribute Representation: From Superposition to Separation (relationship folder)

This section examines whether one attribute’s representation can recall related attributes that are never mentioned in the prompt, and how the relationships between attribute representations change across layers.
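
One simple way to quantify this transition from superposition to separation, sketched below on placeholder data, is to track the cosine similarity between the mean activation vectors of two related attributes at every layer: high similarity suggests overlapping (superposed) representations, and a drop in later layers suggests the representations separate.

    # Minimal sketch: cosine similarity between two attributes' mean activations
    # per layer. The random arrays are placeholders for extracted activations.
    import numpy as np

    num_layers, num_samples, hidden_dim = 33, 100, 4096  # illustrative sizes
    attr_a = np.random.randn(num_layers, num_samples, hidden_dim)  # prompts about attribute A
    attr_b = np.random.randn(num_layers, num_samples, hidden_dim)  # prompts about attribute B

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    for layer in range(num_layers):
        sim = cosine(attr_a[layer].mean(axis=0), attr_b[layer].mean(axis=0))
        print(f"layer {layer:2d}  cos(A, B) = {sim:+.3f}")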

Findings

We show that intermediate layers encode factual knowledge by superimposing related attributes in overlapping spaces, enabling effective recall even when attributes are not explicitly prompted. In contrast, later layers refine linguistic patterns and progressively separate attribute representations, optimizing task-specific outputs while narrowing attribute recall.

All study results can be found in the Results folder.

We identify diverse encoding patterns, including the first reported observation of 3D spiral structures when analyzing information related to the periodic table of elements. Our findings reveal a dynamic transition in attribute representations across layers, contributing to mechanistic interpretability and providing insights into how LLMs process complex, interrelated knowledge.

Setup & Installation

  1. Before running the code, set your HF_TOKEN in config.json (a minimal token-loading sketch follows this list).
  2. Install the required dependencies:
    pip install -r requirements.txt
  3. Run the respective scripts from each folder as needed.
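
For step 1, a script might read the token from config.json and authenticate roughly as sketched below; the key name HF_TOKEN follows the note above, but how the repository's scripts actually load it may differ.

    # Minimal sketch: load the Hugging Face token from config.json and log in.
    # The exact loading mechanism used by the repository's scripts may differ.
    import json
    from huggingface_hub import login

    with open("config.json") as f:
        config = json.load(f)

    login(token=config["HF_TOKEN"])  # needed for gated models such as Llama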
