Official implementation of our ACL 2026 paper HyperMem: Hypergraph Memory for Long-Term Conversations.
Long-term memory for conversational agents requires modelling high-order associations, i.e., joint dependencies among multiple related episodes and facts, which pairwise relations in existing RAG and graph-based memory systems cannot capture. HyperMem addresses this by structuring memory as a three-level hypergraph (topics → episodes → facts) connected through weighted hyperedges, and retrieving information via a coarse-to-fine top-down traversal.
On the LoCoMo benchmark, HyperMem reaches 92.73% LLM-as-a-judge accuracy, outperforming the strongest RAG baseline (HyperGraphRAG, 86.49%) by 6.24 points and the strongest memory system (MemOS, 75.80%) by 16.93 points.
The memory is organised as a three-level hypergraph, with hyperedges linking nodes of the same level.
Given a dialogue stream $\mathcal{D} = \{u_1, u_2, \ldots, u_T\}$, where $u_t$ denotes the $t$-th utterance, HyperMem incrementally organises memory into three levels of nodes:
| Level | Node | Semantics |
|---|---|---|
| L3 | Topic | Long-horizon theme grouping topically related episodes |
| L2 | Episode | Temporally contiguous dialogue segment describing one event |
| L1 | Fact | Atomic queryable knowledge unit extracted from an episode |
- Episode Detection: an LLM-driven streaming boundary detector partitions the raw dialogue into semantically complete episodes, each summarised and timestamped.
- Topic Aggregation: streaming topic matching against historical topics lazily groups related episodes under shared topics; new topics are created when no sufficient match exists.
- Fact Extraction: atomic facts are extracted from each episode, each annotated with potential queries, keywords, and a summary; facts originating from the same episode are then bound together via a weighted hyperedge.
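For orientation, here is a minimal sketch of the three node types. The field names are illustrative assumptions; the authoritative definitions live in `hypermem/types.py`:

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    """L1: atomic queryable knowledge unit extracted from an episode."""
    text: str                     # the atomic fact itself
    keywords: list[str]           # keywords for sparse (BM25) matching
    potential_queries: list[str]  # questions this fact could answer

@dataclass
class Episode:
    """L2: temporally contiguous dialogue segment describing one event."""
    summary: str                  # LLM-generated summary of the segment
    timestamp: str                # when the segment occurred
    facts: list[Fact] = field(default_factory=list)

@dataclass
class Topic:
    """L3: long-horizon theme grouping topically related episodes."""
    name: str
    episodes: list[Episode] = field(default_factory=list)
```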
Node embeddings are refined by aggregating information from incident hyperedges. A hyperedge embedding is computed as an attention-weighted sum of its member nodes,
and each node is updated as $\mathbf{h}'_v = \mathbf{h}_v + \lambda \cdot \mathrm{Agg}_{e \in \mathcal{N}(v)}(\mathbf{h}_e)$, where $\mathcal{N}(v)$ is the set of hyperedges incident to node $v$ and $\lambda$ controls the update strength.
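A minimal NumPy sketch of this refinement, assuming a parameter-free attention (scoring members against their mean embedding) and mean aggregation over incident hyperedges; the paper's learned parameterisation may differ:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def hyperedge_embedding(member_embs: np.ndarray) -> np.ndarray:
    """Attention-weighted sum of member node embeddings.
    Members are scored against their mean embedding, a simple
    parameter-free stand-in for the paper's attention."""
    query = member_embs.mean(axis=0)      # (dim,)
    weights = softmax(member_embs @ query)  # (n_members,)
    return weights @ member_embs           # (dim,)

def refine_node(h_v: np.ndarray, incident_edge_embs: np.ndarray,
                lam: float = 0.5) -> np.ndarray:
    """h'_v = h_v + lam * Agg_{e in N(v)}(h_e), with mean as Agg."""
    return h_v + lam * incident_edge_embs.mean(axis=0)
```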
For a query $q$, retrieval proceeds top-down in three stages:

- Stage 1, Topic Retrieval: BM25 and dense rankings are fused by Reciprocal Rank Fusion,
  $$\mathrm{RRF}(d) = \sum_{m=1}^{M} \frac{1}{k + \mathrm{rank}_m(d)},$$
  and the top-$k^T$ topics are kept after optional reranking (see the RRF sketch after this list).
- Stage 2, Episode Retrieval: episodes in the retained topic subgraph are scored, and the top-$k^E$ are kept.
- Stage 3, Fact Retrieval: facts linked to the retained episodes are scored, and the top-$k^F$ are used as evidence.
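The Stage 1 fusion follows the RRF formula directly. A sketch is below; `k = 60` is assumed here as the conventional constant from the literature, and the repository's default may differ:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse M ranked lists of ids: RRF(d) = sum_m 1 / (k + rank_m(d)).
    Ranks are 1-based; an id missing from a list contributes nothing."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a BM25 ranking with a dense ranking of topic ids.
fused = reciprocal_rank_fusion([["t3", "t1", "t7"], ["t1", "t3", "t9"]])
```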
The final answer is generated by an LLM conditioned on the retrieved episodes (with their summaries) and facts.
HyperMem is tested with Python 3.12 and CUDA 12.1.
```bash
git clone https://github.com/<org>/HyperMem.git
cd HyperMem
conda create -n hypermem python=3.12 -y
conda activate hypermem
pip install -r requirements.txt
```

Create a `.env` file at the repository root:
```bash
# LLM backend (OpenAI-compatible; we use OpenRouter in the paper)
OPENROUTER_API_KEY=sk-...

# Local model endpoints
EMBEDDING_BASE_URL=http://localhost:11810/v1/embeddings
RERANKER_BASE_URL=http://localhost:12810
```

HyperMem uses Qwen3-Embedding-4B for semantic encoding and Qwen3-Reranker-4B for reranking. Both are served via vLLM:
```bash
bash scripts/serve_embedding.sh   # GPUs 0-3, port 11810
bash scripts/serve_reranker.sh    # GPUs 4-7, port 12810
```

The full LoCoMo evaluation pipeline is launched with a single command:

```bash
bash scripts/run_eval.sh
```

The script sequentially runs six stages; all artefacts are written under `results/<experiment_name>/`.
| Stage | Script | Purpose |
|---|---|---|
| 1 | `stage1_memory_extraction.py` | Episode detection from raw dialogues |
| 2 | `stage2_hypergraph_extraction.py` | Topic aggregation + fact extraction + hypergraph construction |
| 3 | `stage3_hypergraph_index.py` | BM25 and dense indices over the hypergraph |
| 4 | `stage4_hypergraph_retrieval.py` | Top-down hierarchical retrieval |
| 5 | `stage5_response.py` | LLM answer generation from retrieved evidence |
| 6 | `stage6_eval.py` | LLM-as-judge evaluation (3 rounds, averaged) |
Individual stages can be run via:

```bash
python hypermem/main/eval.py --stages 4 5 6
```

All hyper-parameters live in `hypermem/config.py` and can be overridden through environment variables:
```bash
export HYPERMEM_EXPERIMENT_NAME="HyperMem-v3"
export HYPERMEM_USE_RERANKER=false
export HYPERMEM_INITIAL_CANDIDATES=100   # pre-fusion candidate pool
export HYPERMEM_TOPIC_TOP_K=15           # k^T
export HYPERMEM_EPISODE_TOP_K=25         # k^E
export HYPERMEM_FACT_TOP_K=30            # k^F
```

This setting uses $k^T = 15$, $k^E = 25$, and $k^F = 30$, with the reranker disabled.
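For illustration, a minimal sketch of the env-override pattern these variables suggest; the actual `hypermem/config.py` may be structured differently:

```python
import os
from dataclasses import dataclass, field

def _env(name, default, cast=str):
    """Read HYPERMEM_<name> from the environment, else return default.
    (Hypothetical helper; the real config.py may use another mechanism.)"""
    raw = os.getenv(f"HYPERMEM_{name}")
    return cast(raw) if raw is not None else default

@dataclass
class Config:
    experiment_name: str = field(
        default_factory=lambda: _env("EXPERIMENT_NAME", "HyperMem"))
    use_reranker: bool = field(
        default_factory=lambda: _env("USE_RERANKER", True,
                                     lambda s: s.lower() == "true"))
    initial_candidates: int = field(
        default_factory=lambda: _env("INITIAL_CANDIDATES", 100, int))
    topic_top_k: int = field(      # k^T
        default_factory=lambda: _env("TOPIC_TOP_K", 15, int))
    episode_top_k: int = field(    # k^E
        default_factory=lambda: _env("EPISODE_TOP_K", 25, int))
    fact_top_k: int = field(       # k^F
        default_factory=lambda: _env("FACT_TOP_K", 30, int))
```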
Accuracy is reported as the LLM-as-judge score (GPT-4o-mini), averaged over 3 evaluation rounds.
| Method | Single-hop | Multi-hop | Temporal | Open Domain | Overall |
|---|---|---|---|---|---|
| GraphRAG | 79.55 | 54.96 | 50.16 | 58.33 | 67.60 |
| LightRAG | 86.68 | 84.04 | 60.75 | 71.88 | 79.87 |
| HippoRAG 2 | 86.44 | 75.89 | 78.50 | 66.67 | 81.62 |
| HyperGraphRAG | 90.61 | 80.85 | 85.36 | 70.83 | 86.49 |
| OpenAI | 63.79 | 42.92 | 21.71 | 63.22 | 52.90 |
| LangMem | 62.23 | 47.92 | 23.43 | 72.20 | 58.10 |
| Zep | 61.70 | 41.35 | 49.31 | 76.60 | 65.99 |
| A-Mem | 39.79 | 18.85 | 49.91 | 54.05 | 48.38 |
| Mem0 | 67.13 | 51.15 | 55.51 | 72.93 | 66.88 |
| Mem0$^g$ | 65.71 | 47.19 | 58.13 | 75.71 | 68.44 |
| MIRIX | 85.11 | 83.70 | 88.39 | 65.62 | 85.38 |
| Memobase | 73.12 | 64.65 | 81.20 | 53.12 | 72.01 |
| MemU | 66.34 | 63.12 | 27.10 | 50.56 | 56.55 |
| MemOS | 81.09 | 67.49 | 75.18 | 55.90 | 75.80 |
| HyperMem (Ours) | 96.08 | 93.62 | 89.72 | 70.83 | 92.73 |
```
HyperMem/
├── hypermem/
│   ├── config.py              # Experiment configuration
│   ├── types.py               # Episode / Topic / Fact data classes
│   ├── structure.py           # Hypergraph nodes and hyperedges
│   ├── extractors/            # LLM-driven extraction modules
│   │   ├── episode_extractor.py
│   │   ├── topic_extractor.py
│   │   ├── fact_extractor.py
│   │   └── hypergraph_extractor.py
│   ├── llm/                   # OpenAI-compatible LLM / embedding / reranker clients
│   ├── prompts/               # Prompt templates (episode / topic / fact / answer)
│   ├── utils/                 # Utility functions
│   └── main/                  # Six-stage pipeline entry points
├── scripts/
│   ├── run_eval.sh            # End-to-end evaluation driver
│   ├── serve_embedding.sh     # Qwen3-Embedding-4B server
│   └── serve_reranker.sh      # Qwen3-Reranker-4B server
├── data/                      # LoCoMo-10 and auxiliary benchmarks
├── results/                   # Per-experiment artefacts
├── requirements.txt
└── README.md
```
Each experiment directory under `results/` contains the extracted `episodes/`, `topics/`, and `facts/`, the built `hypergraphs/`, `bm25_index/`, and `vectors/`, along with `search_results.json`, `retrieval_logs.json`, `responses.json`, and the final `judged.json`.
If HyperMem is useful in your research, please cite our paper:
```bibtex
@inproceedings{yue2026hypermem,
  title     = {HyperMem: Hypergraph Memory for Long-Term Conversations},
  author    = {Yue, Juwei and Hu, Chuanrui and Sheng, Jiawei and Zhou, Zuyi and Zhang, Wenyuan and Liu, Tingwen and Guo, Li and Deng, Yafeng},
  booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year      = {2026}
}
```