From Noise to Logic: Reasoning-aware Context Reconstruction for Multi-hop QA

Submitted to CIKM 2026 (Submission #840)
RACR is a two-stage context compression framework for Retrieval-Augmented Generation (RAG) that addresses two structural mismatches in deep retrieval scenarios:
- Selection Mismatch (Bridge Evidence Problem): Bridge evidence (intermediate facts that are essential for multi-hop chains but only weakly similar to the query) is discarded by traditional relevance-based methods.
- Organization Mismatch (Logical Structure Problem): Retrieved snippets are ordered by retrieval rank, not logical precedence, breaking causal chains.
RACR decouples these concerns into two stages:
- Stage 1, Reasoning-Aware Atomic Refinement (`[Mode] ANALYSIS`): Groups snippets into bounded windows and distils each into concise, utility-validated notes.
- Stage 2, Global Logical Synthesis (`[Mode] REFLECTION`): Constructs a coherent deductive chain from the evidence pool and filters off-chain redundancies.
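
The two-stage control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `racr/compressor.py`: the names `group_into_windows`, `two_stage_compress`, `toy_analyze`, and `toy_reflect` are hypothetical, the default window size of 4 is an arbitrary assumption, and the two callables stand in for the LLM calls that RACR issues in ANALYSIS and REFLECTION mode.

```python
from typing import Callable, List, Sequence


def group_into_windows(snippets: Sequence[str], window_size: int) -> List[List[str]]:
    """Split snippets into consecutive windows of at most `window_size` items."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    return [list(snippets[i:i + window_size]) for i in range(0, len(snippets), window_size)]


def two_stage_compress(
    query: str,
    snippets: Sequence[str],
    analyze: Callable[[str, List[str]], str],
    reflect: Callable[[str, List[str]], str],
    window_size: int = 4,  # assumed value, for illustration only
) -> str:
    # Stage 1: distil each bounded window into a note; drop empty notes.
    notes = []
    for window in group_into_windows(snippets, window_size):
        note = analyze(query, window)
        if note:
            notes.append(note)
    # Stage 2: synthesize one context from the pooled notes.
    return reflect(query, notes)


# Toy stand-ins for the model calls (hypothetical, keyword-based).
def toy_analyze(query: str, window: List[str]) -> str:
    return " ".join(s for s in window if "Musk" in s)


def toy_reflect(query: str, notes: List[str]) -> str:
    return " ".join(notes)


snippets = [
    "SpaceX was founded by Elon Musk.",
    "The Eiffel Tower is in Paris.",
    "Elon Musk was born in Pretoria.",
    "Bananas are yellow.",
]
print(two_stage_compress("Where was the founder of SpaceX born?", snippets,
                         toy_analyze, toy_reflect, window_size=2))
```

Passing the two stages in as callables keeps the windowing logic testable without a model; in the real system both stages would presumably be prompts to the same fine-tuned checkpoint.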
RACR/
+-- racr/ # Core RACR module
| +-- base.py # BaseCompressor interface and SearchResult dataclass
| +-- compressor.py # RACRCompressor (two-stage Analysis + Reflection)
| +-- prompt.py # ANALYSIS_TEMPLATE and REFLECTION_TEMPLATE
| +-- utils.py # Text chunking and JSON parsing helpers
+-- baselines/ # Baseline compressors used in the paper
| +-- compact/ # CompAct
| +-- refiner/ # Refiner
| +-- recomp_abst/ # RECOMP-Abstractive
| +-- recomp_extr/ # RECOMP-Extractive
| +-- longllmlingua/ # LongLLMLingua
+-- evaluate/
| +-- run_eval.py # Main evaluation entry point
| +-- metrics.py # EM / F1 scoring
+-- configs/
| +-- eval_config.py # Dataset paths and model settings
+-- requirements.txt
Note (Anonymous Submission Stage): The fine-tuned model checkpoint and training data are not included in this submission to comply with double-blind review requirements.
Upon acceptance, we will publicly release:
- The fine-tuned Qwen2.5-7B-Instruct RACR checkpoint on Hugging Face Hub.
- The full distillation dataset (~78k samples) synthesized via the DeepSeek-R1 teacher pipeline.
- Step-by-step training scripts and data synthesis pipeline.
pip install -r requirements.txt

Each dataset file should be a JSONL file in which every line follows:

{"question": "...", "answers": ["..."], "ctxs": [{"title": "...", "text": "..."}]}

Update paths in configs/eval_config.py.
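
Before running the evaluation, it may help to check that a dataset file matches the record format shown above. The following is a minimal sketch using only the standard library; the function name `parse_jsonl_records` is hypothetical and is not part of the repository.

```python
import json
from typing import Any, Dict, Iterable, List

REQUIRED_KEYS = ("question", "answers", "ctxs")


def parse_jsonl_records(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL lines and check each record has the expected fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing keys {missing}")
        for ctx in record["ctxs"]:
            if "title" not in ctx or "text" not in ctx:
                raise ValueError(f"line {line_no}: each ctx needs 'title' and 'text'")
        records.append(record)
    return records


sample = ['{"question": "q?", "answers": ["a"], "ctxs": [{"title": "t", "text": "x"}]}']
print(len(parse_jsonl_records(sample)))
```

For a file on disk, the same function can be applied to an open handle, for example `with open(path, encoding="utf-8") as f: records = parse_jsonl_records(f)`.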
In configs/eval_config.py, set racr_model_path to the released checkpoint path
(available after acceptance).
cd RACR
python evaluate/run_eval.py

Results (EM, F1, Compression Ratio) are saved to the directory specified in eval_config.py.
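
For readers unfamiliar with the three reported quantities, the sketch below shows one common way to compute them. It is an assumption-laden illustration, not the contents of `evaluate/metrics.py`: the answer normalization follows the widely used SQuAD-style convention (lowercase, strip punctuation and articles), and the compression ratio is assumed here to be original length divided by compressed length, measured in whitespace tokens. The repository may use a different normalization or a model tokenizer.

```python
import re
import string
from collections import Counter
from typing import List

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, answers: List[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(a) for a in answers))


def f1_score(prediction: str, answers: List[str]) -> float:
    """Best token-level F1 of the prediction against any gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    best = 0.0
    for answer in answers:
        gold_tokens = normalize_answer(answer).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def compression_ratio(original: str, compressed: str) -> float:
    """Assumed definition: original tokens / compressed tokens (whitespace split)."""
    n_compressed = len(compressed.split())
    if n_compressed == 0:
        raise ValueError("compressed text is empty")
    return len(original.split()) / n_compressed


print(exact_match("The Elon Musk.", ["Elon Musk"]))
print(f1_score("Elon Reeve Musk", ["Elon Musk"]))
print(compression_ratio("a b c d", "a b"))
```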
from racr import RACRCompressor, SearchResult
compressor = RACRCompressor(
    model_path="<path_to_checkpoint>",
    use_vllm=True,
    vllm_tensor_parallel_size=1,
)
docs = [
    SearchResult(evi_id=0, docid=0, title="Doc A", text="..."),
    SearchResult(evi_id=1, docid=1, title="Doc B", text="..."),
]
compressed = compressor.compress(query="Who founded SpaceX?", documents=docs)
print(compressed[0].text)

| Method | NQ EM | TQA EM | HotpotQA EM | 2Wiki EM | MuSiQue EM | Avg EM | Compression Ratio (CR) |
|---|---|---|---|---|---|---|---|
| Original | 30.1 | 61.1 | 28.5 | 22.8 | 18.4 | 32.2 | 1x |
| LongLLMLingua | 28.2 | 59.8 | 29.5 | 24.6 | 15.8 | 31.6 | 9x |
| CompAct | 33.9 | 59.7 | 30.2 | 20.7 | 16.9 | 32.3 | 35x |
| Refiner | 28.4 | 56.6 | 27.3 | 20.4 | 18.4 | 30.2 | 46x |
| RACR (Ours) | 35.0 | 60.3 | 32.6 | 29.0 | 36.1 | 38.6 | 41x |
Improvements are statistically significant at p < 0.01 on all multi-hop datasets.
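
The Avg EM column in the table above is consistent with an unweighted mean of the five per-dataset EM scores, rounded to one decimal place. The short check below reproduces it from the table values; the dictionary and function names are illustrative only.

```python
# EM values copied from the results table.
# Column order: NQ, TQA, HotpotQA, 2Wiki, MuSiQue.
PER_DATASET_EM = {
    "Original": [30.1, 61.1, 28.5, 22.8, 18.4],
    "LongLLMLingua": [28.2, 59.8, 29.5, 24.6, 15.8],
    "CompAct": [33.9, 59.7, 30.2, 20.7, 16.9],
    "Refiner": [28.4, 56.6, 27.3, 20.4, 18.4],
    "RACR (Ours)": [35.0, 60.3, 32.6, 29.0, 36.1],
}

# Avg EM values as reported in the table.
REPORTED_AVG_EM = {
    "Original": 32.2,
    "LongLLMLingua": 31.6,
    "CompAct": 32.3,
    "Refiner": 30.2,
    "RACR (Ours)": 38.6,
}


def mean_em(scores):
    """Unweighted mean over datasets, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


for method, scores in PER_DATASET_EM.items():
    print(f"{method}: computed {mean_em(scores)}, reported {REPORTED_AVG_EM[method]}")
```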
This code is released under the MIT License.