
Commit dfdd854

Merge pull request #4 from FoundationAgents/fix/readme-captions
docs: add captions for benchmark and ablation tables; fix curriculum …
2 parents c9d2172 + d32502f

2 files changed: 11 additions & 5 deletions

File tree:

- README.md
- scripts/training_examples/run_qwen2_5_7b_retrieve_mix_musique.sh

README.md

Lines changed: 10 additions & 4 deletions
```diff
@@ -5,18 +5,22 @@
 Large language models (LLMs) often struggle with **context fidelity**, producing inconsistent or hallucinated answers even when relevant information is present.
 We propose **CARE**, a native retrieval-augmented reasoning framework that integrates in-context evidence directly into the reasoning chain.
 This work represents a step toward making LLMs more accurate, reliable, and efficient for knowledge-intensive tasks.
+---
+
 ### Results Overview
 <p align="center">
 <img src="assets/retrieval_results.png" alt="CARE Results" width="85%" style="display:inline-block;"/>
 </p>
+<p align="center"><em>Figure 1: Comparison of model performance across different settings. CARE demonstrates improved results over baselines on multiple QA benchmarks.</em></p>
 
 ### Method Overview
 <p align="center">
 <img src="assets/method.png" width="80%">
 </p>
 
----
+<p align="center"><em>Figure 2: A schematic illustration of the training data and process. The upper part shows SFT data generation (fact injection and special tokens), while the lower part shows the SFT training process together with reinforcement learning (RL) using multiple rewards.</em></p>
 
+---
 
 ## 🔧 Installation
 
```
````diff
@@ -26,7 +30,7 @@ Requirements:
 - `transformers>=4.51.0`
 - `flash-attn>=2.4.3`
 - `vllm>=0.8.3`
-
+---
 Clone and install:
 ```bash
 git clone https://github.com/FoundationAgents/CARE
````
````diff
@@ -44,6 +48,7 @@ Use the provided helper script to download Qwen models and datasets (DROP, MuSiQ
 ```bash
 python CARE/scripts/load_script/load_dataset.py
 ```
+
 ```bash
 python CARE/scripts/load_script/load_model.py
 ```
````
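The two loader invocations above can be chained into one step. A minimal sketch, with the script paths taken from the README; the actual `python` calls are left commented so the sketch is inert outside a CARE checkout:

```shell
# Run both loader scripts in sequence; stop on the first failure.
set -e
for script in load_dataset.py load_model.py; do
  echo "running CARE/scripts/load_script/$script"
  # python "CARE/scripts/load_script/$script"  # uncomment inside the repo
done
```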
```diff
@@ -87,8 +92,8 @@ Edit the script to change:
 | **Qwen2.5 14B** | Original | 47.58 | *61.94* | *59.05* | *37.99* | *51.64* |
 | | CRAG | **50.89** | 44.74 | 34.68 | 28.17 | 39.62 |
 | | **CARE** | *48.81* | **67.75** | **78.68** | **51.27** | **61.63** |
+<p align="center"><em>Table 1: Evaluation on real-world QA datasets. Results are grouped by the base LLM. The best and second-best results are shown in <b>bold</b> and <u>underline</u>, respectively. Slash (/) indicates unavailable checkpoints or unsupported models.</em></p>
 
----
 
 ### Ablation Study
 
```
```diff
@@ -99,6 +104,7 @@ Edit the script to change:
 | No Retrieval ||||| 37.66 | 62.59 | _70.57_ | 43.85 | 57.26 | 54.39 |
 | No Curriculum||||| 38.33 | **64.10** | **70.69** | **47.49** | _60.60_ | _56.24_ |
 | **CARE** ||||| **48.11**| _63.45_ | 70.11 | _45.57_ | **64.56** | **58.36** |
+<p align="center"><em>Table 2: Ablation studies on QA tasks based on Qwen2.5-7B. The best and second-best results are shown in <b>bold</b> and <u>underline</u>, respectively. “Ret.” indicates the retrieval reward, and “Cur.” indicates curriculum learning.</em></p>
 
 📌 *Whether to enable curriculum learning can be controlled in*
-[`CARE/verl/trainer/config.py`](CARE/verl/trainer/config.py).
+[`verl/trainer/config.py`](verl/trainer/config.py).
```
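The README points at `verl/trainer/config.py` for the curriculum-learning toggle, but the flag's exact name is not shown in this diff. A hedged way to locate it from the repo root; the search pattern "curriculum" is a guess:

```shell
# Look for curriculum-related settings in the trainer config.
# The file path comes from the README note above.
CONFIG="verl/trainer/config.py"
if [ -f "$CONFIG" ]; then
  grep -in "curriculum" "$CONFIG"
else
  echo "not found: $CONFIG (run this from the CARE repo root)"
fi
```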

scripts/training_examples/run_qwen2_5_7b_retrieve_mix_musique.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@ export VLLM_USE_V1=0
 # Replace the path with either:
 # 1. Your local model checkpoint directory
 # 2. Or a Hugging Face Hub repo id, e.g. Qwen/Qwen2.5-7B-Instruct
-MODEL_PATH=/mnt/home/LLaMA-Factory/saves/Qwen2.5-7B-Instruct-Ret/full/sft
+MODEL_PATH=Qwen/Qwen2.5-7B-Instruct
 
 # -------------------------------
 # System prompt
```
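The change above swaps a machine-specific SFT checkpoint for a Hub repo id. One way to keep both options open is a directory-existence fallback; a sketch, assuming a hypothetical `LOCAL_CKPT` override variable (the default local path is the one this commit removed):

```shell
# Prefer a local checkpoint when it exists; otherwise fall back to the
# Hugging Face Hub repo id. LOCAL_CKPT is a hypothetical override knob.
LOCAL_CKPT="${LOCAL_CKPT:-/mnt/home/LLaMA-Factory/saves/Qwen2.5-7B-Instruct-Ret/full/sft}"
if [ -d "$LOCAL_CKPT" ]; then
  MODEL_PATH="$LOCAL_CKPT"
else
  MODEL_PATH="Qwen/Qwen2.5-7B-Instruct"
fi
echo "MODEL_PATH=$MODEL_PATH"
```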

0 commit comments
