Commit 5920dc4

author
alcholiclg
committed
refine readme for deep research; add run_benchmark.sh; fix counting character for report qa
1 parent c27347f commit 5920dc4

7 files changed

Lines changed: 580 additions & 61 deletions

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -341,7 +341,7 @@ The **MS-Agent Skill Module** is **Implementation** of [Anthropic-Agent-Skills](
341341
For more details, please refer to [**MS-Agent Skills**](ms_agent/skill/README.md).
342342
343343
344-
### Agentic Insight
344+
### Agentic Insight (Deep Research)
345345
346346
#### - Lightweight, Efficient, and Extensible Multi-modal Deep Research Framework
347347

README_ZH.md

Lines changed: 1 addition & 2 deletions
@@ -311,8 +311,7 @@ asyncio.run(main())
311311

312312
---
313313

314-
315-
### Agentic Insight
314+
### Agentic Insight (Deep Research)
316315

317316
#### - 轻量级、高效且可扩展的多模态深度研究框架
318317

Lines changed: 195 additions & 28 deletions
@@ -1,4 +1,3 @@
1-
21
# Agentic Insight v2
32

43
Agentic Insight v2 provides a more scalable framework for deep research, enabling agents to autonomously explore and execute complex tasks.
@@ -7,10 +6,10 @@ Agentic Insight v2 provides a more scalable framework for deep research, enablin
76

87
Agentic Insight v2 is designed around:
98

10-
- **Extensible main-agent + sub-agent architecture**: a Researcher orchestrates Searcher/Reporter and can be extended with new sub agents and tools.
9+
- **Extensible main-agent + sub-agent architecture**: a Researcher orchestrates Searcher/Reporter and can be extended with new sub-agents and tools.
1110
- **File-system based context management**: flexible, debuggable, and resume-friendly context via structured artifacts on disk.
1211
- **Deep-research optimized toolchain**: dedicated todo, evidence, search, and report tools tuned for iterative research loops.
13-
- **Evidence-bound report generation**: reports are generated from raw evidence with explicit bindings for higher trustworthiness.
12+
- **Evidence-bound report generation**: reports are generated from raw evidence with explicit bindings for higher trustworthiness and traceability.
1413

1514
### 🚀 Quickstart
1615

@@ -28,37 +27,197 @@ pip install -e .
2827
pip install 'ms-agent[research]'
2928
```
3029

31-
#### Environment variables (`.env`)
30+
#### Environment Variables
3231

33-
From repo root:
32+
Create a `.env` file in the repository root:
3433

3534
```bash
3635
cp projects/deep_research/.env.example .env
3736
```
3837

39-
Edit `.env` and set:
38+
Edit `.env` and set the following **required** environment variables:
39+
40+
```bash
41+
# LLM Configuration (Required)
42+
OPENAI_API_KEY=your_api_key
43+
OPENAI_BASE_URL=https://your-openai-compatible-endpoint/v1
44+
45+
# Search Engine Configuration (choose one, or use default arxiv with no config needed)
46+
EXA_API_KEY=your_exa_key # Recommended, register at: https://exa.ai
47+
# SERPAPI_API_KEY=your_serpapi_key # Or choose SerpApi, register at: https://serpapi.com
48+
```
49+
50+
#### Model Configuration (⚠️ Required for First Run)
51+
52+
v2 uses three YAML config files to drive the Researcher, Searcher, and Reporter agents. **Before the first run, you must change the model names to ones your LLM provider serves**; otherwise you may get model-not-found errors. If you want each agent to use a different model or provider, modify the `llm` section in the corresponding YAML independently; otherwise the defaults from `.env` are used.
53+
54+
##### Models to Configure
55+
56+
For balanced performance and cost, we recommend a **tiered model configuration** — choosing different models for each agent based on its role and requirements.
57+
58+
| YAML File | Config Path | Current Default | Description | Recommendation |
59+
|-----------|-------------|-----------------|-------------|----------------|
60+
| `researcher.yaml` | `llm.model` | `gpt-5-2025-08-07` | Researcher Agent (main agent) | Use a stronger model (e.g. `qwen3-max` / `gpt-5`) for task planning and coordination |
61+
| `searcher.yaml` | `llm.model` | `qwen3.5-plus` | Searcher Agent | Can use the same or a slightly weaker model (e.g. `qwen3.5-plus` / `MiniMax-M2.5`) |
62+
| `searcher.yaml` | `tools.web_search.summarizer_model` | `qwen3.5-flash` | Web page summarization model (optional) | Use a fast, cheap model (e.g. `qwen3.5-flash` / `gpt-4.1-mini`) |
63+
| `reporter.yaml` | `llm.model` | `qwen3.5-plus` | Reporter Agent | Can use the same or a slightly weaker model (e.g. `qwen3.5-plus` / `MiniMax-M2.5`) |
64+
| `researcher.yaml` / `reporter.yaml` | `self_reflection.quality_check.model` | `qwen3.5-flash` | Quality check model (optional) | Use a fast, cheap model (e.g. `qwen3.5-flash` / `gpt-4.1-mini`) |
65+
66+
##### Common LLM Provider Examples
67+
68+
Modify model names in YAML files according to your provider:
69+
70+
**Using OpenAI:**
71+
72+
```yaml
73+
# Agent configuration
74+
llm:
75+
  service: openai
76+
  model: gpt-5-2025-08-07
77+
  openai_api_key: <OPENAI_API_KEY>
78+
  openai_base_url: <OPENAI_BASE_URL>
79+
80+
# Also modify quality_check and summarizer_model (defaults to OpenAI-compatible provider):
81+
tools:
82+
  web_search:
83+
    summarizer_model: qwen3.5-flash
84+
    summarizer_api_key: <OPENAI_API_KEY>
85+
    summarizer_base_url: <OPENAI_BASE_URL>
86+
87+
self_reflection:
88+
  quality_check:
89+
    enabled: true
90+
    model: qwen3.5-flash
91+
    openai_api_key: <OPENAI_API_KEY>
92+
    openai_base_url: <OPENAI_BASE_URL>
93+
```
94+
95+
**Other Compatible Endpoints:** Refer to your provider's documentation for model identifiers.
96+
97+
#### Search Engine Configuration
98+
99+
Edit `searcher.yaml` to configure search engines:
100+
101+
```yaml
102+
tools:
103+
  web_search:
104+
    engines:
105+
      - exa # or serpapi (requires corresponding API key in .env)
106+
      - arxiv # arxiv requires no API key, always available
107+
    api_key: <EXA_API_KEY> # When using EXA
108+
    # Or when using SerpApi, add (uncomment):
109+
    # serpapi_provider: google # Options: google, bing, baidu
110+
```
111+
112+
**Default:** If no search engine API key is configured, the system falls back to `arxiv` (academic literature search only).
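
The fallback described above can be sketched as follows. This is only an illustration, assuming the usable engines are derived from which keys are present in the environment; `usable_engines` is a hypothetical helper and not part of ms-agent.

```python
def usable_engines(env):
    """Return the search engines that can run with the keys found in `env`.

    Hypothetical helper for illustration only; ms-agent's own selection
    logic may differ.
    """
    engines = []
    if env.get("EXA_API_KEY"):
        engines.append("exa")
    if env.get("SERPAPI_API_KEY"):
        engines.append("serpapi")
    # arxiv needs no API key, so it is always available
    engines.append("arxiv")
    return engines


print(usable_engines({}))
print(usable_engines({"EXA_API_KEY": "your_exa_key"}))
```

With no key set, only `arxiv` remains, which matches the academic-only default described above.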
113+
114+
#### Advanced Configuration (Optional)
115+
116+
##### Web Page Summarization
117+
118+
Enabled by default to compress long web content, reducing context bloat, speeding up research, and saving cost:
119+
120+
```yaml
121+
tools:
122+
  web_search:
123+
    enable_summarization: true
124+
    summarizer_model: qwen3.5-flash # Can switch to a cheaper model
125+
    max_content_chars: 200000 # Max content chars allowed for summarization; content beyond this is truncated
126+
    summarizer_max_workers: 15
127+
    summarization_timeout: 360
128+
```
129+
130+
**Note:** Summarization makes additional LLM calls and therefore consumes more tokens, but it significantly reduces the Searcher Agent's context length.
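
To make the roles of `max_content_chars` and `summarizer_max_workers` concrete, here is a minimal sketch, assuming pages are truncated first and then summarized in a thread pool. `summarize_pages` and the stand-in summarizer are hypothetical; the real tool calls an LLM and also applies `summarization_timeout`, which is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_pages(pages, summarize, max_content_chars=200000, max_workers=15):
    """Truncate each page, then summarize the pages concurrently.

    `summarize` is any callable taking a string and returning a string;
    in practice it would wrap a call to the summarizer model.
    """
    # Content beyond max_content_chars is dropped before summarization
    clipped = [page[:max_content_chars] for page in pages]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(summarize, clipped))


def fake_summarizer(text):
    """Stand-in for an LLM call: report the length of what it received."""
    return f"summary of {len(text)} chars"


print(summarize_pages(["a" * 10, "b" * 3], fake_summarizer, max_content_chars=5))
```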
131+
132+
##### Quality Check
40133

41-
- `OPENAI_API_KEY` (key of OpenAI-compatible endpoint)
42-
- `OPENAI_BASE_URL` (OpenAI-compatible endpoint)
43-
- One of:
44-
  - `EXA_API_KEY` (recommended, register at [Exa](https://exa.ai), free quota available)
45-
  - `SERPAPI_API_KEY` (register at [SerpApi](https://serpapi.com), free quota available)
134+
Both Researcher and Reporter have quality check mechanisms for verifying report generation quality:
46135

47-
Notes:
136+
```yaml
137+
self_reflection:
138+
  enabled: true
139+
  max_retries: 2 # Max check rounds
140+
  quality_check:
141+
    enabled: true
142+
    model: qwen3.5-flash
143+
```
48144

49-
- v2 configs use placeholders like `<OPENAI_API_KEY>` / `<EXA_API_KEY>`, which are replaced from environment variables at runtime.
50-
- Do not hardcode keys in scripts; keep them in `.env` (and never commit `.env`).
145+
##### Prefix Cache (Prompt Caching)
51146

52-
#### Run (Researcher entry)
147+
Explicitly triggers cache creation and hits to improve speed and reduce cost (only supported by some providers and models):
148+
149+
```yaml
150+
generation_config:
151+
  force_prefix_cache: true # Auto-detects provider support
152+
  prefix_cache_roles: [system, user, assistant, tool] # Roles to explicitly request caching for
153+
```
154+
155+
**Supported Providers:** DashScope, Anthropic, and some others. If you encounter errors, set `force_prefix_cache` to `false`.
156+
157+
#### Configuration File Locations
158+
159+
v2's three YAML config files are located at:
160+
161+
- `projects/deep_research/v2/researcher.yaml` - Researcher main agent config
162+
- `projects/deep_research/v2/searcher.yaml` - Searcher search agent config
163+
- `projects/deep_research/v2/reporter.yaml` - Reporter report generation config
164+
165+
**Placeholder Note:** Placeholders like `<OPENAI_API_KEY>` / `<EXA_API_KEY>` in YAMLs are automatically replaced from `.env` environment variables at runtime. **Do not hardcode API keys in YAMLs** to reduce leak risk.
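
As an illustration of the placeholder mechanism, the sketch below substitutes `<NAME>` tokens from a mapping of environment variables. It is an assumption about how such a substitution could work, not ms-agent's actual implementation; `fill_placeholders` and the regular expression are hypothetical.

```python
import re

# Matches tokens such as <OPENAI_API_KEY> or <EXA_API_KEY>
PLACEHOLDER = re.compile(r"<([A-Z][A-Z0-9_]*)>")


def fill_placeholders(text, env):
    """Replace each <NAME> with env[NAME]; leave unknown names untouched."""

    def substitute(match):
        # Fall back to the original token when the variable is not set
        return env.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(substitute, text)


yaml_text = "openai_api_key: <OPENAI_API_KEY>\napi_key: <EXA_API_KEY>"
print(fill_placeholders(yaml_text, {"OPENAI_API_KEY": "your_api_key"}))
```

In real use the mapping would be `os.environ` after `.env` has been loaded, so the keys never need to appear in the YAML files themselves.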
166+
167+
#### Run
168+
169+
##### Command Line
53170

54171
```bash
55172
PYTHONPATH=. python ms_agent/cli/cli.py run \
56173
--config projects/deep_research/v2/researcher.yaml \
57174
--query "Write your research question here" \
58175
--trust_remote_code true \
59-
--output_dir "output/deep_research/runs"
176+
--output_dir "output/deep_research/runs" \
177+
--load_cache true # Load cache from previous run to resume
60178
```
61179

180+
##### Benchmark Script
181+
182+
We provide `run_benchmark.sh` to run a single demo query or reproduce the full benchmark suite.
183+
**All commands below must be run from the repository root directory.**
184+
185+
**Mode 1 — Single demo query** (no extra setup required):
186+
187+
```bash
188+
bash projects/deep_research/v2/run_benchmark.sh
189+
```
190+
191+
When `DR_BENCH_ROOT` is **not** set, the script runs a single built-in demo query and saves results to `output/deep_research/benchmark_run/`.
192+
193+
**Mode 2 — Full benchmark suite** (requires the benchmark dataset):
194+
195+
```bash
196+
DR_BENCH_ROOT=/path/to/deep_research_bench bash projects/deep_research/v2/run_benchmark.sh
197+
```
198+
199+
When `DR_BENCH_ROOT` is set, the script reads all queries from `$DR_BENCH_ROOT/data/prompt_data/query.jsonl` and runs them in parallel via `dr_bench_runner.py`. You can override additional parameters:
200+
201+
```bash
202+
DR_BENCH_ROOT=/path/to/deep_research_bench \
203+
WORKERS=3 \
204+
LIMIT=5 \
205+
MODEL_NAME=my_experiment \
206+
WORK_ROOT=temp/benchmark_runs \
207+
OUTPUT_JSONL=/path/to/ms_deepresearch_v2_benchmark.jsonl \
208+
bash projects/deep_research/v2/run_benchmark.sh
209+
```
210+
211+
| Parameter | Default | Description |
212+
|-----------|---------|-------------|
213+
| `WORKERS` | `2` | Number of parallel workers |
214+
| `LIMIT` | `0` | Max queries to run (`0` = all) |
215+
| `MODEL_NAME` | `ms_deepresearch_v2_benchmark` | Experiment name for output file |
216+
| `WORK_ROOT` | `temp/benchmark_runs` | Working directory for intermediate results |
217+
| `OUTPUT_JSONL` | `$DR_BENCH_ROOT/data/test_data/raw_data/<MODEL_NAME>.jsonl` | Output JSONL path |
218+
219+
**Note:** The script automatically reads API keys from `.env` in the repository root. Ensure environment variables are properly configured before running.
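
The defaults in the table above, and the switch between the two modes, can be summarized in a short sketch. It mirrors the documented behavior only; `resolve_benchmark_params` and the `mode` key are hypothetical, and the actual logic lives in `run_benchmark.sh`.

```python
def resolve_benchmark_params(env):
    """Apply the documented defaults to the benchmark environment variables."""
    model_name = env.get("MODEL_NAME", "ms_deepresearch_v2_benchmark")
    params = {
        "WORKERS": int(env.get("WORKERS", "2")),
        "LIMIT": int(env.get("LIMIT", "0")),  # 0 means run all queries
        "MODEL_NAME": model_name,
        "WORK_ROOT": env.get("WORK_ROOT", "temp/benchmark_runs"),
    }
    bench_root = env.get("DR_BENCH_ROOT")
    if bench_root:
        # Mode 2: full benchmark suite; output path defaults under the dataset root
        params["mode"] = "full"
        params["OUTPUT_JSONL"] = env.get(
            "OUTPUT_JSONL",
            f"{bench_root}/data/test_data/raw_data/{model_name}.jsonl",
        )
    else:
        # Mode 1: single built-in demo query
        params["mode"] = "demo"
    return params


print(resolve_benchmark_params({"DR_BENCH_ROOT": "/path/to/deep_research_bench", "WORKERS": "3"}))
```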
220+
62221
#### Run in WebUI
63222

64223
You can also use Agentic Insight v2 from the built-in WebUI:
@@ -74,22 +233,30 @@ Then open `http://localhost:7860`, select **Deep Research**, and make sure you h
74233

75234
You can set them via `.env` or in WebUI **Settings**. WebUI run artifacts are stored under `webui/work_dir/<session_id>/`.
76235

77-
### Key configs (what to edit)
78-
79-
- `projects/deep_research/v2/researcher.yaml`
80-
  - Researcher orchestration prompt and workflow-level settings.
81-
- `projects/deep_research/v2/searcher.yaml`
82-
  - Search engines (exa/arxiv/serpapi), fetching/summarization, evidence store settings.
83-
- `projects/deep_research/v2/reporter.yaml`
84-
  - Report generation workflow and report artifacts directory.
85-
86-
### Outputs (where to find results)
236+
### Outputs (Where to Find Results)
87237

88238
Given `--output_dir output/deep_research/runs`:
89239

90240
- **Final report (user-facing)**: `output/deep_research/runs/final_report.md`
91-
- **Todo list**: `output/deep_research/runs/plan.json(.md)`
241+
- **Plan list**: `output/deep_research/runs/plan.json(.md)`
92242
- **Evidence store**: `output/deep_research/runs/evidence/`
93-
  - `index.json` and `notes/` are used by Reporter to cite sources.
243+
  - `index.json` and `notes/` are used by Reporter to generate the report.
94244
- **Reporter artifacts**: `output/deep_research/runs/reports/`
95245
  - Outline, chapters, draft, and the assembled report artifact.
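
After a run, a quick way to see which of the artifacts listed above exist is to build the expected paths and check them. This is a small sketch; `expected_artifacts` is a hypothetical helper, and reading `plan.json(.md)` as two files, `plan.json` and `plan.md`, is an assumption.

```python
from pathlib import Path


def expected_artifacts(output_dir):
    """Map artifact names to the paths documented for a given output_dir."""
    root = Path(output_dir)
    return {
        "final_report": root / "final_report.md",
        "plan_json": root / "plan.json",  # assumed expansion of plan.json(.md)
        "plan_md": root / "plan.md",
        "evidence_index": root / "evidence" / "index.json",
        "evidence_notes": root / "evidence" / "notes",
        "reports": root / "reports",
    }


for name, path in expected_artifacts("output/deep_research/runs").items():
    status = "found" if path.exists() else "missing"
    print(f"{name}: {path} ({status})")
```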
246+
247+
### ❓ Troubleshooting
248+
249+
| Error Type | Possible Cause | Solution |
250+
|-----------|---------------|----------|
251+
| `Model not found` / `Invalid model` | Model name in YAML doesn't match API endpoint | Check and modify `llm.model`, `summarizer_model`, and `quality_check.model` in the three YAMLs to match your provider |
252+
| `Invalid API key` / `Unauthorized` | API key in `.env` is incorrect or expired | Verify `OPENAI_API_KEY` in `.env` is correct, or regenerate API key |
253+
| `Search engine error` / `EXA_API_KEY not found` | Search engine API key not configured | Add `EXA_API_KEY` or `SERPAPI_API_KEY` to `.env`, or modify `searcher.yaml` to use only `arxiv` |
254+
| 400 error / `Invalid request body` | Some generation parameters are not supported by the endpoint | Remove unsupported fields from `generation_config` in the YAML |
255+
| `Timeout` errors | Network issues or a request that takes too long | Check the network connection, or increase the `tool_call_timeout` value in the YAML |
256+
| Output too short or incomplete | Generation parameters cap the output length | Add or increase the `max_tokens` value in `generation_config` in the YAML |
257+
| Stuck mid-execution | A sub-agent is in an infinite loop or waiting | Check the log files in `output_dir` to see which agent is stuck; you may need to adjust `max_chat_round` |
258+
| `.env` file not found | `.env` in wrong location | Ensure `.env` is in **repository root**, not in `projects/deep_research/` or `v2/` directories |
259+
260+
#### Getting Help
261+
262+
- Report issues: [GitHub Issues](https://github.com/modelscope/ms-agent/issues)
