Commit b4616ad

committed
refactor
1 parent 1e7d67c commit b4616ad

18 files changed

Lines changed: 1722 additions & 227 deletions

README.md

Lines changed: 137 additions & 175 deletions
@@ -1,239 +1,201 @@
1+
# WeNet Hotword
12

2-
3-
# WeNet Hotword
4-
5-
**Hotword-biased decoding for the [WeNet](https://github.com/wenet-e2e/wenet) C++ runtime.**
6-
7-
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
3+
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
84
[![C++17](https://img.shields.io/badge/C%2B%2B-17-00599C.svg)](https://en.cppreference.com/w/cpp/17)
9-
[![LibTorch 2.2.0](https://img.shields.io/badge/LibTorch-2.2.0-EE4C2C.svg)](https://pytorch.org/cppdocs/)
10-
11-
[**Eval Writeup**](runtime/libtorch/eval_runs/HOTWORD_EVAL.md)
125

6+
**Hotword-biased decoding for the [WeNet](https://github.com/wenet-e2e/wenet) C++ runtime.**
137

8+
> **Based On**:
9+
> **Model**: `wenet/u2pp_conformer-asr-cn-16k-online`
10+
> **Tune**: `AISHELL-1 hotword test` (235 utts, 187 hotwords)
1411
15-
**tune set** (235 utts): recall ↑ 5.6×    CER ↓ 55%
16-
<br>
17-
**test set** (115 utts): recall ↑ 3.5× &nbsp;&nbsp; CER ↓ 47%
18-
19-
| | baseline (tune) | baseline (test) | ours (tune) | ours (test) |
20-
|--|:--:|:--:|:--:|:--:|
21-
| hotword recall | 15.96% | 25.93% | **90.07%** | **91.11%** |
22-
| CER | 14.20% | 13.76% | **6.32%** | **7.33%** |
23-
24-
<sub>Model: `wenet/u2pp_conformer-asr-cn-16k-online`</sub>
25-
<br>
26-
<sub>Tune: `AISHELL-1 hotword test` &nbsp;&nbsp;</sub>
27-
<br>
28-
<sub>Test: `aishell1_indep_hotword`</sub>
12+
| | Baseline | Ours (Ultra) |
13+
|--|--|--|
14+
| **CER** | 5.14% | **4.82%** |
15+
| **Recall** | 81.08% | **95.95%** |
16+
| **Precision** | **95.24%** | 93.01% |
17+
| **F1** | 87.59% | **94.46%** |
2918

19+
**Test**: `AISHELL-2 iOS eval` (1000 utts, 301 hotwords)
3020

31-
## 🌟 Features
21+
| | Baseline | Ours (Ultra) |
22+
|--|--|--|
23+
| **CER** | 5.14% | **4.83%** |
24+
| **Recall** | 42.03% | **88.41%** |
25+
| **Precision** | **100.00%** | 92.42% |
26+
| **F1** | 59.18% | **90.37%** |
3227

33-
- **Phoneme Corrector** — fuzzy hotword matching via G2P phoneme edit-distance on the n-best.
34-
- **Confidence-Weighted Match Bonus** — per-hotword reward scaled by acoustic confidence.
35-
- **LRU Hotword Cache** — recurring hotwords get a lowered fuzzy threshold in streaming.
36-
- **Multi-Objective Autotuner** — Optuna TPE over decoder + hotword knobs, optimizing recall and CER jointly with early-exit stagnation detection.
28+
**Test**: `AISHELL-2 iOS eval` (1000 utts, 27 hard hotwords)
3729

38-
---
30+
## Highlights
3931

40-
## 🚀 Quick Start
32+
* **Phoneme Corrector** — fuzzy hotword matching via G2P phoneme edit-distance on the n-best
33+
* **Confidence-Weighted Match Bonus** — per-hotword reward scaled by acoustic confidence
34+
* **Multi-Objective Autotuner** — 2D/3D Pareto over decoder + hotword knobs, with early-exit stagnation detection
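
The phoneme corrector idea can be illustrated with a small sketch. This is not the code in `corrector.{cc,h}`; it assumes hotwords and hypotheses have already been converted to pinyin syllables by some G2P step, and that a match is accepted when a normalized similarity reaches a threshold. The function names, the example syllables, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (here: pinyin syllables)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def phoneme_similarity(hyp: Sequence[str], hotword: Sequence[str]) -> float:
    """1.0 for identical sequences, 0.0 when every syllable differs."""
    longest = max(len(hyp), len(hotword))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(hyp, hotword) / longest


def is_fuzzy_match(hyp: Sequence[str], hotword: Sequence[str],
                   threshold: float = 0.5) -> bool:
    """Hypothetical acceptance rule: similarity must reach the threshold."""
    return phoneme_similarity(hyp, hotword) >= threshold


# Hand-written syllables for illustration; a real pipeline would get them from G2P.
hotword = ["te", "si", "la"]
near_miss = ["te", "shi", "la"]   # one syllable differs
unrelated = ["bei", "jing"]

print(is_fuzzy_match(near_miss, hotword))  # similarity is 2/3
print(is_fuzzy_match(unrelated, hotword))  # similarity is 0
```

Comparing syllables rather than characters is what lets a hypothesis with the wrong characters but nearly the right sound still be matched to a hotword.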
4135

42-
### 1. Install Python deps
36+
## Install
4337

4438
```bash
45-
cd /path/to/wenet-main
46-
47-
# Create and activate virtual environment
39+
# Python environment
4840
uv venv .venv --python 3.12
4941
source .venv/bin/activate
42+
uv pip install torch torchaudio pyyaml dacite optuna soundfile pypinyin jieba modelscope
5043

51-
# Install PyTorch (adjust CUDA version as needed)
52-
uv pip install torch torchaudio \
53-
--index-url https://download.pytorch.org/whl/cu121 \
54-
--extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple
55-
56-
# Install remaining dependencies
57-
uv pip install pyyaml dacite optuna soundfile pypinyin \
58-
-i https://pypi.tuna.tsinghua.edu.cn/simple
44+
# C++ runtime (requires cmake >= 3.14)
45+
cd runtime/libtorch
46+
cmake -B build -DGRAPH_TOOLS=ON -DTORCH=ON
47+
cmake --build build -j --target decoder_main
48+
cd ../..
5949
```
6050

61-
### 2. Download model + test set
51+
## Quick Start
52+
53+
### 1. Download Model
6254

6355
```bash
6456
modelscope download --model wenet/u2pp_conformer-asr-cn-16k-online \
6557
--local_dir ~/userspace/wenet/models/u2pp_conformer-asr-cn-16k-online
66-
bash tools/prepare_aishell_hotwords.sh ~/userspace/wenet/aishell_test
6758
```
6859

69-
> **Other models (optional)**
70-
>
71-
> Verified models and download commands:
72-
>
73-
> | Model | ModelScope ID |
74-
> |------|--------------|
75-
> | `u2pp_conformer-asr-cn-16k-online` (default) | `wenet/u2pp_conformer-asr-cn-16k-online` |
76-
> | `multi_cn` | `wenet/multi_cn` |
77-
>
78-
> After switching models, re-run Step 5 (confusion matrix) and Step 6 (autotune).
60+
### 2. Download Datasets
7961

80-
### 3. Build decoder_main
62+
Prepare the benchmark data (downloads AISHELL-1 and AISHELL-2, builds the hotword lists):
8163

8264
```bash
83-
cd runtime/libtorch
84-
cmake -B build -DGRAPH_TOOLS=ON -DTORCH=ON
85-
cmake --build build -j --target decoder_main
86-
cd ../..
65+
bash tools/prepare_benchmark.sh ~/userspace/wenet
8766
```
8867

89-
### 4. Smoke test
68+
### 3. Learn Confusion Matrix (per-model, one-time)
9069

9170
```bash
92-
head -1 ~/userspace/wenet/aishell_test/wav.scp > /tmp/one.scp
93-
runtime/libtorch/build/bin/decoder_main \
94-
--model_path ~/userspace/wenet/models/u2pp_conformer-asr-cn-16k-online/final.zip \
95-
--unit_path ~/userspace/wenet/models/u2pp_conformer-asr-cn-16k-online/units.txt \
96-
--wav_scp /tmp/one.scp \
97-
--hotword_path ~/userspace/wenet/aishell_test/hotwords.txt \
98-
--pinyin_dict_path runtime/libtorch/build/bin/dict \
99-
--result /dev/stdout
71+
python3 tools/learn_confusion.py \
72+
--model_dir ~/userspace/wenet/models/u2pp_conformer-asr-cn-16k-online \
73+
--wav_scp ~/userspace/wenet/aishell_test/wav.scp \
74+
--text ~/userspace/wenet/aishell_test/text \
75+
--out_csv runtime/libtorch/configs/confusion.csv \
76+
--device cpu
10077
```
10178

102-
### 5. Prepare confusion matrix
79+
### 4. Autotune — Four Modes
10380

104-
The confusion matrix is learned from **this model's** CTC posteriors and is not portable across models.
81+
| Mode | Config | Hotwords | Objective | When to use |
82+
|------|--------|----------|-----------|-------------|
83+
| **Aggressive** | `mode_aggressive.yaml` | 187 original | recall↑ + CER↓ | Hotword-dense domains |
84+
| **Balanced** | `mode_balanced.yaml` | 187 original | F1↑ + CER↓ | General voice assistant, balanced R/P |
85+
| **Conservative** | `mode_conservative.yaml` | 349 (+distractors) | F1↑ + CER↓ | Open-domain dialogue, precision matters |
86+
| **Ultra** | `mode_ultra.yaml` | 349 (+distractors) | F1↑ + CER↓ + Precision↑ | Financial/legal — false positive cost is high |
10587

106-
For the example model, run on a development set (e.g. WeNetSpeech dev):
107-
```bash
108-
python3 tools/learn_confusion.py \
109-
--model_dir ~/userspace/wenet/models/u2pp_conformer-asr-cn-16k-online \
110-
--wav_scp ~/userspace/wenet/wenetspeech_calibration/dev/wav.scp \
111-
--text ~/userspace/wenet/wenetspeech_calibration/dev/text \
112-
--out_csv runtime/libtorch/configs/confusion.csv \
113-
--device cpu
114-
```
88+
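> **No free lunch**: Aggressive is tuned for recall and pays for it in precision (63.78% on the 301-hotword test). Ultra raises precision by about 29 points on that test (93.01%), but on the 27-hard subset its recall falls to 88.41% versus 98.55% for Aggressive. Choose based on your domain's tolerance for false positives.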
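
To make the trade-off concrete, here is a minimal sketch of Pareto filtering, the selection idea behind a multi-objective tuner. It is not taken from `tools/autotune.py`; the helper names are hypothetical, and the points are the per-mode (recall, precision) values from the 27-hard results table in this README, used only as sample data with both objectives maximized.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names of all points that no other point dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


# (recall %, precision %) on the 27-hard hotword test, both maximized.
hard_case = {
    "Aggressive": (98.55, 64.76),
    "Balanced": (98.55, 73.91),
    "Conservative": (94.20, 86.67),
    "Ultra": (88.41, 92.42),
}

print(pareto_front(hard_case))  # ['Balanced', 'Conservative', 'Ultra']
```

Aggressive drops out because Balanced matches its recall with higher precision. The remaining three are mutually non-dominated, so the choice among them depends on how the application weighs recall against precision.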
11589
116-
### 6. Autotune
90+
Run one (or all) modes:
11791

11892
```bash
93+
# Aggressive
11994
python3 tools/autotune.py \
120-
--config runtime/libtorch/configs/default.yaml \
95+
--config runtime/libtorch/configs/mode_aggressive.yaml \
12196
--search-space runtime/libtorch/configs/search_space.yaml
122-
```
12397

124-
Autotune writes the best configuration to `runtime/libtorch/configs/default.tuned.yaml`.
98+
# Balanced
99+
python3 tools/autotune.py \
100+
--config runtime/libtorch/configs/mode_balanced.yaml \
101+
--search-space runtime/libtorch/configs/search_space.yaml
125102

126-
### 7. Evaluate on held-out
103+
# Conservative
104+
python3 tools/autotune.py \
105+
--config runtime/libtorch/configs/mode_conservative.yaml \
106+
--search-space runtime/libtorch/configs/search_space.yaml
127107

128-
Evaluate the tuned configuration on the **held-out test**
129-
```bash
130-
TUNED_YAML=runtime/libtorch/configs/default.tuned.yaml \
131-
TESTSET=~/userspace/wenet/aishell1_indep_hotword \
132-
bash runtime/libtorch/eval_runs/run_ablations.sh
133-
column -ts $'\t' runtime/libtorch/eval_runs/summary.tsv
108+
# Ultra (3-objective Pareto)
109+
python3 tools/autotune.py \
110+
--config runtime/libtorch/configs/mode_ultra.yaml \
111+
--search-space runtime/libtorch/configs/search_space.yaml
134112
```
135113

136-
`run_ablations.sh` automatically loads the tuned config for the **F_autotune** condition.
137-
138-
## ⚙️ Configuration
139-
140-
Edit `runtime/libtorch/configs/default.yaml`
141-
142-
```yaml
143-
paths:
144-
model_dir: ~/userspace/wenet/models/u2pp_conformer-asr-cn-16k-online
145-
testset_dir: ~/userspace/wenet/aishell_test
146-
eval_testset_dir: ~/userspace/wenet/aishell1_indep_hotword
147-
pinyin_dict_dir: runtime/libtorch/build/bin/dict
148-
149-
decode:
150-
chunk_size: -1
151-
ctc_weight: 0.5
152-
rescoring_weight: 1.0
153-
reverse_weight: 0.0
154-
nbest: 10
155-
156-
hotword:
157-
hotword_path: hotwords.txt
158-
fuzzy_threshold: 0.5
159-
max_append_path: 20
160-
use_confidence_reward: true
161-
enable_hotword_cache: true
162-
confusion_matrix_path: runtime/libtorch/configs/confusion.csv
163-
bonus_weight: 2.0
164-
confidence_floor: 0.4
165-
neighbor_threshold: 0.5
166-
fuzzy_reject_ratio: 0.8
167-
confidence_weight_min: 0.2
168-
bonus_length_scale: 0.5
169-
170-
autotune:
171-
n_trials: 100
172-
sampler: tpe
173-
cer_baseline: 14.20
114+
### 5. Copy Hotword Lists
115+
116+
Hotword lists are shipped in `runtime/libtorch/configs/`. Copy them to your test set directory before evaluation:
117+
118+
```bash
119+
cp runtime/libtorch/configs/hotwords_all.txt \
120+
~/userspace/wenet/aishell2_eval/test1000/
121+
cp runtime/libtorch/configs/hotwords_hard.txt \
122+
~/userspace/wenet/aishell2_eval/test1000/
174123
```
175124

176-
Search space: `runtime/libtorch/configs/search_space.yaml`.
125+
### 6. Evaluate on Held-Out
177126

178-
---
127+
```bash
128+
# Evaluate on 301-hotword list (mixed easy + hard)
129+
python3 tools/evaluate_modes.py \
130+
--test-dir ~/userspace/wenet/aishell2_eval/test1000 \
131+
--hotwords hotwords_all.txt
132+
133+
# Evaluate on 27-hard hotword subset (baseline recall < 90%)
134+
python3 tools/evaluate_modes.py \
135+
--test-dir ~/userspace/wenet/aishell2_eval/test1000 \
136+
--hotwords hotwords_hard.txt
137+
```
179138

180-
## 📊 Results
139+
## Results
181140

182-
`u2pp_conformer-asr-cn-16k-online` on AISHELL hotword test (235 utts, 187 hotwords).
141+
**Model**: `wenet/u2pp_conformer-asr-cn-16k-online`
142+
**Tune**: AISHELL-1 hotword test
143+
**Test**: AISHELL-2 iOS eval subset
183144

184-
| Condition | What it is | CER% | recall% | precision% | F1% |
185-
|-----------|-----------|------:|--------:|-----------:|----:|
186-
| A_baseline | Plain CTC + attention rescoring, no hotword | 14.20 | 15.96 | 97.83 | 27.44 |
187-
| B_phoneme | + phoneme corrector (G2P + fuzzy match) | 12.62 | 32.62 | 98.92 | 49.07 |
188-
| D_confidence | + confidence-weighted match bonus | 12.04 | 36.17 | 99.03 | 52.99 |
189-
| E_cache | + LRU hotword cache | 12.04 | 36.17 | 99.03 | 52.99 |
190-
| F_autotune | E_cache + TPE-autotuned knobs (12 params) | 6.32 | 90.07 | 96.21 | 93.04 |
191-
| G_wenet_native | Upstream WeNet character-FST biasing only | 10.97 | 46.45 | 99.24 | 63.29 |
145+
### 301-Hotword Test (mixed easy + hard)
192146

147+
| Mode | CER% | Recall% | Precision% | F1% |
148+
|------|------:|--------:|-----------:|----:|
149+
| Baseline (no hotword) | 5.14 | 81.08 | 95.24 | 87.59 |
150+
| **Aggressive** | 6.00 | 92.79 | 63.78 | 75.60 |
151+
| **Balanced** | 5.27 | 93.69 | 76.47 | 84.21 |
152+
| **Conservative** | 4.98 | 93.24 | 83.81 | 88.27 |
153+
| **Ultra** | **4.82** | **95.95** | **93.01** | **94.46** |
193154

194-
**Held-out** (`aishell1_indep_hotword`, 115 utts — never seen during tuning):
155+
### 27-Hard Hotword Test (baseline recall < 90%)
195156

196-
| Condition | CER% | recall% | precision% | F1% |
197-
|-----------|------:|--------:|-----------:|----:|
198-
| D_confidence | 11.88 | 48.15 | 98.48 | 64.68 |
199-
| F_autotune | 7.33 | 91.11 | 98.40 | 94.62 |
200-
| G_wenet_native | 10.49 | 59.26 | 98.77 | 74.07 |
157+
| Mode | CER% | Recall% | Precision% | F1% |
158+
|------|------:|--------:|-----------:|----:|
159+
| Baseline | 5.14 | 42.03 | 100.00 | 59.18 |
160+
| **Aggressive** | 5.10 | 98.55 | 64.76 | 78.16 |
161+
| **Balanced** | 4.92 | 98.55 | 73.91 | 84.47 |
162+
| **Conservative** | **4.68** | 94.20 | 86.67 | 90.28 |
163+
| **Ultra** | 4.83 | 88.41 | **92.42** | **90.37** |
201164

202-
Full write-up: [`HOTWORD_EVAL.md`](runtime/libtorch/eval_runs/HOTWORD_EVAL.md)
165+
### Key Findings
203166

204-
---
167+
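1. **Most hotword-enhanced configurations match or improve CER** relative to the no-hotword baseline (5.14%). Conservative and Ultra improve on both tests, and all four modes improve on the 27-hard test; Aggressive (6.00%) and Balanced (5.27%) regress on the 301-hotword test.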
168+
2. **On 27 hard-case hotwords** (foreign names the baseline misses), Ultra reaches **88.41% recall** versus the baseline's **42.03%**, and the other modes reach 94.20–98.55%. The phoneme corrector closes the gap where character-level matching fails.
169+
3. **Ultra mode is the overall best**: highest F1 (94.46% on 301-hot, 90.37% on hard-case) via 3-objective Pareto optimization — no hard-coded precision floor needed.
170+
4. **Conservative mode is the practical sweet spot**: lowest CER on hard-case (4.68%) with strong F1 (90.28%), making it suitable for precision-sensitive domains.
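
As a sanity check on the tables, F1 is the harmonic mean of precision and recall, so each F1 column can be recomputed from the other two. The sketch below does this for four rows copied from the results above. The 0.01 tolerance is an assumption to absorb rounding, since the reported figures were presumably computed from raw counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the results tables.
rows = {
    "Baseline, 301-hotword": (95.24, 81.08, 87.59),
    "Ultra, 301-hotword": (93.01, 95.95, 94.46),
    "Baseline, 27-hard": (100.00, 42.03, 59.18),
    "Ultra, 27-hard": (92.42, 88.41, 90.37),
}

for name, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    status = "ok" if abs(computed - reported) < 0.01 else "mismatch"
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f} ({status})")
```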
205171

206-
## 📂 Project Structure
172+
## Project Structure
207173

208174
```text
209-
wenet-main/
210-
├── runtime/core/decoder/
211-
│ ├── corrector.{cc,h} # PhonemeCorrector + fuzzy match + confusion matrix
212-
│ ├── hotword_cache.{cc,h} # LRU hotword cache
213-
│ ├── asr_decoder.{cc,h} # CalculateMatchBonus + n-best correction wiring
214-
│ ├── params.h # gflags (bonus_weight, confidence_floor, etc.)
215-
│ └── context_graph.{cc,h} # upstream WeNet character-FST context graph
216-
├── runtime/core/bin/
217-
│ └── decoder_main.cc # decoder binary (+ daemon mode for autotune)
218-
├── runtime/libtorch/configs/
219-
│ ├── default.yaml # base config (includes 12-knob autotune)
220-
│ └── search_space.yaml # Optuna search space
221-
├── runtime/libtorch/eval_runs/
222-
│ ├── run_ablations.sh # A→G ablation runner
223-
│ └── HOTWORD_EVAL.md # full evaluation report
224-
└── tools/ # autotune, metrics, data prep scripts
175+
runtime/core/decoder/
176+
corrector.{cc,h} # PhonemeCorrector + fuzzy match + confusion matrix
177+
hotword_cache.{cc,h} # LRU hotword cache
178+
asr_decoder.{cc,h} # CalculateMatchBonus + n-best correction wiring
179+
params.h # gflags (bonus_weight, confidence_floor, etc.)
180+
runtime/core/bin/
181+
decoder_main.cc # decoder binary (+ daemon mode for autotune)
182+
runtime/libtorch/configs/
183+
mode_{aggressive,balanced,conservative,ultra}.yaml # four mode configs
184+
default.yaml # base config
185+
search_space.yaml # Optuna search space
186+
tools/
187+
autotune.py # multi-objective Pareto tuner
188+
compute-hotword-metrics.py
189+
prepare_hotwords.py # extract 500-hot / filter hard-case
190+
evaluate_modes.py # batch evaluate all 4 tuned configs
225191
```
226192

227-
---
228-
229-
## 🙏 Acknowledgements
230-
231-
- **[WeNet](https://github.com/wenet-e2e/wenet)** — base ASR runtime.
232-
- **[cpp-pinyin](https://github.com/wolfgitpr/cpp-pinyin)** — runtime G2P.
233-
- **[CapsWriter-Offline](https://github.com/HaujetZhao/CapsWriter-Offline)** — inspired the corrector design.
193+
## Acknowledgements
234194

235-
---
195+
* [WeNet](https://github.com/wenet-e2e/wenet) — base ASR runtime
196+
* [cpp-pinyin](https://github.com/wolfgitpr/cpp-pinyin) — runtime G2P
197+
* [CapsWriter-Offline](https://github.com/HaujetZhao/CapsWriter-Offline) — inspired the corrector design
236198

237-
## 📜 License
199+
## License
238200

239-
Apache License 2.0, inherited from upstream WeNet.
201+
Apache License 2.0
