Support multiple sentence candidates by ksqsf · Pull Request #1164 · rime/librime

ksqsf · 2026-04-26T11:38:18Z

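Changes

Added MakeSentences to poet. It runs a relatively wide beam search during the dp over the word graph and finally outputs the requested number of sentences. Because this algorithm may perform slightly worse than the original, the original MakeSentence is kept.
Sentence output is controlled by max_sentences and cutoff_threshold: output stops once the relative gap between a weight and the previous weight exceeds cutoff_threshold. To keep candidates from drifting too far from the weight of the top candidate, each iteration shrinks cutoff_threshold to 1-1/max_sentences of its previous value, so after max_sentences iterations cutoff_threshold has shrunk to roughly 1/e of its original value.
Added the max_sentences parameter (default 1) and the sentence_cutoff_threshold parameter (default 0.1) to translator_commons for fine-tuning the algorithm.
script_translator calls MakeSentences and outputs the results when max_sentences is greater than 1.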
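The beam-search idea described above can be sketched roughly as follows. This is a minimal Python illustration, not the code in poet.cc: the graph layout (`word_graph[start][end]` holding a list of `(word, weight)` pairs), the additive log-style weights, the default `beam_width`, and the function name `make_sentences` are all assumptions made for the example.

```python
from collections import defaultdict
import heapq


def make_sentences(word_graph, total_length, max_sentences, beam_width=None):
    """Return up to max_sentences (text, weight) pairs, best first.

    word_graph[start][end] is a list of (word, weight) entries spanning
    input positions start..end (end > start). Higher weight is better.
    """
    if beam_width is None:
        # Hypothetical choice: keep the beam wider than the requested output.
        beam_width = max(max_sentences, 1) * 2
    # beams[pos] holds the best partial sentences that end at position pos.
    beams = defaultdict(list)
    beams[0] = [("", 0.0)]
    for start in range(total_length):
        if not beams[start]:
            continue  # position not reachable from the beginning
        for end, entries in word_graph.get(start, {}).items():
            for prefix, prefix_weight in beams[start]:
                for word, word_weight in entries:
                    beams[end].append((prefix + word, prefix_weight + word_weight))
            # Deduplicate by text (keep the best weight), then prune to the beam.
            best = {}
            for text, weight in beams[end]:
                if text not in best or weight > best[text]:
                    best[text] = weight
            beams[end] = heapq.nlargest(
                beam_width, best.items(), key=lambda item: item[1]
            )
    return beams[total_length][:max_sentences]


if __name__ == "__main__":
    # Toy graph for two syllables; the weights are made up.
    graph = {
        0: {1: [("開", -2.0), ("凱", -4.0)], 2: [("開始", -3.0)]},
        1: {2: [("始", -2.5), ("是", -2.0)]},
    }
    for text, weight in make_sentences(graph, total_length=2, max_sentences=3):
        print(text, weight)
```

With `max_sentences=1` the sketch degenerates to keeping a small beam and returning only the best path, which is why a dedicated single-sentence routine can be cheaper.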

table_translator is not supported yet; support can be added in the future.

Note: for best results, max_homophones also needs to be set to a value greater than 1. See #1165.

Results

Add the following to 朙月拼音:

translator:
  max_sentences: 10

Also enable the 八股文 model.

This produces:

[kai shi ni ding tan pan cheng xu he tiao kuan]|
page: 1  (of size 10)
1. [開始擬定談判程序和條款]
2.  開始你定談判程序和條款 
3.  開是擬定談判程序和條款 
4.  開始你定碳盤程序和條款 
5.  開始你定碳盤成許和條款 
6.  開是你定談判程序和條款 
7.  開始擬定談盤程序和條款 
8.  開始擬定談判成許和條款 
9.  開是你定碳盤程序和條款 
10.  開始你定談盤程序和條款

When max_sentences is set to a value less than or equal to 1, or is not set at all, the original logic is still used to produce a single sentence.

[kai shi ni ding tan pan cheng xu he tiao kuan]|
page: 1  (of size 10)
1. [開始擬定談判程序和條款]
2.  開始 
3.  開市 
4.  開示 
5.  開釋 
6.  揩拭 
7.  開駛 
8.  愒時 
9.  楷式 
10.  開士

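With the default options, the cutoff mechanism may cause fewer than max_sentences sentences to be output (max_sentences is set to 10 here, but the two scenarios below output only 1 and 2 sentences respectively).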

[ta hen bu li jie]|
page: 1  (of size 10)
1. [他很不理解]
2.  他 
3.  她 
4.  它 
5.  塔 
6.  牠 
7.  踏 
8.  塌 
9.  嗒 
10.  榻 

[wo ye bu dong]|
page: 1  (of size 10)
1. [我也不懂]
2.  我也不動 
3.  我也不 
4.  我也 
5.  沃野 
6.  我 
7.  喔 
8.  窩 
9.  握 
10.  臥
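The cutoff behaviour seen in these examples can be approximated with the sketch below. It is an illustration under stated assumptions rather than the merged implementation: the relative gap is taken to be `|w_i - w_(i-1)| / |w_(i-1)|`, the weights are assumed to be sorted best first, and the names `count_kept_sentences` and `threshold_after` are hypothetical.

```python
import math


def threshold_after(iterations, max_sentences, cutoff_threshold=0.1):
    """Threshold left after shrinking by (1 - 1/max_sentences) per iteration."""
    return cutoff_threshold * (1.0 - 1.0 / max_sentences) ** iterations


def count_kept_sentences(weights, max_sentences, cutoff_threshold=0.1):
    """Return how many of the sorted candidate weights survive the cutoff."""
    if not weights:
        return 0
    kept = 1  # the top candidate is always kept
    threshold = cutoff_threshold
    decay = 1.0 - 1.0 / max_sentences
    for previous, current in zip(weights, weights[1:]):
        if kept >= max_sentences:
            break
        # Assumed definition of the relative gap to the previous weight.
        gap = abs(current - previous) / max(abs(previous), 1e-12)
        if gap > threshold:
            break
        kept += 1
        threshold *= decay  # tighten the threshold for the next candidate
    return kept


if __name__ == "__main__":
    # Made-up weights: the third gap is too large, so 3 candidates survive.
    print(count_kept_sentences([-10.0, -10.5, -10.9, -13.0], max_sentences=10))
    # After max_sentences iterations the threshold is close to 1/e of the start.
    print(threshold_after(10, 10) / 0.1, 1 / math.e)
```

For `max_sentences = 10` the ratio after ten iterations is `0.9 ** 10`, about 0.349, compared with `1/e`, about 0.368; the two converge as `max_sentences` grows.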

Copilot

Pull request overview

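This PR extends the existing sentence-making capability of the script translator so that it can output multiple sentence candidates. It generates multiple Sentence objects through a relatively wide beam search over the word graph, and the max_sentences parameter controls how many are output.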

Changes:

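Add MakeSentences() to Poet: generates multiple sentence candidates on the word graph using beam search + dp.
Add reading of the max_sentences config option to TranslatorOptions; script_translator chooses single-sentence or multi-sentence output based on the config.
Add a common deque dependency to support the container for multiple sentence candidates.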

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

File	Description
src/rime/gear/translator_commons.h	Adds the `TranslatorOptions::max_sentences_` option field
src/rime/gear/translator_commons.cc	Reads `max_sentences` from config and clamps it to a minimum value
src/rime/gear/script_translator.cc	`ScriptTranslation` supports multi-sentence candidate output (uses `deque` to manage sentence candidates)
src/rime/gear/poet.h	Declares the `Poet::MakeSentences()` API
src/rime/gear/poet.cc	Implements the beam-search sentence-making logic for multi-sentence output, with deduplication
src/rime/common.h	Includes `<deque>` and imports `std::deque` into the `rime` namespace

lotem

Very nice, very powerful.

lotem · 2026-04-27T01:38:29Z

[他很不理解]

So impressive, yet no 李姐 (a homophone of 理解, "li jie")?

ksqsf · 2026-04-27T09:00:24Z

At least 朙月拼音 and 八股文 don't have it 😄

gaboolic · 2026-04-27T14:22:00Z

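Before this, I had also built a top-3 multi-sentence feature on my own branch. According to my evaluation the results differ somewhat, so I will look for where the differences come from.

In my tests, my version has a slightly higher top-3 accuracy with 朙月拼音, and also with rime_frost, but rime_frost with the 万象 model does not reach the top-3 accuracy of this version.

  [朙月拼音 with my modified librime]
Sentence accuracy: 63.82%  (157/246 sentences matched exactly)
Character accuracy: 89.68%  (weighted by gold text over the whole corpus, based on Levenshtein)
Character accuracy (per-sentence average): 89.78%

  [朙月拼音 with this version of librime]
Sentence accuracy: 55.28%  (136/246 sentences matched exactly)
Character accuracy: 89.83%  (weighted by gold text over the whole corpus, based on Levenshtein)
Character accuracy (per-sentence average): 89.87%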
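For readers who want to reproduce this kind of comparison, here is one plausible way to compute the three figures. The evaluation script used in the comment is not shown in this thread, so everything below is an assumption: a sentence counts as correct when the gold text appears among the top `top_k` candidates, the character metrics use the closest candidate within the top `top_k`, and the names `levenshtein` and `evaluate` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                         # deletion
                current[j - 1] + 1,                      # insertion
                previous[j - 1] + (char_a != char_b),    # substitution or match
            ))
        previous = current
    return previous[-1]


def evaluate(samples, top_k=3):
    """samples: list of (gold_text, candidate_list). Returns three accuracies."""
    exact = 0
    total_distance = 0
    total_gold_chars = 0
    per_sentence = []
    for gold, candidates in samples:
        if not gold:
            continue  # skip empty references to avoid dividing by zero
        top = candidates[:top_k]
        if gold in top:
            exact += 1
        # Distance to the closest candidate, clamped to the gold length.
        distance = min((levenshtein(gold, c) for c in top), default=len(gold))
        distance = min(distance, len(gold))
        total_distance += distance
        total_gold_chars += len(gold)
        per_sentence.append(1.0 - distance / len(gold))
    count = max(len(per_sentence), 1)
    return {
        "sentence_accuracy": exact / count,
        # Corpus-level figure: longer gold sentences carry more weight.
        "char_accuracy_weighted": 1.0 - total_distance / max(total_gold_chars, 1),
        # Every sentence contributes equally, regardless of length.
        "char_accuracy_per_sentence": sum(per_sentence) / count,
    }
```

The two character-level figures differ only in weighting, which would explain why they sit close together in the reported numbers while the exact-match rate moves much more.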

ksqsf requested review from Copilot and lotem April 26, 2026 11:38

ksqsf self-assigned this Apr 26, 2026

Copilot started reviewing on behalf of ksqsf April 26, 2026 11:38

Copilot AI reviewed Apr 26, 2026

Comment thread src/rime/gear/script_translator.cc Outdated

Comment thread src/rime/gear/translator_commons.cc Outdated

Comment thread src/rime/gear/poet.cc

Comment thread src/rime/gear/poet.cc

Comment thread src/rime/gear/poet.cc

ksqsf requested a review from a team April 26, 2026 11:47

feat(poet): make multiple sentences

77a8feb

ksqsf force-pushed the multiple-sentences branch from 53caf59 to c2dab99 Compare April 26, 2026 12:29

lotem approved these changes Apr 27, 2026

ksqsf force-pushed the multiple-sentences branch 2 times, most recently from 0a41d91 to 2f7685c Compare April 27, 2026 08:54

feat(script_translator): allow multiple sentence candidates

0fb3b96

ksqsf force-pushed the multiple-sentences branch from 2f7685c to 0fb3b96 Compare April 27, 2026 09:01

ksqsf merged commit 9422ca7 into master Apr 27, 2026
2 checks passed

ksqsf deleted the multiple-sentences branch April 27, 2026 09:01

gaboolic mentioned this pull request Apr 27, 2026

feat: change the default value of max_homophones to improve top-3 accuracy for multi-sentence output #1165

Closed

ksqsf merged 2 commits into master from multiple-sentences
