Pull request overview
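This PR extends the existing script translator's sentence-making capability so that it can output multiple sentence candidates: a wider beam search over the word graph produces multiple Sentence objects, and the max_sentences parameter controls how many are output.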
Changes:
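- Adds `MakeSentences()` to `Poet`: generates multiple sentence candidates on the word graph using beam search plus DP.
- Adds reading of the `max_sentences` setting to `TranslatorOptions`; `script_translator` chooses single-sentence or multi-sentence output based on it.
- Adds a shared `deque` dependency to support the container of multi-sentence candidates.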
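The "beam search plus DP over a word graph" idea in the list above can be illustrated with a minimal sketch. Everything here is hypothetical and chosen for illustration: `Edge`, `Partial`, `WordGraph`, and `MakeSentencesSketch` are not librime's types or the PR's actual code, and weights are assumed to be additive scores where larger is better. At each position the sketch keeps only the best `beam_width` distinct partial sentences, then extends them with every word that starts there.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical types for illustration; not librime's Poet interface.
struct Edge {
  std::size_t end;   // position where this word ends
  std::string text;  // text of the word
  double weight;     // additive score; larger is better (assumption)
};
struct Partial {
  std::string text;
  double weight;
};
// graph[start] lists the words that begin at position `start`.
using WordGraph = std::map<std::size_t, std::vector<Edge>>;

// Keep the best `beam_width` entries, dropping duplicate texts.
void Prune(std::vector<Partial>& beam, std::size_t beam_width) {
  std::stable_sort(beam.begin(), beam.end(),
                   [](const Partial& a, const Partial& b) {
                     return a.weight > b.weight;
                   });
  std::vector<Partial> kept;
  std::set<std::string> seen;
  for (const Partial& p : beam) {
    if (kept.size() == beam_width) break;
    if (seen.insert(p.text).second) kept.push_back(p);
  }
  beam.swap(kept);
}

// Returns up to `beam_width` full sentences covering [0, total_length).
std::vector<Partial> MakeSentencesSketch(const WordGraph& graph,
                                         std::size_t total_length,
                                         std::size_t beam_width) {
  if (beam_width == 0) return {};
  std::vector<std::vector<Partial>> beams(total_length + 1);
  beams[0].push_back(Partial{"", 0.0});
  for (std::size_t pos = 0; pos < total_length; ++pos) {
    if (beams[pos].empty()) continue;
    // Every edge into `pos` starts earlier, so this beam is complete here.
    Prune(beams[pos], beam_width);
    auto it = graph.find(pos);
    if (it == graph.end()) continue;
    for (const Edge& edge : it->second) {
      if (edge.end <= pos || edge.end > total_length) continue;
      for (const Partial& p : beams[pos]) {
        beams[edge.end].push_back(
            Partial{p.text + edge.text, p.weight + edge.weight});
      }
    }
  }
  Prune(beams[total_length], beam_width);
  return beams[total_length];
}
```

With `beam_width` set to 1 this degenerates to keeping a single best path per position, which is the single-sentence behaviour; a wider beam trades extra work for more surviving candidates.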
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/rime/gear/translator_commons.h | Adds the TranslatorOptions::max_sentences_ option field |
| src/rime/gear/translator_commons.cc | Reads max_sentences from config and clamps it to a minimum value |
| src/rime/gear/script_translator.cc | ScriptTranslation supports multi-sentence candidate output (uses a deque to manage sentence candidates) |
| src/rime/gear/poet.h | Declares the Poet::MakeSentences() API |
| src/rime/gear/poet.cc | Implements the beam-search sentence-making logic for multi-sentence output, with deduplication |
| src/rime/common.h | Includes <deque> and imports std::deque into the rime namespace |
53caf59 to c2dab99
Member
This is impressive. No 「李姐」?
0a41d91 to 2f7685c
Member
Author
At least not with 朙月拼音 and 八股文 😄
2f7685c to 0fb3b96
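Before this, I also built a top-3 multi-sentence feature on my own branch. According to my evaluation results, the two seem to behave somewhat differently, and I will look for where they diverge. In tests of my version, top-3 accuracy is slightly higher with 朙月拼音 and also slightly higher with rime_frost, but for rime_frost with the 万象 model, my version's top-3 accuracy is lower than this version's.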
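Changes
Adds `MakeSentences` to poet. It works by running a relatively wide beam search during the DP over the word graph and finally outputting the requested number of sentences. Because this algorithm may perform slightly worse than the original, the original `MakeSentence` is kept.
Sentence output is controlled by `max_sentences` and `cutoff_threshold`: output stops as soon as the relative gap between a weight and the previous weight exceeds cutoff_threshold. To avoid drifting too far from the weight of the top candidate, each iteration shrinks cutoff_threshold to 1 - 1/max_sentences of its previous value, so that after max_sentences iterations cutoff_threshold has shrunk to roughly 1/e of its original value.
Adds two parameters to translator_commons for fine-tuning the algorithm: `max_sentences` (default 1) and `sentence_cutoff_threshold` (default 0.1).
script_translator calls MakeSentences when max_sentences is greater than 1 and outputs the result.
table_translator is not supported yet; support can be added in the future.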
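The cutoff rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the PR's code: `CountSentencesWithinCutoff` is a hypothetical helper, the candidates are assumed to be already sorted from best to worst weight, and the "relative gap" is assumed to be measured against the absolute value of the previous weight.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of librime): counts how many candidates,
// sorted from best to worst weight, pass the shrinking cutoff.
std::size_t CountSentencesWithinCutoff(const std::vector<double>& weights,
                                       std::size_t max_sentences,
                                       double cutoff_threshold) {
  if (weights.empty() || max_sentences == 0) return 0;
  // Each accepted candidate tightens the threshold by this factor.
  const double decay = 1.0 - 1.0 / static_cast<double>(max_sentences);
  std::size_t kept = 1;  // the top candidate is always kept
  while (kept < weights.size() && kept < max_sentences) {
    const double previous = weights[kept - 1];
    const double gap = std::fabs(previous - weights[kept]);
    // Assumption: "relative" means relative to |previous weight|.
    if (gap > cutoff_threshold * std::fabs(previous)) break;
    ++kept;
    cutoff_threshold *= decay;
  }
  return kept;
}
```

For example, with max_sentences set to 10 the per-step factor is 0.9, and 0.9^10 ≈ 0.349, which is close to 1/e ≈ 0.368. This also shows why fewer than max_sentences sentences can come out: the loop stops at the first candidate whose gap exceeds the current threshold.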
Note: for best results, `max_homophones` should also be set to a value greater than 1. See #1165.
Effect
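Add the following to 朙月拼音:
and enable the 八股文 model.
This produces:
When max_sentences is set to a value less than or equal to 1, or is not set at all, the original logic is still used to produce a single sentence.
With the default options, the cutoff mechanism may cause fewer than max_sentences sentences to be output (here it is set to 10, but in the scenarios below only 1 and 2 sentences were output, respectively).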