refactor: move highlight content extraction from daemon to client-side by Kakueeen · Pull Request #313 · linuxdeepin/dde-grand-search

Kakueeen · 2026-06-02T03:07:45Z

The highlight content (matched context) for full-text and OCR search results was previously extracted synchronously in the daemon during search, which blocked the search pipeline. This commit defers highlight extraction to the client side using a new HighlightProvider that performs the work asynchronously in a worker thread.

Changes:

- Remove the `matchedContext` parameter from `FileSearchUtils::packItem` and `extraData`, and delete the related processing logic.
- Disable full-text retrieval in fulltextworker and ocrtextworker by calling `setFullTextRetrievalEnabled(false)`. Highlighted content is no longer fetched during daemon search.
- Add a HighlightProvider class with a request/cancel/cache mechanism and a worker thread. It emits `highlightReady` when content is available.
- Add `setSearchKeyword` methods to ExhibitionWidget, MatchWidget, GroupWidget and GrandSearchListView to propagate the current search keyword and manage highlight tasks.
- In GrandSearchListView, request highlights asynchronously when items become visible, giving visible items higher priority.
- Connect HighlightProvider signals in MatchWidget and PreviewWidget to update highlighted content when it becomes available.
- Update GeneralPreviewPlugin to preserve matchedContext in preview items.
- Add a dependency on the dfm-search library for `ContentRetriever`.

This improves search responsiveness by moving expensive I/O operations (reading file content and extracting highlights) out of the critical search path.
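
The request/cancel/cache behaviour described above can be sketched without Qt. What follows is a minimal, hypothetical stand-in in standard C++, not the code from this PR: `HighlightCache`, `Lookup`, and the method names are invented for illustration, and a `ready` flag plays the role of the pending marker.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the cache side of a highlight provider.
// Outer key: search keyword (task id); inner key: file path.
class HighlightCache {
public:
    enum class Lookup { Miss, Pending, Ready };

    // Returns Miss the first time a (keyword, path) pair is seen and marks it
    // pending, so that a repeated request for the same pair is not queued again.
    Lookup request(const std::string &keyword, const std::string &path,
                   std::string *content = nullptr) {
        auto &perKeyword = m_cache[keyword];
        auto it = perKeyword.find(path);
        if (it == perKeyword.end()) {
            perKeyword.emplace(path, Entry());  // ready == false: pending
            return Lookup::Miss;
        }
        if (!it->second.ready)
            return Lookup::Pending;
        if (content)
            *content = it->second.content;
        return Lookup::Ready;
    }

    // Called when a fetch finishes. A result for a cancelled keyword is dropped.
    bool store(const std::string &keyword, const std::string &path,
               const std::string &content) {
        auto kw = m_cache.find(keyword);
        if (kw == m_cache.end())
            return false;  // the keyword was cancelled in the meantime
        Entry &entry = kw->second[path];
        entry.ready = true;
        entry.content = content;
        return true;
    }

    // Forget everything that belongs to an outdated keyword.
    void cancel(const std::string &keyword) { m_cache.erase(keyword); }

private:
    struct Entry {
        bool ready = false;
        std::string content;
    };
    std::unordered_map<std::string, std::unordered_map<std::string, Entry>> m_cache;
};
```

In this sketch a list view would call `request` when an item becomes visible, queue a fetch only on `Miss`, and call `cancel` with the old keyword when the search text changes, so that a late result for the old keyword is rejected by `store`.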

Influence:

1. Verify full-text search results still show highlighted content in list and preview, but possibly with a slight delay after initial display.
2. Test OCR search results highlight display in list and preview.
3. Test rapid keyword changes: ensure old highlight tasks are cancelled and new results appear correctly.
4. Verify that the preview widget updates highlight content when it arrives (especially for files not yet visible when highlight was requested).
5. Test search performance: full-text and OCR searches should feel snappier because highlight extraction no longer blocks search completion.
6. Verify that cached highlights are reused when scrolling back to previously viewed items.
7. Test error handling: what happens when highlight fetch fails or returns empty content (no crash, no stale pending state).
8. Ensure the “more” button and view-mode switching (list/grid) still work correctly with async highlights.


sourcery-ai

Sorry @Kakueeen, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

deepin-ci-robot · 2026-06-02T03:09:18Z

deepin pr auto review

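This is a well-conceived architectural refactor. Changing the highlight content for full-text/OCR search from synchronous retrieval during the search phase to asynchronous lazy loading during the display phase can significantly speed up the first presentation of search results and avoid UI stalls caused by heavy file I/O. The task cancellation mechanism (cancelTask) also correctly handles stale tasks when the user types quickly.

The detailed review of this change follows, covering four areas: syntax and logic, code quality, performance, and security.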

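I. Syntax and Logic

The double-check logic in HighlightProvider is flawed
In processNextRequest, the double-check cannot do what it is meant to do:

```
{
    QMutexLocker lk(&m_requestMutex);
    if (m_pendingRequests.isEmpty()) {
        m_processing.store(0, std::memory_order_release);
        // Flaw: the lock is still held here, so no other thread can have added
        // to m_pendingRequests. This branch is never taken and the
        // compare_exchange_strong below is never reached.
        if (!m_pendingRequests.isEmpty()) {
            int expected = 0;
            if (m_processing.compare_exchange_strong(expected, 1, std::memory_order_acquire)) {
                continue;
            }
        }
        return;
    }
    req = m_pendingRequests.takeFirst();
}
```

Problem: m_pendingRequests is re-checked while m_requestMutex is still held, and the CAS is attempted only if the queue is no longer empty. Because the current thread holds the lock, no other thread can add to the queue at that point, so the result of isEmpty() cannot change and the CAS is never reached.
Fix: remove the double-check. Queue operations are protected by m_requestMutex, so no request is missed as long as the requesting side enqueues first and only then tests m_processing:

```
{
    QMutexLocker lk(&m_requestMutex);
    if (m_pendingRequests.isEmpty()) {
        m_processing.store(0, std::memory_order_release);
        return;
    }
    req = m_pendingRequests.takeFirst();
}
```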
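
One way to see why a drain loop without the double-check loses no requests is to model the queue in standard C++. The sketch below is an illustration under stated assumptions, not the PR's code: `RequestQueue`, `enqueue`, and `drain` are hypothetical names, and a plain `bool` guarded by the mutex stands in for the atomic `m_processing` flag.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical model of a request queue with a single drain pass at a time.
class RequestQueue {
public:
    // Returns true when the caller must start a drain pass because none is active.
    bool enqueue(const std::string &path) {
        std::lock_guard<std::mutex> lk(m_mutex);
        m_pending.push_back(path);
        if (m_processing)
            return false;  // the active pass will pick this request up
        m_processing = true;
        return true;
    }

    // Runs on the worker: handles requests until the queue is empty.
    void drain(const std::function<void(const std::string &)> &handle) {
        for (;;) {
            std::string req;
            {
                std::lock_guard<std::mutex> lk(m_mutex);
                if (m_pending.empty()) {
                    // Cleared under the same lock that enqueue() takes, so a
                    // request is either seen here or starts a new pass.
                    m_processing = false;
                    return;
                }
                req = m_pending.front();
                m_pending.pop_front();
            }
            handle(req);  // slow I/O happens outside the lock
        }
    }

private:
    std::mutex m_mutex;
    std::deque<std::string> m_pending;
    bool m_processing = false;
};
```

Because the flag and the queue are read and written under one lock, there is no window in which a request is queued while the flag still reports an active pass that has already decided to exit.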

PreviewWidget::updateHighlightContent does not match the signal signature
In previewwidget.cpp:
```
connect(HighlightProvider::instance(), &HighlightProvider::highlightReady,
        this, &PreviewWidget::updateHighlightContent);
```
The signal highlightReady has the signature highlightReady(QString, QString, QString) (taskId, filePath, content).
The slot updateHighlightContent, however, has the signature updateHighlightContent(QString, QString) (filePath, content).
Problem: Qt checks this connect syntax at compile time, but it allows a slot to take fewer arguments than the signal and drops the trailing ones. Since all parameters are QString, the connection compiles, and the slot then receives taskId as filePath and filePath as content.
Fix: adapt the arguments with a lambda:
```
connect(HighlightProvider::instance(), &HighlightProvider::highlightReady,
        this, [this](const QString &taskId, const QString &filePath, const QString &content) {
    Q_UNUSED(taskId)
    updateHighlightContent(filePath, content);
});
```
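Inefficient logic in MatchWidget::initConnect
```
if (groupWidget && groupWidget->findItemByPath(filePath).item == filePath) {
    groupWidget->getListView()->onHighlightReady(keyword, filePath, content);
}
```
Problem: findItemByPath typically has to traverse the model data, which amounts to one lookup. Immediately afterwards, updateHighlight (called inside onHighlightReady) traverses the model again to update the data, so the work is done twice.
Fix: call onHighlightReady directly and let updateHighlight check whether the path matches during its own traversal.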

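II. Code Quality

Magic strings should be extracted into constants
HighlightProvider uses "__pending__" as a de-duplication sentinel:
```
static const QString kPendingToken = QStringLiteral("__pending__");
```
This is well done. However, in processNextRequest in highlightprovider.cpp, any check for the pending state should use kPendingToken rather than a hard-coded string comparison. At present the value is simply overwritten, but if state checks are added later, using the constant consistently is safer.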

Redundant conversion in GrandSearchListView::updateHighlight

```
QVariant searchMeta;
searchMeta.setValue(item);
m_model->setData(index, searchMeta, DATA_ROLE);
```

Suggestion: use QVariant::fromValue directly, which is more concise:

```
m_model->setData(index, QVariant::fromValue(item), DATA_ROLE);
```

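Thread safety and object lifetime
In the HighlightProvider destructor:
```
HighlightProvider::~HighlightProvider() {
    m_workerThread->quit();
    m_workerThread->wait(5000);
}
```
Problem: if, at application exit, the worker thread's processNextRequest is executing m_fetchCallback (synchronous blocking I/O) and m_fetchCallback depends on a singleton or component that has already been destroyed, a crash may result. In addition, if wait(5000) times out and the thread has not finished, destroying the object anyway will crash the program.
Suggestion: add a safe exit flag and log a warning on timeout:
```
HighlightProvider::~HighlightProvider() {
    cancelTask(""); // or add a flag that lets processNextRequest exit early
    m_workerThread->quit();
    if (!m_workerThread->wait(5000)) {
        qCWarning(logGrandSearch) << "HighlightProvider: worker thread did not finish in time";
    }
}
```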

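III. Performance

Risk of unbounded growth of m_cache
The type of HighlightProvider::m_cache is QHash<QString, QHash<QString, QString>>.
Problem: as the user keeps searching for different keywords, cancelTask only clears the cache of the cancelled keyword. If the user searches for hundreds of different words without triggering cancelTask, m_cache keeps growing and its memory use is unbounded.
Fix:
- Add LRU eviction in requestHighlight to cap the size of m_cache (for example, keep the cache for at most the 10 most recent searches).
- Alternatively, provide a clearCache() interface so that callers can clean up once a search has fully ended and the cache is no longer needed.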
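
The LRU option can be sketched in standard C++ as follows. This is a hypothetical illustration, not the PR's code: `KeywordLru`, `touch`, and `PathMap` are invented names, and `std::list` plus `std::unordered_map` stand in for the Qt containers.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Keeps the highlight maps of at most `capacity` most recently used keywords.
class KeywordLru {
public:
    using PathMap = std::unordered_map<std::string, std::string>;

    explicit KeywordLru(std::size_t capacity) : m_capacity(capacity) {
        assert(capacity > 0);
    }

    // Returns the per-keyword map, creating it (and evicting the least
    // recently used keyword) if necessary.
    PathMap &touch(const std::string &keyword) {
        auto it = m_index.find(keyword);
        if (it != m_index.end()) {
            // Move the existing entry to the front; list iterators stay valid.
            m_order.splice(m_order.begin(), m_order, it->second);
            return it->second->second;
        }
        if (m_order.size() == m_capacity) {
            m_index.erase(m_order.back().first);
            m_order.pop_back();
        }
        m_order.emplace_front(keyword, PathMap());
        m_index[keyword] = m_order.begin();
        return m_order.front().second;
    }

    bool contains(const std::string &keyword) const {
        return m_index.count(keyword) != 0;
    }

    std::size_t size() const { return m_order.size(); }

private:
    using Entry = std::pair<std::string, PathMap>;
    std::size_t m_capacity;
    std::list<Entry> m_order;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> m_index;
};
```

With a capacity of 10 this would match the example limit mentioned above. The limit counts keywords, not bytes, so a single keyword with very many paths can still be large.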
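O(N) traversal of the model in updateHighlight
```
for (int row = 0; row < m_model->rowCount(); ++row) { ... }
```
Problem: each time the highlight content for one file arrives, the whole model is traversed. With 1000 items in the list, frequent highlight callbacks add up to O(N^2) traversal cost.
Fix: maintain a QHash<QString, int> m_pathToRow in GrandSearchListView that maps file paths to model rows, which reduces the lookup from O(N) to O(1).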
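
The path-to-row index can be sketched in standard C++ like this. It is a simplified, hypothetical model: `IndexedRows` and `Row` are invented for illustration, a `std::vector` stands in for the item model, and rows are only ever appended or cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Row {
    std::string path;
    std::string highlight;
};

// A row container that keeps a path -> row number index next to the rows.
class IndexedRows {
public:
    void append(const std::string &path) {
        m_pathToRow[path] = static_cast<int>(m_rows.size());
        m_rows.push_back(Row{path, std::string()});
    }

    // Returns the updated row number, or -1 when the path is not listed.
    int updateHighlight(const std::string &path, const std::string &content) {
        auto it = m_pathToRow.find(path);  // average O(1) instead of a full scan
        if (it == m_pathToRow.end())
            return -1;
        m_rows[static_cast<std::size_t>(it->second)].highlight = content;
        return it->second;
    }

    void clear() {
        m_rows.clear();
        m_pathToRow.clear();
    }

    const Row &at(int row) const { return m_rows.at(static_cast<std::size_t>(row)); }

private:
    std::vector<Row> m_rows;
    std::unordered_map<std::string, int> m_pathToRow;
};
```

In a real view the index would also have to be rebuilt or adjusted whenever rows are inserted in the middle, removed, or re-sorted; the sketch leaves that out.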
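Initialization cost of the thread_local ContentRetriever
```
thread_local DFMSEARCH::ContentRetriever retriever;
```
Advantage: avoids constructing and destroying the object repeatedly, which is good for performance.
Note: confirm that ContentRetriever is safe to use with thread_local lifetime, and that internal state left over from a previous query cannot contaminate the current one (that is, whether each fetchHighlight call is idempotent).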

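IV. Security

Removing the const_cast is a significant safety improvement
The old code contained a risky operation:
```
SearchResult mutableResult = const_cast<SearchResult &>(file);
```
Casting away const on a reference and then modifying the object is undefined behavior (UB) whenever the underlying object is actually const, and it can crash if the underlying library statically caches or write-protects its return values. This refactor removes the operation entirely, which improves safety considerably.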
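Dangling-pointer risk in asynchronous callbacks
In MatchWidget::initConnect:
```
for (GroupWidget *groupWidget : std::as_const(m_groupWidgetMap)) {
    if (groupWidget && groupWidget->findItemByPath(filePath).item == filePath) {
        groupWidget->getListView()->onHighlightReady(keyword, filePath, content);
    }
}
```
Problem: the signal is delivered asynchronously, so by the time it arrives groupWidget may already have been deleted because the UI was closed or cleaned up. Although Qt's QueuedConnection copies the arguments at emit time, the this pointer (the MatchWidget) or its child widgets may no longer be valid when the slot runs.
Suggestion: make sure MatchWidget calls disconnect(HighlightProvider::instance(), nullptr, this, nullptr); in its destructor, or protect the pointers with weak references via QPointer<GroupWidget>.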
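
The weak-reference idea can be illustrated in standard C++, with `std::weak_ptr` standing in for `QPointer`. This is an analogy under stated assumptions, not Qt code: `ListView` and `makeGuardedCallback` are hypothetical, and unlike `QPointer` the sketch requires the target to be owned by a `std::shared_ptr`.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for a widget that displays highlight content.
struct ListView {
    std::string lastHighlight;
};

// Builds a callback that touches the view only if it is still alive.
// Returns true when the content was delivered, false when the view is gone.
std::function<bool(const std::string &)> makeGuardedCallback(std::weak_ptr<ListView> weak) {
    return [weak](const std::string &content) -> bool {
        if (std::shared_ptr<ListView> view = weak.lock()) {
            view->lastHighlight = content;
            return true;
        }
        return false;  // late result for a destroyed view is ignored
    };
}
```

The callback checks liveness at the moment the result arrives rather than when the request was made, which is the property the review asks for.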

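Summary

The core idea of this refactor is sound: the code structure is clear and responsibilities are well separated. The main items to fix are the ineffective double-check logic and the signal/slot argument mismatch. On the performance side, adding a row-index map for model lookups and an upper bound on cache capacity is recommended, to keep the UI smooth and memory use bounded in extreme search scenarios.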

deepin-ci-robot · 2026-06-02T05:15:09Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Johnson-zs, Kakueeen

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

debian/deepin/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Kakueeen · 2026-06-02T05:16:33Z

/forcemerge

deepin-bot · 2026-06-02T05:16:50Z

This pr force merged! (status: blocked)

sourcery-ai Bot reviewed Jun 2, 2026

Johnson-zs approved these changes Jun 2, 2026

deepin-bot Bot merged commit 8e1141f into linuxdeepin:master Jun 2, 2026
9 of 11 checks passed

deepin-bot[bot] merged 1 commit into linuxdeepin:master from Kakueeen:master