Perf worker #6861

Open
c121914yu wants to merge 10 commits into labring:main from c121914yu:perf-file

Conversation

@c121914yu
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings April 29, 2026 13:01
@github-actions

github-actions Bot commented Apr 29, 2026

Coverage Report

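| Status | Category | Percentage | Covered / Total |
|--------|------------|------------|-----------------|
| 🔵 | Lines | 14.09% | 1139 / 8081 |
| 🔵 | Statements | 14.08% | 1194 / 8475 |
| 🔵 | Functions | 12.57% | 245 / 1948 |
| 🔵 | Branches | 12.07% | 536 / 4440 |

File Coverage: No changed files found.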
Generated in workflow #41 for commit 2e235ae by the Vitest Coverage Report Action

@github-actions

github-actions Bot commented Apr 29, 2026

Docs Preview Deployed!

🔗 👀 Click here to visit preview

ghcr.io/labring/fastgpt-docs-pr:2e235ae32841eb1530818a2aff3851e07c1c31ae

Contributor

Copilot AI left a comment

Pull request overview

Introduces a configurable, long-lived worker pool for file parsing / HTML→Markdown / text chunking to improve performance and stability under concurrency, alongside related env/template/documentation updates.

Changes:

  • Add per-task timeout and per-worker recycle thresholds to the worker pool, and switch key worker entrypoints to an id-based request/response protocol.
  • Add new env knobs for worker pool sizing/timeouts and reorganize .env.template / docker-compose env blocks accordingly.
  • Add Vitest coverage for worker dispatch + a “real spawn” readFile integration test (build-artifact dependent).
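To make the id-based protocol concrete, here is a minimal sketch of how a pool-side dispatcher could match replies to requests and enforce a per-task timeout plus a recycle threshold. It is not the PR's `WorkerPool`: the names `IdDispatcher`, `WorkerLike`, `EchoWorker`, and the message shapes are all assumptions, and a fake worker stands in for a real worker thread so the example runs on its own.

```typescript
// Hypothetical message shapes; the PR's real types are not shown in this thread.
type WorkerRequest = { id: string; payload: unknown };
type WorkerResponse =
  | { id: string; type: 'success'; data: unknown }
  | { id: string; type: 'error'; message: string };

// Minimal stand-in for a worker thread, so the sketch runs without spawning one.
interface WorkerLike {
  postMessage(req: WorkerRequest): void;
  on(event: 'message', listener: (res: WorkerResponse) => void): void;
}

type PendingTask = {
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
  timer: ReturnType<typeof setTimeout>;
};

class IdDispatcher {
  private pending = new Map<string, PendingTask>();
  private nextId = 0;
  private finished = 0;
  private worker: WorkerLike;
  private timeoutMs: number;
  private maxTasksPerWorker: number;

  constructor(worker: WorkerLike, timeoutMs: number, maxTasksPerWorker: number) {
    this.worker = worker;
    this.timeoutMs = timeoutMs;
    this.maxTasksPerWorker = maxTasksPerWorker;
    this.worker.on('message', (res) => this.settle(res));
  }

  // True once this worker has finished enough tasks that a pool would replace it.
  get shouldRecycle(): boolean {
    return this.finished >= this.maxTasksPerWorker;
  }

  run(payload: unknown): Promise<unknown> {
    const id = String(++this.nextId);
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        // Per-task timeout: drop the entry so a late reply is ignored.
        this.pending.delete(id);
        this.finished++;
        reject(new Error(`worker task ${id} timed out after ${this.timeoutMs}ms`));
      }, this.timeoutMs);
      this.pending.set(id, { resolve, reject, timer });
      this.worker.postMessage({ id, payload });
    });
  }

  private settle(res: WorkerResponse): void {
    const task = this.pending.get(res.id);
    if (!task) return; // unknown id, or the task already timed out
    clearTimeout(task.timer);
    this.pending.delete(res.id);
    this.finished++;
    if (res.type === 'success') task.resolve(res.data);
    else task.reject(new Error(res.message));
  }
}

// Fake worker that echoes each payload back asynchronously.
class EchoWorker implements WorkerLike {
  private listeners: Array<(res: WorkerResponse) => void> = [];
  on(_event: 'message', listener: (res: WorkerResponse) => void): void {
    this.listeners.push(listener);
  }
  postMessage(req: WorkerRequest): void {
    setTimeout(() => {
      for (const l of this.listeners) l({ id: req.id, type: 'success', data: req.payload });
    }, 0);
  }
}
```

Because every reply carries the request `id`, several tasks can be in flight on one long-lived worker without their results being mixed up. A real pool would probably also terminate or replace the worker when a task times out, which this sketch leaves out.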

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
|------|-------------|
| projects/app/.env.template | Adds worker concurrency/timeout envs and reorganizes template sections. |
| packages/service/worker/utils.ts | Enhances WorkerPool with task timeout + maxTasksPerWorker recycling; extends controller props. |
| packages/service/worker/text2Chunks/index.ts | Updates worker thread protocol to include id and explicit success/error messages. |
| packages/service/worker/readFile/index.ts | Updates readFile worker to id protocol and simplifies message handling. |
| packages/service/worker/htmlStr2Md/index.ts | Updates htmlStr2Md worker to id protocol with try/catch response. |
| packages/service/worker/function.ts | Switches to pooled worker controller usage and adds env-configured pool sizing/timeouts. |
| packages/service/test/worker/readFile/integration.test.ts | Adds build-artifact-based real worker spawn integration tests for readFile pool behavior. |
| packages/service/test/worker/function.test.ts | Adds unit tests for worker function dispatch/config wiring and SharedArrayBuffer behavior. |
| packages/service/env.ts | Adds/reshapes env schema, including new worker pool envs and many defaults. |
| packages/service/common/string/utils.ts | Switches htmlToMarkdown to use worker pool controller with env-configured sizing/timeouts. |
| packages/global/common/system/types/index.ts | Minor comment update on tokenWorkers constraints. |
| document/data/doc-last-modified.json | Updates doc timestamps for newly/edited docs. |
| document/content/self-host/upgrading/4-15/4150.mdx | Documents new worker pool env variables and release notes line edits. |
| deploy/templates/docker-compose.prod.yml | Updates SYNC_INDEX format and removes some env entries (now defaulted elsewhere) + template reference comment. |

Comment thread packages/service/test/worker/readFile/integration.test.ts Outdated
Comment thread packages/service/env.ts
Comment thread packages/service/env.ts
Comment thread packages/service/worker/utils.ts
Comment thread packages/service/worker/utils.ts
@github-actions

github-actions Bot commented Apr 29, 2026

Build Successful - Preview fastgpt Image for this PR:

ghcr.io/labring/fastgpt-pr:fastgpt_2e235ae32841eb1530818a2aff3851e07c1c31ae

@github-actions

github-actions Bot commented Apr 29, 2026

Build Successful - Preview mcp_server Image for this PR:

ghcr.io/labring/fastgpt-pr:mcp_server_2e235ae32841eb1530818a2aff3851e07c1c31ae

@github-actions

github-actions Bot commented Apr 29, 2026

Build Successful - Preview code-sandbox Image for this PR:

ghcr.io/labring/fastgpt-pr:code-sandbox_2e235ae32841eb1530818a2aff3851e07c1c31ae

@github-actions

github-actions Bot commented Apr 29, 2026

Admin Preview Image Ready!

ghcr.io/labring/fastgpt-pr:admin_2e235ae32841eb1530818a2aff3851e07c1c31ae

Collaborator Author

@c121914yu c121914yu left a comment

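PR Review: Perf worker

📋 Requirements understanding

This PR mainly switches file parsing, HTML-to-Markdown conversion, and text chunking to a reused WorkerPool, and adds configurable worker counts and timeouts. It also extends packages/service/env.ts so that more environment variables are managed through the schema, and adjusts the deployment templates and docs.

🧪 Logic verification

I focused on verifying the following paths:

  1. File parsing with concurrent worker reuse: the new mocked unit tests cover the SharedArrayBuffer wrapping in readRawContentFromBuffer and the pool configuration; the targeted tests pass.
  2. HTML to Markdown: the call has moved from runWorker to getWorkerController; the mocked unit tests cover null and normal HTML; the targeted tests pass.
  3. Real readFile worker: the new integration tests depend on projects/app/worker/readFile.js; the current worktree has not been built, so this group was skipped.
  4. Environment variable compatibility: the schema/default changes in env.ts were not fully synced to the runtime code that still reads process.env directly, which introduces regressions.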

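⚠️ Issue summary

🔴 Critical issues (2, must fix)

  1. FILE_TOKEN_KEY should not have a public default value; otherwise instances without a configured key will sign and verify file tokens with a fixed JWT secret.
  2. After the SYNC_INDEX template was changed to a boolean, the existing Mongo initialization still only recognizes the string '0', so indexes are still synced when a user sets false.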
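The SYNC_INDEX regression can be illustrated with a small sketch. The exact expression used by the Mongo initialization is not shown in this thread, so `legacySyncEnabled` is an assumed reconstruction of a check that only recognizes `'0'`, and `parseBoolEnv` is a hypothetical helper showing one way to accept both the old and the new spelling.

```typescript
// Hypothetical helper; the name and the accepted spellings are assumptions.
function parseBoolEnv(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = raw.trim().toLowerCase();
  if (value === '0' || value === 'false') return false;
  if (value === '1' || value === 'true') return true;
  throw new Error(`Invalid boolean env value: ${raw}`);
}

// Assumed shape of the legacy check: only the string '0' turns sync off.
const legacySyncEnabled = (raw: string | undefined): boolean => raw !== '0';

console.log(legacySyncEnabled('false')); // true: 'false' is not '0', so sync still runs
console.log(parseBoolEnv('false', true)); // false
console.log(parseBoolEnv('0', true)); // false
```

Throwing on unrecognized values is a design choice in this sketch: a typo such as `SYNC_INDEX=flase` then fails loudly instead of silently enabling index sync.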

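🟡 Suggested improvements (0)

🟢 Optional optimizations (0)

🚀 Review conclusion

Changes required. Because I am using the PR author's account, GitHub does not allow me to request changes, so I am submitting the blocking feedback as a comment review.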

Comment thread packages/service/env.ts Outdated
Comment thread projects/app/.env.template
Collaborator Author

@c121914yu c121914yu left a comment

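PR Review: Perf worker (re-review)

The two blocking issues from the previous review have themselves been fixed:

  • FILE_TOKEN_KEY no longer has a public default value and is now required.
  • The main app's Mongo index sync now reads env.SYNC_INDEX, so SYNC_INDEX=false is no longer treated as enabled.

The re-review still found 2 issues that need further work:

🔴 projects/marketplace/src/service/mongo/index.ts imports @fastgpt/service/env just to read SYNC_INDEX, which also pulls the main app's required secrets into marketplace. This is flagged in a line-level comment.

🔴 packages/service/common/secret/constants.ts still contains: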

export const AES256_SECRET_KEY = process.env.AES256_SECRET_KEY || 'fastgptkey';

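Although AES256_SECRET_KEY is now required in env.ts, the actual encryption/decryption code still keeps a public default value. Any path that only goes through the secret utilities, without first triggering env.ts validation, may still encrypt and decrypt with the fixed key. I suggest reading from env.AES256_SECRET_KEY so that missing configuration fails closed.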
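As a rough illustration of "fail closed", the sketch below contrasts the fallback pattern with a lookup that throws when the key is absent. `requireSecret` is a hypothetical helper, not the PR's code; the PR's own suggestion is to read the value from the validated `env` object instead.

```typescript
type EnvSource = Record<string, string | undefined>;

// Fail-open pattern from the quoted line: a missing key silently becomes a public constant.
function readWithFallback(source: EnvSource): string {
  return source.AES256_SECRET_KEY || 'fastgptkey';
}

// Hypothetical fail-closed helper: a missing or blank key stops the process instead.
function requireSecret(source: EnvSource, name: string): string {
  const value = source[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

console.log(readWithFallback({})); // fastgptkey
console.log(requireSecret({ AES256_SECRET_KEY: 'example-only-key' }, 'AES256_SECRET_KEY')); // example-only-key
```

With the second form, a deployment that forgets the key cannot start encrypting data under a secret that anyone can read from the repository.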

Verification:

  • corepack pnpm -C packages/service exec vitest run -c vitest.config.ts test/common/s3/token.test.ts test/common/secret/aes256gcm.test.ts test/worker/function.test.ts passed (38 passed).
  • env -u FILE_TOKEN_KEY -u AES256_SECRET_KEY corepack pnpm --filter @fastgpt/marketplace build was blocked by another TypeScript error on the current branch: 'none' in reasoningEffortList does not match ChatCompletionReasoningEffort, so the build never reached env validation.

import type { Model, Schema } from 'mongoose';
import { Mongoose } from 'mongoose';
import { getLogger, LogCategories } from '../logger';
import { env } from '@fastgpt/service/env';
Collaborator Author

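🔴 @fastgpt/service/env should not be imported directly here. env.ts now strictly validates the main app's FILE_TOKEN_KEY / AES256_SECRET_KEY, but marketplace is a standalone project: projects/marketplace/.env.template only declares S3_PREFIX, AUTH_TOKEN, and MONGODB_URI, and the Dockerfile does not inject these FastGPT app secrets either. As a result, marketplace is forced to configure the main app's file/AES secrets just to read SYNC_INDEX, and building or running this module fails because unrelated env variables are missing. I suggest parsing SYNC_INDEX locally in marketplace, or splitting out a shared bool/env helper that does not depend on the main app's secrets.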

Collaborator Author

In CI, test:admin is now also blocked by AES256_SECRET_KEY: test.env in pro/admin/vitest.config.ts currently only sets FILE_TOKEN_KEY, but the admin tests import packages/service/env.ts via packages/service/common/mongo/index.ts, so once AES256_SECRET_KEY became required, validation fails immediately.

Key stack from the failure log:

Error: Invalid environment variables. Please check: AES256_SECRET_KEY
 ❯ onValidationError ../../packages/service/env.ts:266:11
 ❯ ../../packages/service/env.ts:18:20
 ❯ ../../packages/service/common/mongo/index.ts:11:1

AES256_SECRET_KEY also needs to be added to the test env in pro/admin/vitest.config.ts, or the admin tests should avoid importing the full service env.
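A sketch of the first option, assuming the admin config uses Vitest's `test.env` option as described above. The two key names come from this thread; the values are placeholders invented for illustration, and the rest of the real config file is not shown here.

```typescript
// Placeholder values for tests only; the real secrets must never be committed.
const adminTestEnv: Record<string, string> = {
  FILE_TOKEN_KEY: 'test-file-token-key',
  AES256_SECRET_KEY: 'test-aes256-secret-key'
};

// In pro/admin/vitest.config.ts this object would be passed as the test env, roughly:
//   import { defineConfig } from 'vitest/config';
//   export default defineConfig({ test: { env: adminTestEnv } });
console.log(Object.keys(adminTestEnv).join(', ')); // FILE_TOKEN_KEY, AES256_SECRET_KEY
```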

Collaborator Author

@c121914yu c121914yu left a comment

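I re-reviewed this; the core issues raised earlier have been fixed:

  • Now that FILE_TOKEN_KEY / AES256_SECRET_KEY are required, the test configuration covers the service/app/admin/root vitest configs.
  • The actual AES256_SECRET_KEY encryption/decryption path now goes through env.AES256_SECRET_KEY, and the old fastgptkey fallback has been removed.
  • Marketplace no longer imports the full service env just for SYNC_INDEX.
  • The Next production build phase skips env validation via NEXT_PHASE=phase-production-build, while the secrets are still strictly validated at runtime.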
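The build-phase behavior in the last bullet could look roughly like the following. This is a simplified sketch, not the code in packages/service/env.ts: `loadEnv` and `REQUIRED_KEYS` are hypothetical names, and only the two secrets discussed in this thread are checked.

```typescript
type RawEnv = Record<string, string | undefined>;

// Hypothetical list; the real schema validates many more variables.
const REQUIRED_KEYS = ['FILE_TOKEN_KEY', 'AES256_SECRET_KEY'] as const;

function loadEnv(raw: RawEnv): RawEnv {
  // Next.js sets NEXT_PHASE to this value while running a production build.
  const isBuildPhase = raw.NEXT_PHASE === 'phase-production-build';
  if (isBuildPhase) return raw; // build images do not carry runtime secrets

  const missing = REQUIRED_KEYS.filter((key) => !raw[key]);
  if (missing.length > 0) {
    throw new Error(`Invalid environment variables. Please check: ${missing.join(', ')}`);
  }
  return raw;
}

loadEnv({ NEXT_PHASE: 'phase-production-build' }); // passes without secrets
loadEnv({ FILE_TOKEN_KEY: 'a', AES256_SECRET_KEY: 'b' }); // passes at runtime
```

Skipping only on the build-phase marker keeps the runtime check strict: a container started without the secrets still fails at startup rather than at the first encryption call.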

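One deployment-side issue remains to be fixed: the Helm chart's default Secret does not yet include AES256_SECRET_KEY. deploy/helm/fastgpt/templates/secret-env.yaml currently only has FILE_TOKEN_KEY: "filetoken", but packages/service/env.ts in this PR already requires AES256_SECRET_KEY. A FastGPT Pod installed from the Helm chart will fail at startup with Invalid environment variables. Please check: AES256_SECRET_KEY.

I suggest adding AES256_SECRET_KEY to deploy/helm/fastgpt/templates/secret-env.yaml, ideally keeping it consistent with the compose/template docs, so that the Helm path is not broken by this configuration tightening.

Verification: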

corepack pnpm -C packages/service exec vitest run -c vitest.config.ts test/common/s3/token.test.ts test/common/secret/aes256gcm.test.ts test/worker/function.test.ts
# 3 files / 38 tests passed

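With NEXT_PHASE=phase-production-build and no FILE_TOKEN_KEY/AES256_SECRET_KEY, importing packages/service/env succeeds.
In a normal runtime without FILE_TOKEN_KEY/AES256_SECRET_KEY, it still fails as expected.

Some GitHub checks are still pending; I did not wait for all of them to finish.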
