[RFC] 137 - Promptfoo Integration #9571

arvinxx · 2025-10-05T14:15:39Z

Maintainer

Overview

This RFC defines the conventions and best practices for using promptfoo to quality-assure the AI prompts in the @lobechat/prompts package.

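Motivation

Why test prompts?

Quality assurance: make sure prompts produce the expected output across a wide range of inputs
Regression protection: catch unintended behavior changes introduced by edits
Multi-model validation: check that a prompt behaves consistently across different AI models
Continuous improvement: use test feedback to iterate on and refine prompts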

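Why Promptfoo?

AI-native: a testing framework designed specifically for LLM applications
Multi-model support: works with OpenAI, Anthropic, Google and many other providers
Flexible assertions: supports LLM-graded rubrics, regex, JSON and other assertion types
TypeScript integration: tests can import and exercise the actual prompt code directly
Rich reporting: provides a web UI and JSON output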

Technical Specification

Directory Structure

packages/prompts/
├── src/
│   └── chains/               # prompt implementations
│       ├── translate.ts
│       ├── summaryTitle.ts
│       └── ...
├── promptfoo/                # test configurations
│   ├── translation/
│   │   ├── eval.yaml        # test configuration file
│   │   └── prompt.ts        # prompt wrapper
│   ├── summary-title/
│   │   ├── eval.yaml
│   │   └── prompt.ts
│   └── ...
├── package.json
└── promptfooconfig.yaml      # global configuration

Test Configuration File Format

Each prompt test directory contains:

eval.yaml - the test configuration
prompt.ts - a TypeScript wrapper around the prompt

eval.yaml example

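description: Translation accuracy test

# AI models under test (all called through promptfoo's OpenAI-compatible openai:chat provider)
providers:
  - openai:chat:gpt-5-mini
  - openai:chat:claude-3-5-haiku-latest
  - openai:chat:gemini-flash-latest

# Prompt implementation
prompts:
  - file://promptfoo/translation/prompt.ts

# Test cases
tests:
  - vars:
      content: "Hello, how are you?"
      from: "en-US"
      to: "zh-CN"
    assert:
      # LLM-graded assertion
      - type: llm-rubric
        provider: openai:gpt-5-mini
        value: "The translation is accurate and reads naturally in Chinese"
      # Inclusion: at least one of the values must appear
      - type: contains-any
        value: ["你好", "您好"]
      # Exclusion: this value must not appear
      - type: not-contains
        value: "explanation"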

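prompt.ts example

// Import the real prompt implementation
import { chainTranslate } from '@lobechat/prompts';

interface PromptVars {
  content: string;
  from: string;
  to: string;
}

// promptfoo calls the default export with a context object that carries the test case's vars
export default function generatePrompt({ vars }: { vars: PromptVars }) {
  // `from` is not used: chainTranslate only takes the content and the target language
  const { content, to } = vars;

  // Call the actual chain function
  const result = chainTranslate(content, to);

  // Return the chat message array that promptfoo sends to each provider
  return result.messages || [];
}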
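When several wrappers follow this same pattern, the boilerplate can be factored into a small helper. The sketch below is hypothetical: createPromptWrapper, ChatMessage, ChainResult and PromptContext are names introduced here, and fakeChainTranslate is a stand-in that only imitates the { messages } return shape the wrapper above relies on. It is not the real chainTranslate from @lobechat/prompts.

```typescript
// Hypothetical types approximating what the chain functions return.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChainResult {
  messages?: ChatMessage[];
}

// The context object promptfoo hands to a prompt function; only vars is modelled here.
interface PromptContext<V> {
  vars: V;
}

// Hypothetical helper: turns "vars -> chain call" into a promptfoo-style prompt function,
// falling back to an empty array when the chain returns no messages.
function createPromptWrapper<V>(buildChain: (vars: V) => ChainResult) {
  return ({ vars }: PromptContext<V>): ChatMessage[] => buildChain(vars).messages ?? [];
}

// Stand-in chain so the sketch runs without @lobechat/prompts.
function fakeChainTranslate(content: string, to: string): ChainResult {
  return {
    messages: [
      { role: 'system', content: `Translate the user's text into ${to}. Reply with the translation only.` },
      { role: 'user', content },
    ],
  };
}

interface TranslateVars {
  content: string;
  from: string;
  to: string;
}

// In a real prompt.ts this value would be the default export.
const translatePrompt = createPromptWrapper<TranslateVars>(({ content, to }) =>
  fakeChainTranslate(content, to),
);

const messages = translatePrompt({ vars: { content: 'Hello, how are you?', from: 'en-US', to: 'zh-CN' } });
console.log(messages.length); // 2
```

In an actual prompt.ts, the imported chain would replace the stand-in and the result of createPromptWrapper would be exported as default, so each wrapper shrinks to a single mapping from vars to chain arguments.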

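Assertion Types

1. llm-rubric - LLM grading

Uses an AI model to grade the quality of the output:

- type: llm-rubric
  provider: openai:gpt-5-mini  # grading model
  value: "The translation should be accurate and natural, with no explanatory text"

Use cases:

Verifying semantic correctness
Assessing output quality
Checking style and tone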
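Conceptually, an llm-rubric assertion sends the rubric together with the model output to the grading model and asks for a structured verdict. The sketch below models only the final parsing step and assumes the grader answers with a JSON object carrying pass, score and reason, the fields promptfoo reports for a grading result. parseRubricVerdict is a hypothetical helper, not promptfoo's internal code.

```typescript
// Shape of a grading verdict: pass/fail, a numeric score and a human-readable reason.
interface GradingVerdict {
  pass: boolean;
  score: number;
  reason: string;
}

// Hypothetical parser: extracts the outermost {...} span from the grader's reply and
// validates it. Anything malformed is treated as a failure rather than a pass.
function parseRubricVerdict(raw: string): GradingVerdict {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) {
    return { pass: false, score: 0, reason: 'Grader reply contained no JSON object' };
  }
  try {
    const parsed = JSON.parse(raw.slice(start, end + 1)) as Partial<GradingVerdict>;
    if (typeof parsed.pass !== 'boolean') {
      return { pass: false, score: 0, reason: 'Grader reply has no boolean "pass" field' };
    }
    // Without an explicit score, derive one from pass/fail.
    const score = typeof parsed.score === 'number' ? parsed.score : parsed.pass ? 1 : 0;
    return { pass: parsed.pass, score, reason: parsed.reason ?? '' };
  } catch {
    return { pass: false, score: 0, reason: 'Grader reply was not valid JSON' };
  }
}

console.log(parseRubricVerdict('{"reason":"Accurate and natural","pass":true,"score":1}'));
```

Treating a malformed reply as a failure keeps a misbehaving grader from silently passing a test, and the reason string is what you read when analyzing failures.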

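2. contains / contains-any - inclusion checks

# Must contain this substring (case-sensitive); use contains-all to require several values
- type: contains
  value: "React"

# Passes if any one of the values is present
- type: contains-any
  value: ["React", "JavaScript", "library"]

Use cases:

Keyword verification
Checking for required content

3. not-contains - exclusion check

- type: not-contains
  value: "explanation"

Use cases:

Ensuring unwanted content is absent
Format validation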
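These three assertion types come down to substring checks. The sketch below models their semantics as we understand them: matching is case-sensitive (promptfoo also offers icontains variants for case-insensitive matching), contains-any passes when at least one value appears, and not-contains is the negation of contains. containsAll is included to show how contains-all differs when every value is required. All function names are illustrative.

```typescript
// Case-sensitive substring checks mirroring the YAML assertion types.
const contains = (output: string, value: string): boolean => output.includes(value);

const containsAny = (output: string, values: readonly string[]): boolean =>
  values.some((value) => output.includes(value));

// For comparison: contains-all requires every value to be present.
const containsAll = (output: string, values: readonly string[]): boolean =>
  values.every((value) => output.includes(value));

const notContains = (output: string, value: string): boolean => !output.includes(value);

// The assertions from the translation test case in the eval.yaml example:
const output = '你好，你好吗？';
console.log(containsAny(output, ['你好', '您好'])); // true
console.log(notContains(output, 'explanation')); // true
```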

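4. javascript - custom logic

- type: javascript
  value: |
    // A multi-line value runs as a function body with `output` in scope, so it needs an explicit return
    const length = Array.from(output.trim()).length; // counts code points, not UTF-16 units
    return length >= 1 && length <= 3;

Use cases:

Complex validation logic
Custom format checks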
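Because the assertion body is plain JavaScript, it can be checked locally before it goes into YAML. The sketch below restates the length check from the example as a typed function (isShortEmojiOutput is a hypothetical name) that trims surrounding whitespace first. Note that Array.from splits a string into Unicode code points, not user-perceived characters: 🎉 counts as 1, 🏝️ (emoji plus variation selector) counts as 2, and a ZWJ sequence such as 🏃‍♂️ counts as 4, so a limit of 3 rejects it.

```typescript
// Same check as the YAML assertion body: 1 to 3 code points after trimming whitespace.
function isShortEmojiOutput(output: string): boolean {
  const codePoints = Array.from(output.trim()).length;
  return codePoints >= 1 && codePoints <= 3;
}

// Emoji written as UTF-16 escapes so the exact code points are unambiguous.
const party = '\uD83C\uDF89'; // 🎉 U+1F389: one code point, two UTF-16 units
const island = '\uD83C\uDFDD\uFE0F'; // 🏝️ U+1F3DD + variation selector: two code points
const runner = '\uD83C\uDFC3\u200D\u2642\uFE0F'; // 🏃‍♂️ ZWJ sequence: four code points

console.log(party.length, Array.from(party).length); // 2 1
console.log(isShortEmojiOutput(party), isShortEmojiOutput(island), isShortEmojiOutput(runner)); // true true false
```

If ZWJ sequences should count as a single emoji, the check would need grapheme segmentation (for example Intl.Segmenter) instead of code-point counting.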

Workflow

1. Create a new prompt test

# 1. Create the test directory
mkdir -p promptfoo/my-prompt

# 2. Create eval.yaml
cat > promptfoo/my-prompt/eval.yaml << 'EOF'
description: Test description

providers:
  - openai:chat:gpt-5-mini
  - openai:chat:claude-3-5-haiku-latest

prompts:
  - file://promptfoo/my-prompt/prompt.ts

tests:
  - vars:
      input: "test input"
    assert:
      - type: llm-rubric
        provider: openai:gpt-5-mini
        value: "Description of the expected behavior"
EOF

# 3. Create the prompt.ts wrapper (myChain stands for your own chain function)
cat > promptfoo/my-prompt/prompt.ts << 'EOF'
import { myChain } from '../../src/chains/myChain';

export default function generatePrompt({ vars }: { vars: { input: string } }) {
  const result = myChain(vars.input);
  return result.messages || [];
}
EOF

2. Run tests

# Run a single test
pnpm promptfoo eval -c promptfoo/my-prompt/eval.yaml

# Run all tests
pnpm test:prompts

# Watch mode (for development)
pnpm test:prompts:watch

# CI mode (writes a JSON report)
pnpm test:prompts:ci

3. View results

# Web UI
pnpm promptfoo:view

# Show failure details on the command line
pnpm promptfoo eval -c promptfoo/my-prompt/eval.yaml 2>&1 | grep -A 20 "FAIL"

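4. Iterate

Refine the prompt based on the test results:

Analyze failures: read the reason llm-rubric gives for each failing case
Update the prompt: edit the implementation in src/chains/
Re-test: confirm that the change actually helps
Repeat: iterate until every test passes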

Best Practices

Test Case Design

1. Cover a variety of scenarios

tests:
  # Basic case
  - vars:
      content: "Hello"
  # Edge case
  - vars:
      content: ""
  # Technical terms
  - vars:
      content: "API_KEY_12345"
  # Mixed languages
  - vars:
      content: "使用 React 开发"

2. Multilingual tests

tests:
  # English
  - vars:
      content: "Hello, how are you?"
      locale: "en-US"
  # Chinese
  - vars:
      content: "你好，你好吗？"
      locale: "zh-CN"
  # Spanish
  - vars:
      content: "Hola, ¿cómo estás?"
      locale: "es-ES"
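Writing every input/locale combination by hand quickly becomes repetitive. A hypothetical helper such as buildLocaleMatrix can generate the cross product as plain { vars } objects, the same shape as the YAML tests entries. How the result is fed to promptfoo (for example by writing it into the config) is left open in this sketch.

```typescript
// One promptfoo-style test case; only vars is modelled here.
interface TestCase {
  vars: Record<string, string>;
}

// Hypothetical helper: pairs every input with every locale (inputs outer, locales inner).
function buildLocaleMatrix(contents: readonly string[], locales: readonly string[]): TestCase[] {
  const tests: TestCase[] = [];
  for (const content of contents) {
    for (const locale of locales) {
      tests.push({ vars: { content, locale } });
    }
  }
  return tests;
}

const tests = buildLocaleMatrix(['Hello, how are you?', 'API_KEY_12345'], ['en-US', 'zh-CN', 'es-ES']);
console.log(tests.length); // 6
console.log(JSON.stringify(tests[1])); // {"vars":{"content":"Hello, how are you?","locale":"zh-CN"}}
```

The matrix grows multiplicatively, so it pays to keep the input list short and let the locale list carry the language coverage.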

3. Multi-model validation

providers:
  - openai:chat:gpt-5-mini       # fast and cheap
  - openai:chat:claude-3-5-haiku-latest  # balanced
  - openai:chat:gemini-flash-latest      # adds model diversity

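Assertion Design

1. Combine several assertion types

assert:
  # Semantic check
  - type: llm-rubric
    provider: openai:gpt-5-mini
    value: "Accurate translation with no extra explanation"
  # Keyword check
  - type: contains-any
    value: ["keyword1", "keyword2"]
  # Format check
  - type: not-contains
    value: "unwanted"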
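As far as we understand promptfoo's default behavior, a test case passes only when all of its assertions pass, and its score is the weighted average of the assertion scores, with each assertion weighted 1 unless weight is set. The sketch below models that aggregation; AssertionResult and combineAssertions are illustrative names, and a test-level threshold is deliberately left out.

```typescript
interface AssertionResult {
  type: string;
  pass: boolean;
  score: number; // 0..1
  weight?: number; // defaults to 1
}

// Every assertion must pass; the score is the weight-averaged assertion score.
function combineAssertions(results: readonly AssertionResult[]): { pass: boolean; score: number } {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const r of results) {
    const w = r.weight ?? 1;
    weightedSum += r.score * w;
    totalWeight += w;
  }
  return {
    pass: results.every((r) => r.pass),
    score: totalWeight === 0 ? 0 : weightedSum / totalWeight,
  };
}

// A semantically fine translation that still contains the unwanted word fails overall.
const verdict = combineAssertions([
  { type: 'llm-rubric', pass: true, score: 1 },
  { type: 'contains-any', pass: true, score: 1 },
  { type: 'not-contains', pass: false, score: 0 },
]);
console.log(verdict); // { pass: false, score: 0.6666666666666666 }
```

This is why cheap deterministic checks are worth pairing with llm-rubric: a single failing keyword or format check is enough to fail the case, regardless of how well the grader scored it.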

2. Specify the grading model

- type: llm-rubric
  provider: openai:gpt-5-mini  # grade with a specific model
  value: "grading criteria"

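Performance Optimization

1. Use caching

# During development, responses are cached (the default)
pnpm promptfoo eval -c promptfoo/my-prompt/eval.yaml

# In CI, caching should be disabled; the test:prompts:ci script is assumed to pass promptfoo's --no-cache flag
pnpm test:prompts:ci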
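The log in the follow-up comment shows the effect of caching: every provider reports "0 requests" and all evaluation tokens as "Cached". The sketch below illustrates the idea with an in-memory memo keyed by provider and prompt. promptfoo's own cache persists between runs and is keyed on the request details, so createCachedCaller and its key format are assumptions for illustration only.

```typescript
type Fetcher = (provider: string, prompt: string) => Promise<string>;

// Hypothetical memoizing wrapper: identical (provider, prompt) pairs reach the network once.
// Passing enabled = false mimics a run without the cache.
function createCachedCaller(fetcher: Fetcher, enabled = true) {
  const cache = new Map<string, Promise<string>>();
  let requests = 0;

  function call(provider: string, prompt: string): Promise<string> {
    const key = JSON.stringify([provider, prompt]);
    const hit = enabled ? cache.get(key) : undefined;
    if (hit) return hit;

    requests += 1;
    const pending = fetcher(provider, prompt);
    if (enabled) {
      cache.set(key, pending);
      // Do not keep failed requests: the next identical call retries.
      pending.catch(() => cache.delete(key));
    }
    return pending;
  }

  return { call, requestCount: () => requests };
}
```

Caching the promise rather than the resolved value also deduplicates identical requests that are issued concurrently.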

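2. Concurrency control

By default promptfoo runs up to 5 requests concurrently (the example log in the follow-up comment reports "concurrency: 5"). The limit can be changed with the -j/--max-concurrency option of promptfoo eval or with evaluateOptions.maxConcurrency in the config file.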
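"Up to 5 at a time" means at most five provider calls are in flight at once, and the next test case starts as soon as a slot frees up. The sketch below shows that scheduling pattern with a generic mapWithConcurrency helper; it is an illustration of the idea, not promptfoo's scheduler.

```typescript
// Runs worker over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

Lowering the limit trades wall-clock time for fewer rate-limit errors from providers.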

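3. Cost control

Use cheap models such as gpt-5-mini during development
Run the full set of models for production validation
Monitor token usage (promptfoo prints a token usage summary after each run)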
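A rough way to estimate how many API calls a suite triggers: each test case runs once per prompt and per provider, and each llm-rubric assertion adds one grader call per evaluation. The counts in the follow-up log fit the first half of this model: the translation run reports 28 test cases for 7 inputs across 4 providers, and the emoji-picker run reports 68 for 17 inputs across 4 providers. estimateCalls is a hypothetical helper that ignores cache hits and retries.

```typescript
// Hypothetical description of a suite's size, read off eval.yaml.
interface SuiteShape {
  tests: number; // entries under tests
  prompts: number; // entries under prompts
  providers: number; // entries under providers
  rubricAssertionsPerTest: number; // llm-rubric assertions in each test case
}

// Estimated API calls for one uncached run, ignoring retries.
function estimateCalls(suite: SuiteShape): { evaluations: number; graderCalls: number; total: number } {
  const evaluations = suite.tests * suite.prompts * suite.providers;
  const graderCalls = evaluations * suite.rubricAssertionsPerTest;
  return { evaluations, graderCalls, total: evaluations + graderCalls };
}

// 7 inputs x 1 prompt x 4 providers, with one llm-rubric assertion per test (hypothetical).
console.log(estimateCalls({ tests: 7, prompts: 1, providers: 4, rubricAssertionsPerTest: 1 }));
// { evaluations: 28, graderCalls: 28, total: 56 }
```

Every provider added to the list multiplies both the evaluation and the grading cost, which is why the cheaper single-model setup is preferred during development.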

CI/CD Integration

GitHub Actions example

name: Prompt Tests

on:
  pull_request:
    paths:
      - 'packages/prompts/**'

jobs:
  test-prompts:
    runs-on: ubuntu-latest
    steps:
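      - uses: actions/checkout@v4

      # setup-node does not provide pnpm; this action installs it (version taken from the packageManager field of package.json when none is given)
      - uses: pnpm/action-setup@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: pnpm install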

      - name: Run prompt tests
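        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          # Assumption: the Claude and Gemini models above are reached through an OpenAI-compatible gateway
          OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}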
        run: |
          cd packages/prompts
          pnpm test:prompts:ci

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: prompt-test-results
          path: packages/prompts/results/
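To apply a custom pass-rate threshold in CI, the JSON report can be checked after the run. The sketch below assumes the report exposes aggregate counts as results.stats.successes, failures and errors (verify this against the promptfoo version in use); checkPassRate and its default threshold of 100% are illustrative choices, not part of the workflow above.

```typescript
// Assumed minimal shape of the JSON report (only the fields used here).
interface EvalReport {
  results: {
    stats: { successes: number; failures: number; errors: number };
  };
}

function checkPassRate(report: EvalReport, threshold = 1): { passRate: number; ok: boolean } {
  const { successes, failures, errors } = report.results.stats;
  const total = successes + failures + errors;
  // An empty run is treated as a failure rather than a vacuous 100%.
  const passRate = total === 0 ? 0 : successes / total;
  return { passRate, ok: total > 0 && passRate >= threshold };
}

// Counts from the translation run in the follow-up comment: 28 successes, 0 failures, 0 errors.
console.log(checkPassRate({ results: { stats: { successes: 28, failures: 0, errors: 0 } } }));
// { passRate: 1, ok: true }
```

A small script that reads the report file, calls a check like this and exits non-zero when ok is false could run as an extra step before the upload.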

Progress

💄 style: add promptfoo to improve prompts quality #9568

arvinxx · 2025-10-05T14:16:59Z

Maintainer Author

Example output:

/Users/arvinxx/Library/pnpm/pnpm run test:prompts

> @lobechat/prompts test:prompts /Users/arvinxx/CodeProjects/LobeHub/lobe-chat/packages/prompts
> pnpm test:prompts:translate && pnpm test:prompts:summary && pnpm test:prompts:lang && pnpm test:prompts:emoji && pnpm test:prompts:qa


> @lobechat/prompts test:prompts:translate /Users/arvinxx/CodeProjects/LobeHub/lobe-chat/packages/prompts
> promptfoo eval -c promptfoo/translate/eval.yaml

Starting evaluation eval-wJN-2025-10-05T14:16:17
Running 28 test cases (up to 5 at a time)...
Evaluating [████████████████████████████████████████] 100% | 28/28 | openai:deepseek-chat "function g" content=AP

┌───────────────┬───────────────┬───────────────┬───────────────┬───────────────┬───────────────┬───────────────┐
│ content       │ from          │ to            │ [openai:gpt-… │ [openai:clau… │ [openai:gemi… │ [openai:deep… │
│               │               │               │ promptfoo/tr… │ promptfoo/tr… │ promptfoo/tr… │ promptfoo/tr… │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ Hello, how    │ en-US         │ zh-CN         │ [PASS]        │ [PASS]        │ [PASS]        │ [PASS]        │
│ are you?      │               │               │ 你好，你好吗… │ 你好，最近怎… │ 你好，你好吗… │ 你好，最近怎… │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ 你好，你怎么… │ zh-CN         │ en-US         │ [PASS] Hello, │ [PASS] Hello, │ [PASS] Hello, │ [PASS] Hello, │
│               │               │               │ how are you?  │ how are you?  │ how are you?  │ how are you   │
│               │               │               │               │               │               │ doing?        │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ Je suis       │ fr-FR         │ en-US         │ [PASS] I am   │ [PASS] I'm    │ [PASS] I am   │ [PASS] Nice   │
│ content de    │               │               │ happy to meet │ happy to meet │ pleased to    │ to meet you.  │
│ vous          │               │               │ you           │ you           │ meet you      │               │
│ rencontrer    │               │               │               │               │               │               │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ The weather   │ en-US         │ es-ES         │ [PASS] El     │ [PASS] El     │ [PASS] El     │ [PASS] Hace   │
│ is beautiful  │               │               │ tiempo está   │ tiempo está   │ tiempo es     │ un día        │
│ today         │               │               │ precioso hoy  │ hermoso hoy   │ hermoso hoy   │ precioso.     │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ I love        │ en-US         │ ja-JP         │ [PASS]        │ [PASS]        │ [PASS]        │ [PASS]        │
│ programming   │               │               │ TypeScriptで… │ プログラミン… │ 私はTypeScri… │ 私はTypeScri… │
│ with          │               │               │               │               │               │               │
│ TypeScript    │               │               │               │               │               │               │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ Machine       │ en-US         │ de-DE         │ [PASS]        │ [PASS]        │ [PASS]        │ [PASS]        │
│ learning is   │               │               │ Maschinelles  │ Machine       │ Maschinelles  │ Maschinelles  │
│ revolutioniz… │               │               │ Lernen        │ Learning      │ Lernen        │ Lernen        │
│ technology    │               │               │ revolutionie… │ revolutionie… │ revolutionie… │ revolutionie… │
│               │               │               │ die           │ die           │ die           │ die           │
│               │               │               │ Technologie.  │ Technologie   │ Technologie   │ Technologie.  │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ API_KEY_12345 │ en-US         │ zh-CN         │ [PASS]        │ [PASS]        │ [PASS]        │ [PASS]        │
│               │               │               │ API_KEY_12345 │ API_KEY_12345 │ API_KEY_12345 │ API_KEY_12345 │
└───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┘
==============================================================================================================
✔ Evaluation complete. ID: eval-wJN-2025-10-05T14:16:17

» Run promptfoo view to use the local web viewer
» Do you want to share this with your team? Sign up for free at https://promptfoo.app
» This project needs your feedback. What's one thing we can improve? https://promptfoo.dev/feedback
==============================================================================================================
Token Usage Summary:

  Evaluation:
    Total: 3,705
    Prompt: 0
    Completion: 0
    Cached: 3,705

  Provider Breakdown:
    openai:gpt-5-mini: 1,596 (0 requests)
      (1,596 cached)
    openai:gemini-flash-latest: 780 (0 requests)
      (780 cached)
    openai:claude-3-5-haiku-latest: 728 (0 requests)
      (728 cached)
    openai:deepseek-chat: 601 (0 requests)
      (601 cached)

  Grand Total: 3,705 tokens
==============================================================================================================
Duration: 0s (concurrency: 5)
Successes: 28
Failures: 0
Errors: 0
Pass Rate: 100.00%
==============================================================================================================

> @lobechat/prompts test:prompts:summary /Users/arvinxx/CodeProjects/LobeHub/lobe-chat/packages/prompts
> promptfoo eval -c promptfoo/summary-title/eval.yaml

Starting evaluation eval-ptk-2025-10-05T14:16:20
Running 16 test cases (up to 5 at a time)...
Evaluating [████████████████████████████████████████] 100% | 16/16 | openai:deepseek-chat "function g" messages=[

┌────────────────────────────────┬────────────────────────────────┬────────────────────────────────┬────────────────────────────────┬────────────────────────────────┬────────────────────────────────┐
│ locale                         │ messages                       │ [openai:gpt-5-mini]            │ [openai:claude-3-5-haiku-late… │ [openai:gemini-flash-latest]   │ [openai:deepseek-chat]         │
│                                │                                │ promptfoo/summary-title/promp… │ promptfoo/summary-title/promp… │ promptfoo/summary-title/promp… │ promptfoo/summary-title/promp… │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ en-US                          │                                │ [PASS] Install Nodejs using    │ [PASS] Nodejs Installation     │ [PASS] Installing Nodejs and   │ [PASS] Installing Nodejs with  │
│                                │                                │ NVM                            │ Guide Using Version Manager    │ NVM                            │ Version Manager                │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ zh-CN                          │                                │ [PASS] 蛋炒饭制作步骤          │ [PASS]                         │ [PASS] 如何做蛋炒饭            │ [PASS] 蛋炒饭制作步骤指南      │
│                                │                                │                                │ 简单美味家常蛋炒饭制作教程     │                                │                                │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ en-US                          │                                │ [PASS] Fixing Python NoneType  │ [PASS] Python Error Debugging  │ [PASS] Python NoneType split   │ [PASS] Debugging a Python      │
│                                │                                │ split error                    │ Attribute Error with Split     │ error explanation              │ NoneType AttributeError        │
│                                │                                │                                │ Method                         │                                │                                │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ es-ES                          │                                │ [PASS] Consulta sobre el       │ [PASS] Consulta del pronóstico │ [PASS] Consulta del tiempo     │ [PASS] Consulta del clima      │
│                                │                                │ tiempo                         │ meteorológico del día          │ actual                         │ actual                         │
└────────────────────────────────┴────────────────────────────────┴────────────────────────────────┴────────────────────────────────┴────────────────────────────────┴────────────────────────────────┘
=================================================================================================================================================================================================
✔ Evaluation complete. ID: eval-ptk-2025-10-05T14:16:20

» Run promptfoo view to use the local web viewer
=================================================================================================================================================================================================
Token Usage Summary:

  Evaluation:
    Total: 4,860
    Prompt: 0
    Completion: 0
    Cached: 4,860

  Provider Breakdown:
    openai:gemini-flash-latest: 1,689 (0 requests)
      (1,689 cached)
    openai:gpt-5-mini: 1,684 (0 requests)
      (1,684 cached)
    openai:claude-3-5-haiku-latest: 822 (0 requests)
      (822 cached)
    openai:deepseek-chat: 665 (0 requests)
      (665 cached)

  Grading:
    Total: 5,730
    Cached: 5,730

  Grand Total: 10,590 tokens
=================================================================================================================================================================================================
Duration: 0s (concurrency: 5)
Successes: 16
Failures: 0
Errors: 0
Pass Rate: 100.00%
=================================================================================================================================================================================================

> @lobechat/prompts test:prompts:lang /Users/arvinxx/CodeProjects/LobeHub/lobe-chat/packages/prompts
> promptfoo eval -c promptfoo/language-detection/eval.yaml

Starting evaluation eval-DKk-2025-10-05T14:16:23
Running 24 test cases (up to 5 at a time)...
Evaluating [████████████████████████████████████████] 100% | 24/24 | openai:deepseek-chat "function g" content=こん

┌──────────────────────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┐
│ content                              │ [openai:gpt-5-mini]                  │ [openai:claude-3-5-haiku-latest]     │ [openai:gemini-flash-latest]         │ [openai:deepseek-chat]               │
│                                      │ promptfoo/language-detection/prompt… │ promptfoo/language-detection/prompt… │ promptfoo/language-detection/prompt… │ promptfoo/language-detection/prompt… │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Hello, how are you today? I hope     │ [PASS] en-US                         │ [PASS] en-US                         │ [PASS] en-US                         │ [PASS] en-US                         │
│ you're having a great day!           │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Bonjour, comment allez-vous?         │ [PASS] fr-FR                         │ [PASS] fr-FR                         │ [PASS] fr-FR                         │ [PASS] fr-FR                         │
│ J'espère que vous passez une         │                                      │                                      │                                      │                                      │
│ excellente journée!                  │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ 你好，你今天怎么样？希望你过得愉快！ │ [PASS] zh-CN                         │ [PASS] zh-CN                         │ [PASS] zh-CN                         │ [PASS] zh-CN                         │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Hola, ¿cómo estás hoy? ¡Espero que   │ [PASS] es-ES                         │ [PASS] es-ES                         │ [PASS] es-ES                         │ [PASS] es-ES                         │
│ tengas un gran día!                  │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Привет, как дела сегодня? Надеюсь, у │ [PASS] ru-RU                         │ [PASS] ru-RU                         │ [PASS] ru-RU                         │ [PASS] ru-RU                         │
│ тебя отличный день!                  │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ こんにちは、今日はいかがですか？素…  │ [PASS] ja-JP                         │ [PASS] ja-JP                         │ [PASS] ja-JP                         │ [PASS] ja-JP                         │
└──────────────────────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┘
=================================================================================================================================================================================================
✔ Evaluation complete. ID: eval-DKk-2025-10-05T14:16:23

» Run promptfoo view to use the local web viewer
=================================================================================================================================================================================================
Token Usage Summary:

  Evaluation:
    Total: 2,514
    Prompt: 0
    Completion: 0
    Cached: 2,514

  Provider Breakdown:
    openai:gpt-5-mini: 1,100 (0 requests)
      (1,100 cached)
    openai:claude-3-5-haiku-latest: 641 (0 requests)
      (641 cached)
    openai:deepseek-chat: 403 (0 requests)
      (403 cached)
    openai:gemini-flash-latest: 370 (0 requests)
      (370 cached)

  Grading:
    Total: 7,952
    Cached: 7,952

  Grand Total: 10,466 tokens
=================================================================================================================================================================================================
Duration: 0s (concurrency: 5)
Successes: 24
Failures: 0
Errors: 0
Pass Rate: 100.00%
=================================================================================================================================================================================================

> @lobechat/prompts test:prompts:emoji /Users/arvinxx/CodeProjects/LobeHub/lobe-chat/packages/prompts
> promptfoo eval -c promptfoo/emoji-picker/eval.yaml

Starting evaluation eval-JKO-2025-10-05T14:16:26
Running 68 test cases (up to 5 at a time)...
Evaluating [████████████████████████████████████████] 100% | 68/68 | openai:gemini-flash-latest "function g" content=Я 

┌──────────────────────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┐
│ content                              │ [openai:gpt-5-mini]                  │ [openai:claude-3-5-haiku-latest]     │ [openai:gemini-flash-latest]         │ [openai:deepseek-chat]               │
│                                      │ promptfoo/emoji-picker/prompt.ts     │ promptfoo/emoji-picker/prompt.ts     │ promptfoo/emoji-picker/prompt.ts     │ promptfoo/emoji-picker/prompt.ts     │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ I just got a promotion at work! I'm  │ [PASS] 🥳                            │ [PASS] 🎉                            │ [PASS] 📈                            │ [PASS] 🎉                            │
│ so excited!                          │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ My dog passed away yesterday. I'm    │ [PASS] 😢                            │ [PASS] 😢                            │ [PASS] 😭                            │ [PASS] 😢                            │
│ really sad.                          │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Can you help me with this math       │ [PASS] 🧮                            │ [PASS] 🧮                            │ [PASS] ➗                            │ [PASS] 🧮                            │
│ problem?                             │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ I'm going on vacation to Hawaii next │ [PASS] 🌺                            │ [PASS] 🌴                            │ [PASS] 🏝️                            │ [PASS] 🏝️                            │
│ week!                                │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ I'm learning to cook Italian food    │ [PASS] 🍝                            │ [PASS] 🍝                            │ [PASS] 🍝                            │ [PASS] 🍝                            │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Technical documentation about API    │ [PASS] 🔌                            │ [PASS] 📝                            │ [PASS] 📄                            │ [PASS] 📡                            │
│ endpoints                            │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ 我刚刚升职了！太激动了！             │ [PASS] 🥳                            │ [PASS] 🎉                            │ [PASS] 🥳                            │ [PASS] 🎉                            │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ 我的猫咪昨天去世了，我很难过         │ [PASS] 😿                            │ [PASS] 😢                            │ [PASS] 😿                            │ [PASS] 😢                            │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ 我正在学习做日本料理                 │ [PASS] 🍣                            │ [PASS] 🍣                            │ [PASS] 🍣                            │ [PASS] 🍣                            │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ ¡Me voy de vacaciones a la playa la  │ [PASS] 🏖️                            │ [PASS] 🏖️                            │ [PASS] 🏖️                            │ [PASS] 🏖️                            │
│ próxima semana!                      │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Estoy estudiando para mi examen de   │ [PASS] 🧮                            │ [PASS] 🧮                            │ [PASS] 📐                            │ [PASS] 📊                            │
│ matemáticas                          │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Je viens de terminer mon marathon!   │ [PASS] 🏃                            │ [PASS] 🏃                            │ [PASS] 🏃                            │ [PASS] 🏃‍♂️                            │
│ Je suis épuisé mais heureux          │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ J'apprends à jouer de la guitare     │ [PASS] 🎸                            │ [PASS] 🎸                            │ [PASS] 🎸                            │ [PASS] 🎸                            │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ 新しいプロジェクトが始まりました！…  │ [PASS] 🚀                            │ [PASS] 💪                            │ [PASS] 💪                            │ [PASS] 💼                            │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ 桜が咲いて本当に綺麗です             │ [PASS] 🌸                            │ [PASS] 🌸                            │ [PASS] 🌸                            │ [PASS] 🌸                            │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Ich habe gerade ein neues Auto       │ [PASS] 🚗                            │ [PASS] 🚗                            │ [PASS] 🚗                            │ [PASS] 🚗                            │
│ gekauft!                             │                                      │                                      │                                      │                                      │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ Я люблю читать книги по вечерам      │ [PASS] 📖                            │ [PASS] 📖                            │ [PASS] 📖                            │ [PASS] 📖                            │
└──────────────────────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┘
=================================================================================================================================================================================================
✔ Evaluation complete. ID: eval-JKO-2025-10-05T14:16:26

» Run promptfoo view to use the local web viewer
=================================================================================================================================================================================================
Token Usage Summary:

  Evaluation:
    Total: 22,595
    Prompt: 0
    Completion: 0
    Cached: 22,595

  Provider Breakdown:
    openai:gpt-5-mini: 8,217 (0 requests)
      (8,217 cached)
    openai:claude-3-5-haiku-latest: 5,478 (0 requests)
      (5,478 cached)
    openai:deepseek-chat: 4,727 (0 requests)
      (4,727 cached)
    openai:gemini-flash-latest: 4,173 (0 requests)
      (4,173 cached)

  Grading:
    Total: 24,020
    Cached: 24,020

  Grand Total: 46,615 tokens
=================================================================================================================================================================================================
Duration: 0s (concurrency: 5)
Successes: 68
Failures: 0
Errors: 0
Pass Rate: 100.00%
=================================================================================================================================================================================================
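In the run above, every model replies to each input with a single emoji. If a suite needs to enforce that format deterministically, without another grader call, a custom `javascript` assertion could check it. The sketch below is hypothetical. It assumes promptfoo's file-based `javascript` assertion convention, in which the exported function receives the raw output and may return a `{ pass, score, reason }` object. The file name and function names are illustrative only. The check relies on the Unicode `Extended_Pictographic` and `Emoji_Modifier` regex properties, so keycap and flag sequences are not handled.

```typescript
// Hypothetical promptfoo `javascript` assertion file (e.g. single-emoji.ts):
// passes when the whole output is exactly one emoji.

// Assumed result shape, mirroring the pass/score/reason fields of a promptfoo grading result.
interface AssertionResult {
  pass: boolean;
  score: number;
  reason: string;
}

// One emoji unit: a pictograph, optionally followed by VS16 (U+FE0F) or a skin-tone modifier.
const UNIT = "\\p{Extended_Pictographic}(?:\\uFE0F|\\p{Emoji_Modifier})?";

// Units may be chained with ZWJ (U+200D). The pattern is built with new RegExp so the
// `u` flag is applied at runtime regardless of the TypeScript compile target.
// Keycap and flag sequences are not matched by this pattern.
const SINGLE_EMOJI = new RegExp(`^${UNIT}(?:\\u200D${UNIT})*$`, "u");

export function isSingleEmoji(output: string): boolean {
  // Ignore leading/trailing whitespace such as a trailing newline from the model.
  return SINGLE_EMOJI.test(output.trim());
}

export default function assertSingleEmoji(output: string): AssertionResult {
  const pass = isSingleEmoji(output);
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass ? "Output is a single emoji" : `Expected a single emoji, got ${JSON.stringify(output)}`,
  };
}
```

Such a file could then be referenced from `eval.yaml` as a `javascript` assertion whose `value` is a `file://` path to it. The path depends on where the file is placed.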

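The Token Usage Summary above is internally consistent. The provider breakdown adds up to the evaluation total (8,217 + 5,478 + 4,727 + 4,173 = 22,595), and the grand total is evaluation plus grading (22,595 + 24,020 = 46,615). Every evaluation token is reported as cached, and each provider shows 0 requests, so this run appears to have been served entirely from promptfoo's cache. The sketch below encodes that arithmetic so that a script could sanity-check totals transcribed from the CLI. The `TokenSummary` shape and `checkSummary` name are hypothetical and are not part of promptfoo.

```typescript
// Sanity check for the arithmetic in a promptfoo "Token Usage Summary".
// The TokenSummary shape is hypothetical; the numbers are copied from the first run above.

interface ProviderUsage {
  provider: string;
  tokens: number; // evaluation tokens attributed to this provider
}

interface TokenSummary {
  providers: ProviderUsage[];
  evaluationTotal: number;
  gradingTotal: number;
  grandTotal: number;
}

const firstRun: TokenSummary = {
  providers: [
    { provider: "openai:gpt-5-mini", tokens: 8217 },
    { provider: "openai:claude-3-5-haiku-latest", tokens: 5478 },
    { provider: "openai:deepseek-chat", tokens: 4727 },
    { provider: "openai:gemini-flash-latest", tokens: 4173 },
  ],
  evaluationTotal: 22595,
  gradingTotal: 24020,
  grandTotal: 46615,
};

// Returns human-readable mismatches; an empty array means the summary adds up.
export function checkSummary(summary: TokenSummary): string[] {
  const problems: string[] = [];
  const providerSum = summary.providers.reduce((sum, p) => sum + p.tokens, 0);
  if (providerSum !== summary.evaluationTotal) {
    problems.push(`provider breakdown sums to ${providerSum}, but evaluation total is ${summary.evaluationTotal}`);
  }
  const expectedGrand = summary.evaluationTotal + summary.gradingTotal;
  if (expectedGrand !== summary.grandTotal) {
    problems.push(`evaluation + grading = ${expectedGrand}, but grand total is ${summary.grandTotal}`);
  }
  return problems;
}

console.log(checkSummary(firstRun)); // prints []
```

The same relationships hold for the knowledge-qa run summary further down (24,568 + 23,080 = 47,648).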
> @lobechat/prompts@<version> test:prompts:qa /Users/arvinxx/CodeProjects/LobeHub/lobe-chat/packages/prompts
> promptfoo eval -c promptfoo/knowledge-qa/eval.yaml

Starting evaluation eval-RHj-2025-10-05T14:16:28
Running 28 test cases (up to 5 at a time)...
Evaluating [████████████████████████████████████████] 100% | 28/28 | openai:gemini-flash-latest "function g" context= q

┌────────────────────────────────┬────────────────────────────────┬────────────────────────────────┬────────────────────────────────┬────────────────────────────────┬────────────────────────────────┐
│ context                        │ query                          │ [openai:gpt-5-mini]            │ [openai:claude-3-5-haiku-late… │ [openai:gemini-flash-latest]   │ [openai:deepseek-chat]         │
│                                │                                │ promptfoo/knowledge-qa/prompt… │ promptfoo/knowledge-qa/prompt… │ promptfoo/knowledge-qa/prompt… │ promptfoo/knowledge-qa/prompt… │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ React is a JavaScript library  │ What is React and who          │ [PASS] - What React is: React  │ [PASS] Based on the provided   │ [PASS] React is a widely used  │ [PASS] Based on the provided   │
│ for building user interfaces.  │ developed it?                  │ is a JavaScript library for    │ context, here's a              │ and popular technology for     │ context:                       │
│ It was developed by Facebook   │                                │ building user interfaces. It   │ comprehensive answer about     │ front-end development.         │ **What is React?**             │
│ and is now maintained by       │                                │ uses a virtual DOM to          │ React:                         │ Based on the provided context  │ React is a JavaScript library  │
│ Facebook and the community.    │                                │ efficiently update and render  │ ### React Overview             │ and general knowledge:         │ for building user interfaces.  │
│ React uses a virtual DOM to    │                                │ components and features a      │ React is a popular JavaScript  │ ### What is React?             │ **Who developed it?**          │
│ efficiently update and render  │                                │ component-based architecture,  │ library specifically designed  │ React is an open-source        │ React was developed by         │
│ components. Key features       │                                │ JSX syntax, and state          │ for building user interfaces.  │ **JavaScript library**         │ Facebook.                      │
│ include component-base...      │                                │ management via hooks.          │ Here are the key details:      │ specifically utilized for      │ Additionally, the context      │
│                                │                                │ - Who develope...              │ #### Development and Ownership │ **building user interfaces...  │ notes that it is now           │
│                                │                                │                                │ - **...                        │                                │ maintained by both Facebook    │
│                                │                                │                                │                                │                                │ and ...                        │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ TypeScript is a strongly typed │ How does TypeScript relate to  │ [PASS] The provided context is │ [PASS] Based on the provided   │ [PASS] TypeScript (TS) relates │ [PASS] Based on the provided   │
│ programming language that      │ JavaScript?                    │ directly relevant: it says     │ context, I'll explain how      │ to JavaScript (JS) primarily   │ context, TypeScript is         │
│ builds on JavaScript by adding │                                │ TypeScript is a strongly typed │ TypeScript relates to          │ as a structural extension,     │ fundamentally related to       │
│ static type definitions. It    │                                │ language that builds on        │ JavaScript:                    │ acting as a superset that      │ JavaScript as an extension or  │
│ was developed by Microsoft.    │                                │ JavaScript and that TypeScript │ ### TypeScript and JavaScript  │ provides enhanced developer    │ superset built directly on top │
│ TypeScript code compiles to    │                                │ code compiles to JavaScript    │ Relationship                   │ tooling and type safety.       │ of it.                         │
│ clean, simple JavaScript code  │                                │ which can run anywhere         │ 1. **Superset of JavaScript**  │ Based on the provided context  │ Here's a breakdown of their    │
│ which runs on any browser,     │                                │ JavaScript runs.               │    - TypeScript is fundamenta… │ and general knowledge, here is │ relationship:                  │
│ Node.js environmen...          │                                │ Summary of the relationship    │ built on top of JavaScript     │ a detailed explanation of ...  │ *   **Foundation:** TypeScript │
│                                │                                │ - TypeScript ...               │    - It extends JavaScript by… │                                │ is built on JavaScript. This   │
│                                │                                │                                │                                │                                │ means...                       │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ Node.js是一个基于Chrome        │ Node.js有什么特点？            │ [PASS]                         │ [PASS]                         │ [PASS]                         │ [PASS]                         │
│ V8引擎的JavaScript运行时环境…  │                                │ 正如提供的上下文所述，Node.js  │ 根据提供的上下文，我来详细解答 │ Node.js是一种非常流行的技术，… │ 根据提供的上下文和我对Node.js… │
│                                │                                │ 是基于 Chrome V8 引擎的        │ Node.js 的特点：               │ 根据提供的上下文信息并结合相…  │ ## Node.js的核心特点           │
│                                │                                │ JavaScript                     │ ### Node.js 主要特点           │ ### 1. 基于 Chrome V8          │ ### 1. **基于Chrome V8引擎**   │
│                                │                                │ 运行时环境，具有事件驱动和非…  │ 1. **运行时环境**              │ 引擎的高性能运行时环境         │ - 使用Google Chrome浏览器的V8  │
│                                │                                │ I/O 模型，使其轻量且高效，npm  │ - 基于 Chrome 的 V8 引擎       │ Node.js是基于**Chrome          │ JavaScript引擎                 │
│                                │                                │ 是其庞大的包管理生态。基于此…  │ - 可以在服务器端运行           │ V8引擎**的JavaScript运行时环…  │ -                              │
│                                │                                │ Node.js 的主要特点：           │ JavaScript                     │ ### 2. 事件驱动与非阻塞 I/O    │ 提供高性能的JavaScript执行环境 │
│                                │                                │ - 基于 V8 引擎                 │ 2. **编程模型特点**            │ 模型                           │ ### 2. **事件驱动架构**        │
│                                │                                │   - 使用 Google 的 V8 JavaScr… │ -                              │ 这是Node.js最核心的特点...     │ - 采用事件循环机制处理并发请求 │
│                                │                                │ 引擎，执行速度快、性能好。     │ **事件驱动**：通过事件和回调…  │                                │ - 能够高效处理大量并发连接     │
│                                │                                │ -                              │ - **非阻塞 I/O                 │                                │ ### 3. **非阻塞I/O模型**       │
│                                │                                │ 事件驱动（Event-driven）与单…  │ 模型**：能够高效处理并发请求   │                                │ - I/O操作不会阻塞后续代码执行  │
│                                │                                │ loop）                         │ 3. **生态系统优势**            │                                │ - 通过回调函数处理异步操作结果 │
│                                │                                │   -                            │ - npm（Node Package            │                                │ - 提高...                      │
│                                │                                │ 采用事件和回调机制处理并发，…  │ Manager）是世界上最大的开源库… │                                │                                │
│                                │                                │                                │ - 提供了丰富的第三方库和工具   │                                │                                │
│                                │                                │                                │ ...                            │                                │                                │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ Docker is a containerization   │ How can I deploy my app with   │ [PASS] Context check           │ [PASS] Based on the provided   │ [PASS] Deploying your          │ [PASS] The provided context is │
│ platform that allows           │ Docker?                        │ - The provided context         │ context about Docker, I'll     │ application using Docker is a  │ about Docker's basic concept   │
│ developers to package          │                                │ explains what Docker is: a     │ help you deploy your app with  │ highly effective way to ensure │ of containerization, which is  │
│ applications and their         │                                │ containerization platform that │ Docker by combining the        │ consistency, portability, and  │ relevant to your deployment    │
│ dependencies into lightweight, │                                │ packages applications and      │ context information and my     │ scalability, as Docker allows  │ question.                      │
│ portable containers.           │                                │ their dependencies into        │ expertise:                     │ you to package the application │ Since the context establishes  │
│ Containers share the OS kernel │                                │ lightweight, portable          │ ## Docker Deployment Steps     │ and all its dependencies into  │ the topic relevance but        │
│ but run in isolated user       │                                │ containers that share the OS   │ ### 1. Prepare Your            │ a lightweight, isolated        │ doesn't cover deployment       │
│ spaces. This makes             │                                │ kernel while running in        │ Application                    │ container.                     │ specifics, I'll provide a      │
│ applications more consist...   │                                │ isolated user spaces. This ... │ - Ensure your application is   │ Based on the u...              │ comprehensive guide ...        │
│                                │                                │                                │ ready to be cont...            │                                │                                │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ GraphQL is a query language    │ What are the benefits of using │ [PASS] The provided context is │ [PASS] Based on the provided   │ [PASS] Based on the provided   │ [PASS] Based on the provided   │
│ for APIs and a runtime for     │ GraphQL over REST?             │ about GraphQL and states that  │ context, I'll explain the      │ context and general knowledge  │ context, I can identify one    │
│ fulfilling those queries with  │                                │ it “allows clients to request  │ benefits of GraphQL over REST: │ regarding API architecture,    │ key benefit of GraphQL over    │
│ existing data. Unlike REST     │                                │ exactly the data they need in  │ ### Key Benefits of GraphQL    │ GraphQL offers several         │ REST and will supplement this  │
│ APIs that require multiple     │                                │ a single request” and          │ 1. **Precise Data Fetching**   │ significant advantages when    │ with additional benefits from  │
│ requests to different          │                                │ contrasts this with REST. This │    - GraphQL allows clients to │ compared to traditional REST   │ my general knowledge.          │
│ endpoints, GraphQL allows      │                                │ is directly relevant.          │ request exactly the data they  │ APIs.                          │ ### Benefits of Using GraphQL  │
│ clients to request exactly the │                                │ Benefits of using GraphQL over │ need                           │ The core benefits stem from    │ over REST                      │
│ data they need in a single     │                                │ REST                           │    - Unlike REST APIs that     │ GraphQL's ability to precisely │ From the context:              │
│ re...                          │                                │ - Precise data fetch...        │ require multiple...            │ control the data fetc...       │ - **Single Request Efficien... │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ Machine learning algorithms    │ Can you explain blockchain     │ [PASS] - The provided context  │ [PASS] Based on the context    │ [PASS] The provided context is │ [PASS] The provided context is │
│ can be categorized into        │ technology?                    │ is about machine learning      │ provided, which discusses      │ about machine learning         │ about **machine learning       │
│ supervised, unsupervised, and  │                                │ (supervised, unsupervised, and │ machine learning algorithms, I │ algorithms (specifically       │ algorithms and their           │
│ reinforcement learning.        │                                │ reinforcement learning).       │ observe that:                  │ supervised, unsupervised, and  │ categorization**.              │
│ Supervised learning uses       │                                │ - The provided context does    │ * The context is about machine │ reinforcement learning).       │ The provided context does not  │
│ labeled data to train models,  │                                │ not contain information about  │ learning types and techniques  │ The provided context does not  │ contain information about      │
│ unsupervised learning finds    │                                │ blockchain technology.         │ * The context does NOT contain │ contain information about      │ **blockchain technology**.     │
│ patterns in unlabeled data,    │                                │ I can explain blockchain       │ information about blockchain   │ blockchain technology.         │ Therefore, I cannot answer the │
│ and reinforcement learning     │                                │ technology if you’d like—would │ technology                     │                                │ question based on the given    │
│ lea...                         │                                │ you ...                        │ **The provided contex...       │                                │ context.                       │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│                                │ How do I set up a web server?  │ [PASS] Below is a practical,   │ [PASS] Here's a step-by-step   │ [PASS] Setting up a web server │ [PASS] Absolut! Hier ist eine  │
│                                │                                │ step‑by‑step guide to set up a │ guide to setting up a web      │ involves several steps,        │ allgemeine Anleitung zur       │
│                                │                                │ basic and secure web server. I │ server:                        │ ranging from choosing the      │ Einrichtung eines Webservers,  │
│                                │                                │ cover common choices (Nginx,   │ ### 1. Choose a Web Server     │ hardware/hosting to installing │ die die grundlegenden Schritte │
│                                │                                │ Apache, IIS), examples for     │ Software                       │ the necessary software and     │ und Konzepte erklärt.          │
│                                │                                │ Linux, how to serve files or   │ - **Popular options:**         │ configuring it to serve web    │ Die Einrichtung kann je nach   │
│                                │                                │ apps, DNS and SSL, and         │   - Apache HTTP Server         │ content.                       │ Betriebssystem, gewünschter    │
│                                │                                │ important security/maintenance │   - Nginx                      │ Here is a general guide on how │ Komplexität und Technologie    │
│                                │                                │ tips.                          │   - Microsoft IIS              │ to set up a web server:        │ (z.B. statische Seiten vs.     │
│                                │                                │ 1) Decide where t...           │   - LiteSpeed                  │ ---                            │ Pyt...                         │
│                                │                                │                                │ ### 2. Select an Operating     │ ## 1. Choose You...            │                                │
│                                │                                │                                │ System                         │                                │                                │
│                                │                                │                                │ - **Common choices:**          │                                │                                │
│                                │                                │                                │   - Linux...                   │                                │                                │
└────────────────────────────────┴────────────────────────────────┴────────────────────────────────┴────────────────────────────────┴────────────────────────────────┴────────────────────────────────┘
=================================================================================================================================================================================================
✔ Evaluation complete. ID: eval-RHj-2025-10-05T14:16:28

» Run promptfoo view to use the local web viewer
=================================================================================================================================================================================================
Token Usage Summary:

  Evaluation:
    Total: 24,568
    Prompt: 0
    Completion: 0
    Cached: 24,568

  Provider Breakdown:
    openai:gpt-5-mini: 8,812 (0 requests)
      (8,812 cached)
    openai:gemini-flash-latest: 7,663 (0 requests)
      (7,663 cached)
    openai:deepseek-chat: 4,283 (0 requests)
      (4,283 cached)
    openai:claude-3-5-haiku-latest: 3,810 (0 requests)
      (3,810 cached)

  Grading:
    Total: 23,080
    Cached: 23,080

  Grand Total: 47,648 tokens
=================================================================================================================================================================================================
Duration: 1s (concurrency: 5)
Successes: 28
Failures: 0
Errors: 0
Pass Rate: 100.00%
=================================================================================================================================================================================================
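In the knowledge-qa run, the machine-learning context paired with the query "Can you explain blockchain technology?" is an out-of-context case. All four models state that the provided context does not contain information about blockchain. To pin that behaviour down without a grader call, a hypothetical `javascript` assertion attached to that test case could look for such an acknowledgement. The phrase patterns are assumptions drawn from the outputs above, plus an assumed Chinese variant because the suite also contains Chinese queries. Since this is only a heuristic, pairing it with `llm-rubric` is still advisable.

```typescript
// Hypothetical promptfoo `javascript` assertion for out-of-context knowledge-qa cases:
// passes when the answer admits that the provided context lacks the information.

interface AssertionResult {
  pass: boolean;
  score: number;
  reason: string;
}

// Assumed phrase patterns (no `g` flag, so .test() keeps no state between calls).
const NO_CONTEXT_PATTERNS: RegExp[] = [
  /\b(?:does not|doesn't|do not|don't)\s+(?:contain|include|cover|mention)\b/i,
  /\bnot\s+(?:covered|mentioned|included)\s+in\s+the\s+(?:provided\s+)?context\b/i,
  /(?:上下文|资料)中?(?:没有|未)(?:提供|包含|提及)/,
];

export function acknowledgesMissingContext(output: string): boolean {
  return NO_CONTEXT_PATTERNS.some((pattern) => pattern.test(output));
}

export default function assertAcknowledgesMissingContext(output: string): AssertionResult {
  const pass = acknowledgesMissingContext(output);
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass
      ? "Answer acknowledges that the context lacks the requested information"
      : "Answer does not acknowledge the missing context",
  };
}
```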

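In the last knowledge-qa row, the context is empty and the query is "How do I set up a web server?". There, the deepseek-chat answer is written in German ("Absolut! Hier ist eine allgemeine Anleitung zur Einrichtung eines Webservers ...", roughly "Absolutely! Here is a general guide to setting up a web server") yet is marked PASS. The current assertions therefore do not appear to check the reply language. A crude, hypothetical check for English-language queries is sketched below. It measures the share of common English function words in the output. The word list and the 0.15 threshold are assumptions. The check should only be attached to test cases whose query is in English, because a correct Chinese answer would score near 0.

```typescript
// Hypothetical heuristic assertion: does a reply to an English query look like English?

interface AssertionResult {
  pass: boolean;
  score: number;
  reason: string;
}

// Small, assumed list of very common English function words (not exhaustive).
const ENGLISH_FUNCTION_WORDS = new Set<string>([
  "the", "a", "an", "and", "or", "to", "of", "in", "on", "for", "is", "are",
  "it", "you", "your", "with", "this", "that", "be", "can", "up", "how", "by",
]);

// Assumed minimum share of function words for a reply to count as English.
const MIN_ENGLISH_RATIO = 0.15;

// Fraction of lowercase ASCII word tokens that are English function words.
export function englishRatio(text: string): number {
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  if (tokens.length === 0) return 0;
  let hits = 0;
  for (const token of tokens) {
    if (ENGLISH_FUNCTION_WORDS.has(token)) hits += 1;
  }
  return hits / tokens.length;
}

export default function assertEnglishReply(output: string): AssertionResult {
  const ratio = englishRatio(output);
  const pass = ratio >= MIN_ENGLISH_RATIO;
  return {
    pass,
    score: pass ? 1 : 0,
    reason: `English function-word ratio ${ratio.toFixed(2)} (minimum ${MIN_ENGLISH_RATIO})`,
  };
}
```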