07 - Context Compaction as a Cost-Saving Measure

Overview

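Context Compaction (conversation compaction) is one of Claude Code's most important cost-control mechanisms. When the conversation context exceeds a certain fraction of the model's context window, Claude Code automatically compresses the conversation history into a structured summary. Later API calls then carry far fewer input tokens, which cuts cost substantially.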
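The trigger can be pictured as a simple ratio check. This is a hypothetical illustration, not Claude Code's code: `shouldAutoCompact`, its parameters, and the 0.8 default are placeholders, because the source does not state the actual threshold.

```typescript
// Hypothetical sketch: decide whether to compact based on how full the context window is.
// The 0.8 default ratio is an assumed placeholder, not Claude Code's real threshold.
function shouldAutoCompact(
  contextTokens: number,
  contextWindowTokens: number,
  thresholdRatio = 0.8,
): boolean {
  if (contextWindowTokens <= 0) {
    throw new Error("contextWindowTokens must be positive");
  }
  // Compact once the conversation occupies at least thresholdRatio of the window
  return contextTokens / contextWindowTokens >= thresholdRatio;
}
```

With a 200K window and the assumed 0.8 ratio, a 150K-token conversation (75%) would not yet trigger compaction, while a 170K-token one (85%) would.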

I. Compact Prompt Design (compact/prompt.ts)

Three Compact Modes

1. Full compaction (getCompactPrompt)

Compacts the entire conversation history:

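```typescript
// NO_TOOLS_PREAMBLE, BASE_COMPACT_PROMPT and NO_TOOLS_TRAILER are string constants defined elsewhere in the module
export function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT
  // Append custom instructions only when they are non-empty
  // (`customInstructions?.trim() !== ''` alone is true for undefined and would append "undefined")
  if (customInstructions && customInstructions.trim() !== '') {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }
  prompt += NO_TOOLS_TRAILER
  return prompt
}
```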

2. Partial compaction, forward (direction: 'from')

Keeps the earlier messages and compacts the recent ones:

```typescript
const PARTIAL_COMPACT_PROMPT = `Your task is to create a detailed summary of the RECENT portion of the conversation...`
// The summary covers only the "recent messages"; earlier messages are kept verbatim
```

3. Partial compaction, backward (direction: 'up_to')

Compacts the earlier messages and keeps the recent ones:

```typescript
const PARTIAL_COMPACT_UP_TO_PROMPT = `Your task is to create a detailed summary of this conversation.
This summary will be placed at the start of a continuing session; newer messages that build on this
context will follow after your summary...`
// Used on the cache-sharing fork path; the summary is placed at the start of the conversation
```
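To make the difference between the two partial modes concrete, here is a minimal sketch of how a message list could be split around a pivot index. The `Message` type, `pivotIndex`, and `splitForPartialCompact` are hypothetical; the source only specifies which side gets summarized, and that the 'up_to' summary opens the continuing session. Placing the 'from' summary after the kept messages is an assumption.

```typescript
type Direction = "from" | "up_to";

interface Message {
  role: "user" | "assistant";
  text: string;
}

interface PartialCompactPlan {
  toSummarize: Message[];           // sent to the model with the matching partial prompt
  kept: Message[];                  // preserved verbatim
  summaryPosition: "start" | "end"; // where the summary sits relative to the kept messages
}

// Hypothetical sketch: split a conversation at pivotIndex according to the compaction direction.
function splitForPartialCompact(
  messages: Message[],
  pivotIndex: number,
  direction: Direction,
): PartialCompactPlan {
  const head = messages.slice(0, pivotIndex); // earlier messages
  const tail = messages.slice(pivotIndex);    // recent messages
  if (direction === "from") {
    // 'from': keep the early messages, summarize the recent portion (assumed to follow them)
    return { toSummarize: tail, kept: head, summaryPosition: "end" };
  }
  // 'up_to': summarize the early messages; the summary opens the continuing session
  return { toSummarize: head, kept: tail, summaryPosition: "start" };
}
```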

II. NO_TOOLS_PREAMBLE: A Cost-Optimization Detail

```typescript
const NO_TOOLS_PREAMBLE = `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.

- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.
- Your entire response must be plain text: an <analysis> block followed by a <summary> block.
`
```

Design background (as explained in a code comment):

Aggressive no-tools preamble. The cache-sharing fork path inherits the parent's full tool set (required for cache-key match), and on Sonnet 4.6+ adaptive-thinking models the model sometimes attempts a tool call despite the weaker trailer instruction. With maxTurns: 1, a denied tool call means no text output → falls through to the streaming fallback (2.79% on 4.6 vs 0.01% on 4.5).

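In short: the fork must keep the parent's full tool set for the cache key to match, so the tools stay visible to the model, and Sonnet 4.6's adaptive thinking occasionally tries to use one. With maxTurns: 1 there is no second turn, so a rejected tool call leaves the compaction call with no summary text: its tokens are spent without result and the request falls through to the streaming fallback. The fallback rate was 2.79% on Sonnet 4.6 versus 0.01% on 4.5.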

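The preamble is placed at the very start of the prompt for emphasis, and NO_TOOLS_TRAILER repeats the instruction at the end; together they substantially reduce that 2.79% failure rate.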
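The failure mode is easiest to see by modeling the single allowed turn as a list of content blocks. In this hypothetical sketch, the block shapes are a simplified version of the `text` and `tool_use` content blocks of the Anthropic Messages API, and `extractSummaryText` is an invented name: a turn that contains only a tool call yields no text, so the caller has nothing to use as a summary.

```typescript
// Simplified content blocks, loosely modeled on Messages API "text" and "tool_use" blocks.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: unknown };

type TextBlock = Extract<ContentBlock, { type: "text" }>;

// Hypothetical sketch: with maxTurns = 1 there is no later turn to recover from a rejected
// tool call, so only text produced in this single turn can become the summary.
function extractSummaryText(blocks: ContentBlock[]): string | null {
  const text = blocks
    .filter((b): b is TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("")
    .trim();
  // null means "no usable summary": the caller would have to take the fallback path
  return text.length > 0 ? text : null;
}
```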

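III. Structure of the Compaction Summary

BASE_COMPACT_PROMPT asks for nine standard sections:

1. Primary Request and Intent: what the user asked for
2. Key Technical Concepts: a list of technical concepts
3. Files and Code Sections: key files and code snippets
4. Errors and fixes: errors encountered and how they were fixed
5. Problem Solving: the problem-solving process
6. All user messages: every user message (excluding tool results)
7. Pending Tasks: tasks still outstanding
8. Current Work: the current state of the work
9. Optional Next Step: the next action (must quote the original conversation)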

Design of the <analysis> block:

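```typescript
export function formatCompactSummary(summary: string): string {
  let formattedSummary = summary
  // Strip the <analysis> block: it is a scratchpad with no informational value for later context
  formattedSummary = formattedSummary.replace(
    /<analysis>[\s\S]*?<\/analysis>/,
    '',
  )
  // ... (remaining formatting steps omitted in this excerpt)
  return formattedSummary
}
```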

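<analysis> is the model's scratchpad: drafting it first improves the quality of the summary, but it is stripped before the summary is injected into context. It therefore costs output tokens only once, in the compaction call, and consumes no input tokens in any later API call.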
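A self-contained check of the stripping step shows what actually survives into context. `stripAnalysis` is a hypothetical stand-in that applies the same non-greedy regex as the excerpt above (no `g` flag, so only the first `<analysis>` block is removed); it does not reproduce the formatting steps the excerpt omits.

```typescript
// Hypothetical stand-in for the <analysis>-stripping step of formatCompactSummary.
// Same pattern as the excerpt: non-greedy, first match only, then trim leftover whitespace.
function stripAnalysis(raw: string): string {
  return raw.replace(/<analysis>[\s\S]*?<\/analysis>/, "").trim();
}
```

For a model reply of the form `<analysis>…</analysis>` followed by `<summary>…</summary>`, only the summary block remains, however long the analysis was.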

IV. Rebuilding Context After Compaction

```typescript
export function getCompactUserSummaryMessage(
  summary: string,
  suppressFollowUpQuestions?: boolean,
  transcriptPath?: string,
  recentMessagesPreserved?: boolean,
): string {
  const formattedSummary = formatCompactSummary(summary)

  let baseSummary = `This session is being continued from a previous conversation that ran out of context. The summary below covers the earlier portion of the conversation.\n\n${formattedSummary}`

  // Point to the full transcript so the model can look up exact details
  if (transcriptPath) {
    baseSummary += `\n\nIf you need specific details from before compaction (like exact code snippets, error messages, or content you generated), read the full transcript at: ${transcriptPath}`
  }

  // Note when recent messages were kept verbatim
  if (recentMessagesPreserved) {
    baseSummary += `\n\nRecent messages are preserved verbatim.`
  }

  // Non-interactive mode: continue directly, without asking the user anything
  if (suppressFollowUpQuestions) {
    return `${baseSummary}\nContinue the conversation from where it left off without asking the user any further questions...`
  }

  // Interactive mode: return the summary message as-is
  return baseSummary
}
```
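Putting the pieces together, the post-compaction context is the summary message followed by whatever was preserved verbatim. This sketch is hypothetical: `rebuildContext` and `ChatMessage` are invented for illustration, sending the summary as a single user message is inferred from the name getCompactUserSummaryMessage, and the summary-first ordering matches the 'up_to' mode.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical sketch: the next API call starts from the summary (as a user message)
// followed by any recent messages that were preserved verbatim.
function rebuildContext(summaryText: string, preserved: ChatMessage[] = []): ChatMessage[] {
  return [{ role: "user", content: summaryText }, ...preserved];
}
```

Everything before the summary is gone from the request, which is where the input-token savings come from.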

V. Proactive Mode Integration

```typescript
// Dead code elimination: conditional import for proactive mode
const proactiveModule =
  feature('PROACTIVE') || feature('KAIROS')
    ? require('../../proactive/index.js')
    : null

// Post-compaction continuation behavior in proactive mode.
// `continuation` is a string built earlier in the enclosing function (not shown in this excerpt).
if ((feature('PROACTIVE') || feature('KAIROS')) && proactiveModule?.isProactiveActive()) {
  continuation += `
You are running in autonomous/proactive mode. This is NOT a first wake-up — you were already working autonomously before compaction. Continue your work loop: pick up where you left off based on the summary above. Do not greet the user or ask what to work on.`
}
```

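In proactive (autonomous execution) mode, the post-compaction message explicitly tells the model to keep working autonomously instead of asking the user again what to work on.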

VI. Interaction with Caching (the Core of the Cost Optimization)

The compaction notification in promptCacheBreakDetection.ts:

```typescript
export function notifyCompaction(querySource: QuerySource, agentId?: AgentId): void {
  const key = getTrackingKey(querySource, agentId)
  const state = key ? previousStateBySource.get(key) : undefined
  if (state) {
    state.prevCacheReadTokens = null  // reset the cache-break detection baseline
  }
}
```

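After compaction, prevCacheReadTokens is reset to null, so the expected drop in cache-read tokens (the prompt has just become much shorter) is not misreported as a cache break.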
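Why the reset matters is clearer with a toy detector. In this hypothetical sketch a "cache break" is flagged when cache-read tokens fall below half of the previous call's value; the `CacheBreakTracker` class, the 0.5 ratio, and the method names are assumptions. Only the idea of nulling `prevCacheReadTokens` on compaction comes from the excerpt above.

```typescript
// Hypothetical toy detector; only the null-reset-on-compaction idea mirrors the source.
class CacheBreakTracker {
  private prevCacheReadTokens: number | null = null;
  private readonly dropRatio: number;

  constructor(dropRatio = 0.5) { // assumed threshold
    this.dropRatio = dropRatio;
  }

  // Records one API call's cache-read tokens; returns true if it looks like a cache break.
  record(cacheReadTokens: number): boolean {
    const prev = this.prevCacheReadTokens;
    this.prevCacheReadTokens = cacheReadTokens;
    if (prev === null) return false; // no baseline yet, nothing to compare against
    return cacheReadTokens < prev * this.dropRatio;
  }

  // Compaction shrinks the prompt on purpose, so the old baseline is no longer meaningful.
  notifyCompaction(): void {
    this.prevCacheReadTokens = null;
  }
}
```

Without the reset, the first call after compaction (for example 140K cached tokens dropping to 8K) would be flagged as a break even though nothing went wrong.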

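cache-sharing fork path:

The compact request inherits the parent agent's full tool set and system prompt so that it shares the main thread's prompt cache; this avoids paying a separate cache-creation cost just for the compaction call.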
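Anthropic's prompt caching matches an exact prefix built in the order tools, then system, then messages, which is why the fork cannot simply drop its tools. The sketch approximates that prefix with a JSON string; `RequestShape` and `cachePrefixKey` are hypothetical, and tool names stand in for full tool definitions.

```typescript
interface RequestShape {
  tools: string[]; // tool names stand in for full tool definitions
  system: string;
  messages: string[];
}

// Hypothetical approximation of a prompt-cache prefix: tools, then system, then the first
// `messageCount` messages. Real caching compares the actual request prefix, not this key.
function cachePrefixKey(req: RequestShape, messageCount: number): string {
  return JSON.stringify([req.tools, req.system, req.messages.slice(0, messageCount)]);
}
```

A fork that copies the parent's tools and system prompt and only appends the compaction request keeps the same key for the shared history; a fork with an empty tool list changes the key at the very first component, so nothing from the parent's cache can be reused.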

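VII. Analysis of the Cost-Saving Mechanism

Direct savings

Suppose the conversation has accumulated 150K tokens:

The compacted summary is about 5-10K tokens
Each later API call sends roughly 140K fewer input tokens
At Sonnet's input price: about $0.42 saved per call (140K × $3/1M)
With cache hits (cache-read price): about $0.042 saved per call (140K × $0.30/1M)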
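The per-call arithmetic above, written as a small helper. The function and constant names are illustrative; the prices are the Sonnet input and cache-read rates quoted in the text.

```typescript
// Prices in USD per million tokens, as quoted in the text.
const INPUT_PRICE_PER_M = 3.0;       // uncached input
const CACHE_READ_PRICE_PER_M = 0.3;  // cache read

// Dollar savings per later call when `tokensRemoved` fewer input tokens are sent.
function perCallSavings(tokensRemoved: number, pricePerMillion: number): number {
  return (tokensRemoved / 1_000_000) * pricePerMillion;
}
```

`perCallSavings(140_000, INPUT_PRICE_PER_M)` gives about $0.42 and `perCallSavings(140_000, CACHE_READ_PRICE_PER_M)` about $0.042, matching the two lines above.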

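Indirect savings

Avoiding context-window overflow: once the window overflows, the session cannot continue and has to be restarted
Keeping cache hit rates up: a shorter context is easier to cache in full
Lower retry cost: smaller requests are less likely to time out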

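Cost of compaction itself

Compaction requires one full API call (the whole context goes in, a summary comes out)
For a 150K-token context, that is about $0.45 of Sonnet input (150K × $3/1M) plus a small amount of output
Each later call then saves $0.42, so compaction pays for itself within about 1-2 calls (2 × $0.42 = $0.84 already exceeds the $0.45 input cost plus modest output)

These figures use uncached prices. With cache hits, the per-call saving shrinks to about $0.042; the compaction call's input also becomes cheaper when it reads from the shared cache, but its output cost does not, so break-even generally takes more calls.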
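The break-even estimate reduces to a one-line formula. `breakEvenCalls` is a hypothetical helper, and the output cost has to be supplied as an assumption; for example, $0.15 corresponds to roughly 10K output tokens at Sonnet's $15 per million output rate.

```typescript
// Number of later calls needed before cumulative savings cover the one-off compaction call.
// All inputs are in USD; outputCost is an assumption supplied by the caller.
function breakEvenCalls(
  compactionInputCost: number,
  outputCost: number,
  savingsPerCall: number,
): number {
  if (savingsPerCall <= 0) return Infinity; // compaction never pays for itself
  return Math.ceil((compactionInputCost + outputCost) / savingsPerCall);
}
```

With the uncached figures and the assumed $0.15 output cost, `breakEvenCalls(0.45, 0.15, 0.42)` returns 2.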

VIII. Custom Instructions Integration

```typescript
export function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT
  if (customInstructions && customInstructions.trim() !== '') {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }
  prompt += NO_TOOLS_TRAILER
  return prompt
}
```
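The guard matters because `customInstructions` is optional. The sketch below reproduces the function under a different name, with short placeholder strings standing in for the real NO_TOOLS_PREAMBLE, BASE_COMPACT_PROMPT and NO_TOOLS_TRAILER (the placeholder values are invented), so the guard can be checked in isolation.

```typescript
// Placeholder values: the real constants are long prompt strings.
const NO_TOOLS_PREAMBLE = "[PREAMBLE]";
const BASE_COMPACT_PROMPT = "[BASE]";
const NO_TOOLS_TRAILER = "[TRAILER]";

function getCompactPromptSketch(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT;
  // Treat undefined, empty and whitespace-only input as "no custom instructions".
  if (customInstructions && customInstructions.trim() !== "") {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`;
  }
  prompt += NO_TOOLS_TRAILER;
  return prompt;
}
```

Calling it with no argument or with `"   "` yields just preamble, base prompt and trailer; any non-blank string is inserted between the base prompt and the trailer, so the no-tools reminder always comes last.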

Users can add compaction instructions to CLAUDE.md, for example:

```markdown
## Compact Instructions
When summarizing the conversation focus on typescript code changes and also remember the mistakes you made and how you fixed them.
```

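IX. Summary

| Feature | Description |
| --- | --- |
| Compaction modes | Full / partial `from` / partial `up_to` |
| Cost-saving principle | Compress historical context into a structured summary, greatly reducing input tokens in later calls |
| Cache integration | The cache-sharing fork inherits the parent's tool set; the cache-break baseline is reset after compaction |
| Quality assurance | The `<analysis>` scratchpad improves summary quality without staying in context |
| No-tools design | Preamble and Trailer state the rule twice, curbing tool calls triggered by Sonnet 4.6 adaptive thinking |
| Break-even | After about 1-2 calls, savings exceed the cost of the compaction call itself (uncached prices) |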
