You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
- Example: `WAVE_PROMPT_CACHE_REGEX="claude|qwen"` matches both claude and qwen models
122
122
- Invalid regex patterns fall back to simple "claude" matching
123
123
124
-
**Note**: While`supportsPromptCaching` controls cache_control marker injection (only for Claude-like models), cache token extraction from usage applies to ALL models. Non-Claude models (Gemini, DeepSeek) return cache data via `prompt_tokens_details`which is an OpenAI-standard field.
124
+
**Scope of`supportsPromptCaching`**: This function gates ONLY cache_control marker injection (adding `cache_control: {type: "ephemeral"}` to messages and tool definitions). It does NOT gate cache token extraction from usage responses — that applies to ALL models, since `prompt_tokens_details` is an OpenAI-standard field returned by Gemini, DeepSeek, and others.
**Integration Point**: After line 183 (`openaiMessages` construction)
73
73
74
+
> **Important**: Cache control has two distinct concerns:
75
+
> 1.**Marker injection** (adding `cache_control: {type: "ephemeral"}` to messages/tools) — gated by `supportsPromptCaching`, only for Claude-like models
76
+
> 2.**Cache token extraction** (reading cache metrics from usage responses) — NOT gated, applies to all models
77
+
74
78
```typescript
75
-
//Add before createParams construction (line ~195)
79
+
//Marker injection: gated by supportsPromptCaching
76
80
if (supportsPromptCaching(model||modelConfig.model)) {
77
81
openaiMessages=transformMessagesForClaudeCache(
78
82
openaiMessages,
@@ -86,7 +90,7 @@ if (supportsPromptCaching(model || modelConfig.model)) {
86
90
}
87
91
```
88
92
89
-
**Usage Extension**: Modify usage processing to extract cache metrics from all models
93
+
**Usage Extension**: Cache token extraction applies to all models (no gate)
Copy file name to clipboardExpand all lines: specs/021-prompt-cache-control/research.md
+2-1Lines changed: 2 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -30,7 +30,8 @@
30
30
31
31
**Alternatives considered**:
32
32
- Custom cache duration: Not supported by Claude API
33
-
- Universal caching: Rejected due to cost and limited benefit for non-Claude models
33
+
- Universal cache_control marker injection: Rejected due to cost and limited benefit for non-Claude models (only Claude API supports `cache_control: {type: "ephemeral"}` markers)
34
+
- Universal cache token extraction: Accepted — `prompt_tokens_details` is an OpenAI-standard field; extracting cache metrics from all models' usage is zero-cost and provides visibility
34
35
- Image content caching: Not supported by Claude API
Copy file name to clipboardExpand all lines: specs/021-prompt-cache-control/spec.md
+5-5Lines changed: 5 additions & 5 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -93,7 +93,7 @@ As a user who switches between permission modes (e.g., default → plan → acce
93
93
94
94
### Edge Cases
95
95
96
-
-**Edge Case 1**: Model name detection MUST be case-insensitive ("Claude-3-Sonnet" and "claude-3-sonnet" both trigger caching)
96
+
-**Edge Case 1**: Model name detection MUST be case-insensitive ("Claude-3-Sonnet" and "claude-3-sonnet" both trigger cache_control marker injection). Cache token extraction from usage applies regardless of model name.
97
97
-**Edge Case 2**: Mixed content messages MUST apply cache_control only to text content parts, preserving images unchanged
98
98
-**Edge Case 3**: Empty conversation history MUST skip user message caching, apply system message caching only
99
99
-**Edge Case 4**: Streaming and non-streaming requests MUST apply identical cache_control transformation logic
@@ -104,14 +104,14 @@ As a user who switches between permission modes (e.g., default → plan → acce
104
104
105
105
### Functional Requirements
106
106
107
-
-**FR-001**: System MUST detect cache-supporting models using the `WAVE_PROMPT_CACHE_REGEX` environment variable (default: "claude"), which allows configurable regex patterns for model matching
107
+
-**FR-001**: System MUST detect cache-supporting models for cache_control marker injection using the `WAVE_PROMPT_CACHE_REGEX` environment variable (default: "claude"), which allows configurable regex patterns for model matching. This gate controls ONLY the injection of `cache_control: {type: "ephemeral"}` markers into messages and tool definitions — it does NOT gate cache token extraction from usage responses, which applies to all models.
108
108
-**FR-002**: System MUST add cache_control markers with type "ephemeral" to the first system message when using Claude models. This ensures core instructions are always cached even if reminders are added later. The system prompt MUST remain constant across plan mode transitions — plan mode instructions are injected as `<system-reminder>` user messages rather than system prompt changes to preserve the cached system prompt prefix. The `<env>` section's `Primary working directory` field MUST use the immutable `originalWorkdir` (set once at session start) rather than the dynamic `workdir` (which tracks `cd` changes), so that CWD changes do not invalidate the cached system prompt.
109
109
-**FR-003**: System MUST create a cache marker when total message count reaches multiples of 20 (20, 40, 60, etc.)
110
110
-**FR-004**: System MUST NOT create cache markers when total message count is below 20 or not a multiple of 20
111
111
-**FR-005**: System MUST maintain cache markers at the most recent multiple-of-20 message position (sliding window)
112
112
-**FR-006**: System MUST include cached messages in the context provided to the AI agent
113
-
-**FR-007**: System MUST not add cache_control markers when using non-Claude models
114
-
-**FR-008**: System MUST extend usage tracking to include cache-related metrics. Cache tokens are extracted from two sources with priority ordering: (1) Claude top-level fields (cache_read_input_tokens, cache_creation_input_tokens, cache_creation object) take priority, (2) OpenAI-standard prompt_tokens_details fields (cached_tokens → cache_read_input_tokens, cache_creation_input_tokens → cache_creation_input_tokens) serve as fallback for non-Claude models that return cache data via prompt_tokens_details
113
+
-**FR-007**: System MUST NOT add cache_control markers when using non-Claude models (as determined by `WAVE_PROMPT_CACHE_REGEX`). However, cache token extraction from usage (FR-008) applies to all models regardless of this gate.
114
+
-**FR-008**: System MUST extend usage tracking to include cache-related metrics for ALL models (not gated by `supportsPromptCaching`). Cache tokens are extracted from two sources with priority ordering: (1) Claude top-level fields (cache_read_input_tokens, cache_creation_input_tokens, cache_creation object) take priority, (2) OpenAI-standard prompt_tokens_details fields (cached_tokens → cache_read_input_tokens, cache_creation_input_tokens → cache_creation_input_tokens) serve as fallback for non-Claude models that return cache data via prompt_tokens_details
115
115
-**FR-009**: System MUST apply cache_control markers identically for both streaming and non-streaming requests during message preparation phase
116
116
-**FR-010**: System MUST maintain backward compatibility with existing message processing logic (except for the cache strategy itself which is a breaking change)
117
117
-**FR-011**: System MUST support caching for different message roles at interval positions, applying cache_control only at block level:
@@ -126,5 +126,5 @@ As a user who switches between permission modes (e.g., default → plan → acce
126
126
-**Cache Marker**: Represents a point in the conversation where messages are preserved for context, containing the message position and associated conversation content
127
127
-**Message Context**: Represents the combination of system prompt, tools, and cached messages that provide context for AI agent responses
128
128
-**Enhanced Usage Metrics**: Extended usage object including cache-related token counts and creation breakdown
129
-
-**Claude Model Detection**: Boolean determination based on case-insensitive model name matching
129
+
-**Claude Model Detection**: Boolean determination based on case-insensitive model name matching. Gates cache_control marker injection only — cache token extraction applies to all models.
130
130
-**Structured Message Content**: Array-based message content format supporting cache_control on individual content parts
0 commit comments