Commit e246f07

docs: clarify supportsPromptCaching gates marker injection only, not cache token extraction
Update spec 021 to explicitly distinguish two concerns:
- supportsPromptCaching / WAVE_PROMPT_CACHE_REGEX gates cache_control marker injection (messages/tools); Claude-only
- Cache token extraction from usage applies to ALL models; no gate

Updated: spec.md (FR-001, FR-007, FR-008, edge cases, key entities), data-model.md (relationships, model detection flow), quickstart.md (phase 3 integration notes), contracts/cache-control-api.md (scope note), research.md (universal caching alternatives)
1 parent ad8fc80 commit e246f07

5 files changed

Lines changed: 20 additions & 12 deletions

specs/021-prompt-cache-control/contracts/cache-control-api.md

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ const isClaudeModel: typeof supportsPromptCaching;
121121
- Example: `WAVE_PROMPT_CACHE_REGEX="claude|qwen"` matches both claude and qwen models
122122
- Invalid regex patterns fall back to simple "claude" matching
123123

124-
**Note**: While `supportsPromptCaching` controls cache_control marker injection (only for Claude-like models), cache token extraction from usage applies to ALL models. Non-Claude models (Gemini, DeepSeek) return cache data via `prompt_tokens_details` which is an OpenAI-standard field.
124+
**Scope of `supportsPromptCaching`**: This function gates ONLY cache_control marker injection (adding `cache_control: {type: "ephemeral"}` to messages and tool definitions). It does NOT gate cache token extraction from usage responses — that applies to ALL models, since `prompt_tokens_details` is an OpenAI-standard field returned by Gemini, DeepSeek, and others.
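
The matching rules described above (case-insensitive matching, a default pattern of "claude", and fallback to "claude" when the configured regex is invalid) could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `matchesCachePattern` and `buildCacheRegex` are hypothetical names, and the pattern is passed as a parameter where the real code presumably reads `WAVE_PROMPT_CACHE_REGEX` from the environment.

```typescript
// Hypothetical sketch; names and signatures are assumptions, not the spec's API.
const DEFAULT_PATTERN = "claude";

// Build a case-insensitive regex, falling back to "claude" when the
// configured pattern is missing, empty, or not a valid regular expression.
function buildCacheRegex(pattern?: string): RegExp {
  try {
    return new RegExp(pattern && pattern.length > 0 ? pattern : DEFAULT_PATTERN, "i");
  } catch {
    return new RegExp(DEFAULT_PATTERN, "i");
  }
}

// Decides ONLY whether cache_control markers should be injected.
// It says nothing about whether cache tokens are read from usage.
function matchesCachePattern(modelName: string, pattern?: string): boolean {
  return buildCacheRegex(pattern).test(modelName);
}
```

Under these assumptions, a caller would pass the value of `WAVE_PROMPT_CACHE_REGEX` as the second argument, for example `matchesCachePattern(model, "claude|qwen")`.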
125125

126126
### Cache Control Application
127127

specs/021-prompt-cache-control/data-model.md

Lines changed: 6 additions & 3 deletions
@@ -106,9 +106,9 @@ interface ModelCacheConfig {
106106
- `"claude|qwen3\\.6-plus"` - matches "claude" or exact "qwen3.6-plus"
107107

108108
**Relationships**:
109-
- Determines cache control application for entire request
109+
- Determines cache_control marker injection for the request (messages and tools)
110+
- Does NOT gate cache token extraction from usage — that applies to all models
110111
- Affects message transformation and tool processing
111-
- Influences usage tracking structure
112112

113113
**Legacy Support**:
114114
- `isClaudeModel()` is deprecated alias for `supportsPromptCaching()`
@@ -173,7 +173,10 @@ Request 2+: cache_creation_input_tokens = 0, cache_read_input_tokens > 0 (cache
173173
### Model Detection Flow
174174

175175
```
176-
ModelName -> isClaudeModel() -> shouldApplyCache -> Message Transformation
176+
ModelName -> supportsPromptCaching() -> shouldInjectCacheControlMarkers -> Message Transformation
177+
178+
Note: Cache token extraction from usage is NOT gated by supportsPromptCaching.
179+
All models' usage responses are checked for cache tokens (Claude top-level + prompt_tokens_details).
177180
```
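
As a rough illustration of the gated half of this flow, the sketch below injects a `cache_control: {type: "ephemeral"}` marker into the first system message only when the gate is true. The types and the `injectSystemMarker` helper are hypothetical, and placing the marker on the last text part of the system message is an assumption made for the example; the actual transformation in the codebase may differ.

```typescript
// Hypothetical types and helper; not the project's actual transformation.
type CacheControl = { type: "ephemeral" };
type TextPart = { type: "text"; text: string; cache_control?: CacheControl };
type Message = { role: "system" | "user" | "assistant"; content: string | TextPart[] };

const EPHEMERAL: CacheControl = { type: "ephemeral" };

function injectSystemMarker(messages: Message[], shouldInject: boolean): Message[] {
  // Gate: models that do not match the cache regex are left untouched.
  if (!shouldInject) return messages;
  const firstSystem = messages.find((m) => m.role === "system");
  if (firstSystem === undefined) return messages;

  // Normalize string content to the array-based (structured) form.
  const parts: TextPart[] =
    typeof firstSystem.content === "string"
      ? [{ type: "text", text: firstSystem.content }]
      : firstSystem.content;

  // Assumption: the marker goes on the last text part of the system message.
  const marked = parts.map((part, i): TextPart =>
    i === parts.length - 1 ? { ...part, cache_control: EPHEMERAL } : part
  );
  return messages.map((m) => (m === firstSystem ? { ...m, content: marked } : m));
}
```

Usage extraction would run separately on the response and would not consult this gate at all.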
178181

179182
## Data Flow Patterns

specs/021-prompt-cache-control/quickstart.md

Lines changed: 6 additions & 2 deletions
@@ -71,8 +71,12 @@ export interface CacheControl {
7171

7272
**Integration Point**: After line 183 (`openaiMessages` construction)
7373

74+
> **Important**: Cache control has two distinct concerns:
75+
> 1. **Marker injection** (adding `cache_control: {type: "ephemeral"}` to messages/tools) — gated by `supportsPromptCaching`, only for Claude-like models
76+
> 2. **Cache token extraction** (reading cache metrics from usage responses) — NOT gated, applies to all models
77+
7478
```typescript
75-
// Add before createParams construction (line ~195)
79+
// Marker injection: gated by supportsPromptCaching
7680
if (supportsPromptCaching(model || modelConfig.model)) {
7781
openaiMessages = transformMessagesForClaudeCache(
7882
openaiMessages,
@@ -86,7 +90,7 @@ if (supportsPromptCaching(model || modelConfig.model)) {
8690
}
8791
```
8892

89-
**Usage Extension**: Modify usage processing to extract cache metrics from all models
93+
**Usage Extension**: Cache token extraction applies to all models (no gate)
9094

9195
```typescript
9296
// Extend usage object with cache metrics (Claude top-level + OpenAI prompt_tokens_details)

specs/021-prompt-cache-control/research.md

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@
3030

3131
**Alternatives considered**:
3232
- Custom cache duration: Not supported by Claude API
33-
- Universal caching: Rejected due to cost and limited benefit for non-Claude models
33+
- Universal cache_control marker injection: Rejected due to cost and limited benefit for non-Claude models (only Claude API supports `cache_control: {type: "ephemeral"}` markers)
34+
- Universal cache token extraction: Accepted — `prompt_tokens_details` is an OpenAI-standard field; extracting cache metrics from all models' usage is zero-cost and provides visibility
3435
- Image content caching: Not supported by Claude API
3536

3637
## Implementation Architecture Research

specs/021-prompt-cache-control/spec.md

Lines changed: 5 additions & 5 deletions
@@ -93,7 +93,7 @@ As a user who switches between permission modes (e.g., default → plan → acce
9393

9494
### Edge Cases
9595

96-
- **Edge Case 1**: Model name detection MUST be case-insensitive ("Claude-3-Sonnet" and "claude-3-sonnet" both trigger caching)
96+
- **Edge Case 1**: Model name detection MUST be case-insensitive ("Claude-3-Sonnet" and "claude-3-sonnet" both trigger cache_control marker injection). Cache token extraction from usage applies regardless of model name.
9797
- **Edge Case 2**: Mixed content messages MUST apply cache_control only to text content parts, preserving images unchanged
9898
- **Edge Case 3**: Empty conversation history MUST skip user message caching, apply system message caching only
9999
- **Edge Case 4**: Streaming and non-streaming requests MUST apply identical cache_control transformation logic
@@ -104,14 +104,14 @@ As a user who switches between permission modes (e.g., default → plan → acce
104104

105105
### Functional Requirements
106106

107-
- **FR-001**: System MUST detect cache-supporting models using the `WAVE_PROMPT_CACHE_REGEX` environment variable (default: "claude"), which allows configurable regex patterns for model matching
107+
- **FR-001**: System MUST detect cache-supporting models for cache_control marker injection using the `WAVE_PROMPT_CACHE_REGEX` environment variable (default: "claude"), which allows configurable regex patterns for model matching. This gate controls ONLY the injection of `cache_control: {type: "ephemeral"}` markers into messages and tool definitions — it does NOT gate cache token extraction from usage responses, which applies to all models.
108108
- **FR-002**: System MUST add cache_control markers with type "ephemeral" to the first system message when using Claude models. This ensures core instructions are always cached even if reminders are added later. The system prompt MUST remain constant across plan mode transitions — plan mode instructions are injected as `<system-reminder>` user messages rather than system prompt changes to preserve the cached system prompt prefix. The `<env>` section's `Primary working directory` field MUST use the immutable `originalWorkdir` (set once at session start) rather than the dynamic `workdir` (which tracks `cd` changes), so that CWD changes do not invalidate the cached system prompt.
109109
- **FR-003**: System MUST create a cache marker when total message count reaches multiples of 20 (20, 40, 60, etc.)
110110
- **FR-004**: System MUST NOT create cache markers when total message count is below 20 or not a multiple of 20
111111
- **FR-005**: System MUST maintain cache markers at the most recent multiple-of-20 message position (sliding window)
112112
- **FR-006**: System MUST include cached messages in the context provided to the AI agent
113-
- **FR-007**: System MUST not add cache_control markers when using non-Claude models
114-
- **FR-008**: System MUST extend usage tracking to include cache-related metrics. Cache tokens are extracted from two sources with priority ordering: (1) Claude top-level fields (cache_read_input_tokens, cache_creation_input_tokens, cache_creation object) take priority, (2) OpenAI-standard prompt_tokens_details fields (cached_tokens → cache_read_input_tokens, cache_creation_input_tokens → cache_creation_input_tokens) serve as fallback for non-Claude models that return cache data via prompt_tokens_details
113+
- **FR-007**: System MUST NOT add cache_control markers when using non-Claude models (as determined by `WAVE_PROMPT_CACHE_REGEX`). However, cache token extraction from usage (FR-008) applies to all models regardless of this gate.
114+
- **FR-008**: System MUST extend usage tracking to include cache-related metrics for ALL models (not gated by `supportsPromptCaching`). Cache tokens are extracted from two sources with priority ordering: (1) Claude top-level fields (cache_read_input_tokens, cache_creation_input_tokens, cache_creation object) take priority, (2) OpenAI-standard prompt_tokens_details fields (cached_tokens → cache_read_input_tokens, cache_creation_input_tokens → cache_creation_input_tokens) serve as fallback for non-Claude models that return cache data via prompt_tokens_details
115115
- **FR-009**: System MUST apply cache_control markers identically for both streaming and non-streaming requests during message preparation phase
116116
- **FR-010**: System MUST maintain backward compatibility with existing message processing logic (except for the cache strategy itself which is a breaking change)
117117
- **FR-011**: System MUST support caching for different message roles at interval positions, applying cache_control only at block level:
@@ -126,5 +126,5 @@ As a user who switches between permission modes (e.g., default → plan → acce
126126
- **Cache Marker**: Represents a point in the conversation where messages are preserved for context, containing the message position and associated conversation content
127127
- **Message Context**: Represents the combination of system prompt, tools, and cached messages that provide context for AI agent responses
128128
- **Enhanced Usage Metrics**: Extended usage object including cache-related token counts and creation breakdown
129-
- **Claude Model Detection**: Boolean determination based on case-insensitive model name matching
129+
- **Claude Model Detection**: Boolean determination based on case-insensitive model name matching. Gates cache_control marker injection only — cache token extraction applies to all models.
130130
- **Structured Message Content**: Array-based message content format supporting cache_control on individual content parts
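
The priority ordering in FR-008 (Claude top-level fields first, `prompt_tokens_details` as fallback, no model gate) might look something like the sketch below. The `RawUsage` and `CacheMetrics` shapes and the `extractCacheMetrics` name are hypothetical. Two further assumptions: a top-level value of 0 counts as present and is not overridden by the fallback, and missing values default to 0. The `cache_creation` breakdown object is omitted for brevity.

```typescript
// Hypothetical shapes; field names follow the spec text, the rest is assumed.
interface RawUsage {
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
  prompt_tokens_details?: {
    cached_tokens?: number;
    cache_creation_input_tokens?: number;
  };
}

interface CacheMetrics {
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

// Runs for every model: there is no supportsPromptCaching check here.
function extractCacheMetrics(usage: RawUsage): CacheMetrics {
  const details = usage.prompt_tokens_details;
  return {
    // (1) Claude top-level field, (2) OpenAI-standard fallback, (3) assumed default 0.
    cache_read_input_tokens:
      usage.cache_read_input_tokens ?? details?.cached_tokens ?? 0,
    cache_creation_input_tokens:
      usage.cache_creation_input_tokens ?? details?.cache_creation_input_tokens ?? 0,
  };
}
```

Using `??` rather than `||` is a deliberate choice in this sketch: it falls through only on `undefined` or `null`, so a legitimate top-level count of 0 is not replaced by the fallback value.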
