Add Vertex AI prompt caching support for Claude models #961


Merged
3 commits merged into RooCodeInc:main from feat/vertex-prompt-caching on Feb 27, 2025

Conversation

@aitoroses commented Feb 12, 2025

Description

This PR adds prompt caching support for the Vertex API integration. The main changes are:

  • Caching Implementation:
    • Ephemeral cache_control fields are added to user messages and system prompts in src/api/providers/vertex.ts when caching is supported (a rough sketch follows this list).
    • Token usage for cache writes and cache reads is now tracked and emitted during streaming.
  • Configuration Updates:
    • The configuration in src/shared/api.ts has been updated to enable prompt caching for supported models and includes pricing details for cache writes and reads.
  • Testing Enhancements:
    • Unit tests in src/api/providers/__tests__/vertex.test.ts have been updated to simulate and verify the new caching behavior.
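
A hedged sketch of what the first two bullets could look like in TypeScript (not the exact code in vertex.ts, and assuming an @anthropic-ai/sdk version whose TextBlockParam accepts cache_control; cache_creation_input_tokens / cache_read_input_tokens are the usage field names Anthropic documents for the streamed message_start event, while buildSystemBlocks, toUsageChunk, cacheWriteTokens and cacheReadTokens are illustrative names):

import { Anthropic } from "@anthropic-ai/sdk"

// Sketch: when the model supports prompt caching, the system prompt is sent as a
// text block carrying an ephemeral cache_control marker.
function buildSystemBlocks(systemPrompt: string, supportsPromptCache: boolean): Anthropic.Messages.TextBlockParam[] {
	return [
		{
			type: "text" as const,
			text: systemPrompt,
			...(supportsPromptCache && { cache_control: { type: "ephemeral" as const } }),
		},
	]
}

// Sketch: cache write/read token counts are taken from the usage object of the
// streamed message_start event and emitted alongside the regular input/output counts.
function toUsageChunk(usage: Anthropic.Messages.Usage) {
	return {
		type: "usage" as const,
		inputTokens: usage.input_tokens ?? 0,
		outputTokens: usage.output_tokens ?? 0,
		cacheWriteTokens: usage.cache_creation_input_tokens ?? 0,
		cacheReadTokens: usage.cache_read_input_tokens ?? 0,
	}
}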

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Updated unit tests in vertex.test.ts simulate prompt caching scenarios.
  • Verified that token usage for cache writes and reads is correctly output.
  • Ran the full test suite locally with all tests passing.
  • Tested the extension locally
  • Verified the cost on vertex.ai

Checklist:

  • My code follows the patterns of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

Additional context

  • This feature is only active for models with supportsPromptCache enabled.
  • The changes reduce cost and latency by reusing cached prompt prefixes where possible.
  • Updated pricing details for caching are included in the shared API configuration.

[Screenshot: 2025-02-12 15:53:48]
[Screenshot: 2025-02-13 09:10:23]

Related Issues

N/A

Important

Add prompt caching support for Claude models in Vertex AI, updating configurations and tests to handle caching behavior.

  • Caching Implementation:
    • Add cache_control fields to user messages and system prompts in vertex.ts.
    • Track token usage for cache writes and reads during streaming.
  • Configuration Updates:
    • Enable prompt caching in vertexModels in api.ts for supported models.
    • Include pricing details for cache writes and reads.
  • Testing Enhancements:
    • Update vertex.test.ts to simulate and verify caching behavior.

This description was created by Ellipsis for 3ddd4c9. It will automatically update as commits are pushed.

changeset-bot commented Feb 12, 2025

⚠️ No Changeset found

Latest commit: ea38d9e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@aitoroses marked this pull request as ready for review on February 12, 2025 at 18:53
@aitoroses (Author) commented:

I'm currently seeing that tests are not passing on the upstream main branch, and I believe that is what's causing unit tests to fail here.
If I switch to the main branch they fail with this error:

...
issues: [
          {
            code: 'invalid_type',
            expected: 'object',
            received: 'array',
            path: [],
            message: 'Expected object, received array'
          }
        ],
        addIssue: [Function (anonymous)],
        addIssues: [Function (anonymous)],
        errors: [
          {
            code: 'invalid_type',
            expected: 'object',
            received: 'array',
            path: [],
            message: 'Expected object, received array'
          }
        ]
...

If I run them on my local branch they pass!

@lupuletic commented Feb 24, 2025

Is there a plan to get this merged @mrubens / @cte ?

Was looking at creating a separate PR for prompt caching for Vertex AI, but then found that this already exists

Thanks!

@mrubens (Collaborator) commented Feb 24, 2025

> Is there a plan to get this merged @mrubens / @cte ?
>
> Was looking at creating a separate PR for prompt caching for Vertex AI, but then found that this already exists
>
> Thanks!

Oh, I completely missed this one! I don't personally use vertex. @lupuletic if you or anyone has a minute to review I'm happy to ship it.

@mrubens (Collaborator) commented Feb 24, 2025

Thank you for opening this @aitoroses and very sorry I missed it!

@mrubens (Collaborator) commented Feb 24, 2025

Tests should be less flaky if you merge in main

// 2. Cache the most relevant context (usually at the end of the message)
const isLastTextBlock =
	contentIndex ===
	array.reduce((lastIndex, c, i) => (c.type === "text" ? i : lastIndex), -1)
@lupuletic commented Feb 24, 2025

For each item in the array, we seem to be running a reduce operation on the entire array to find the index of the last text block. This means if there are N items in the array, we're doing N full array traversals, which is O(N²) complexity.

// Current implementation (lines 118-132)
message.content.map((content, contentIndex, array) => {
    // Images and other non-text content are passed through unchanged
    if (content.type === "image") {
        return content as VertexImageBlock
    }
    // We only cache the last text block in each message to:
    // 1. Stay under the 4-block cache limit
    // 2. Cache the most relevant context (usually at the end of the message)
    const isLastTextBlock =
        contentIndex ===
        array.reduce((lastIndex, c, i) => (c.type === "text" ? i : lastIndex), -1)
    return {
        type: "text" as const,
        text: (content as { text: string }).text,
        ...(shouldCache && isLastTextBlock && { cache_control: { type: "ephemeral" } }),
    }
})

We can optimize this by calculating the last text block index once before the map operation, reducing the complexity to O(N):

	private formatMessageForCache(message: Anthropic.Messages.MessageParam, shouldCache: boolean): VertexMessage {
		// Assistant messages are kept as-is since they can't be cached
		if (message.role === "assistant") {
			return message as VertexMessage
		}
	
		// For string content, we convert to array format with optional cache control
		if (typeof message.content === "string") {
			return {
				...message,
				content: [
					{
						type: "text" as const,
						text: message.content,
						// For string content, we only have one block so it's always the last
						...(shouldCache && { cache_control: { type: "ephemeral" } }),
					},
				],
			}
		}
	
		// For array content, find the last text block index once before mapping
		const lastTextBlockIndex = message.content.reduce(
			(lastIndex: number, content: Anthropic.Messages.ContentBlock, index: number) => (content.type === "text" ? index : lastIndex),
			-1
		)
	
		// Then use this pre-calculated index in the map function
		return {
			...message,
			content: message.content.map((content: Anthropic.Messages.ContentBlock, contentIndex: number) => {
				// Images and other non-text content are passed through unchanged
				if (content.type === "image") {
					return content as VertexImageBlock
				}
				
				// Check if this is the last text block using our pre-calculated index
				const isLastTextBlock = contentIndex === lastTextBlockIndex
				
				return {
					type: "text" as const,
					text: (content as Anthropic.Messages.TextBlock).text,
					...(shouldCache && isLastTextBlock && { cache_control: { type: "ephemeral" } }),
				}
			}),
		}
	}

@aitoroses (Author) replied:

Let me try this!


thanks -- that seems to be working as expected!
[screenshot]


// Find indices of user messages that we want to cache
// We only cache the last two user messages to stay within the 4-block limit
// (1 block for system + 1 block each for last two user messages = 3 total)

@lupuletic commented:

Was wondering if we can reduce costs further by making use of the 4th cache block, but then I noticed the OpenRouter implementation does the same thing.

However, I don't have a good enough understanding to suggest what else would be beneficial to cache, so I'm leaving this here more as a question on whether there are additional caching opportunities or not (it could indeed increase costs if we only write and never read).

@aitoroses (Author) replied:

Hi @lupuletic! I originally tried using one more, but it gave errors, so it just considers 3 + 1 (system). That's sufficient to cache the big system prompt!
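
For context, a minimal sketch of the "last two user messages" selection described in the quoted comment above (the helper name and return shape are illustrative, not the PR's actual code):

import { Anthropic } from "@anthropic-ai/sdk"

// Collect the indices of all user messages, then keep only the last two so that,
// together with the cached system prompt block, the request stays within the
// 4-block cache limit.
function getLastTwoUserMessageIndices(messages: Anthropic.Messages.MessageParam[]): number[] {
	const userMsgIndices = messages.reduce<number[]>(
		(acc, msg, index) => (msg.role === "user" ? [...acc, index] : acc),
		[],
	)
	return userMsgIndices.slice(-2)
}

Only the messages at those indices would then get cache_control: { type: "ephemeral" } when the request is built.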

- Implemented comprehensive prompt caching strategy for Vertex AI models
- Added support for caching system prompts and user message text blocks
- Enhanced stream processing to handle cache-related usage metrics
- Updated model configurations to enable prompt caching
- Improved type definitions for Vertex AI message handling
@aitoroses force-pushed the feat/vertex-prompt-caching branch from 0135946 to 9b267e9 on February 25, 2025 at 14:30
@dosubot added the size:L label (This PR changes 100-499 lines, ignoring generated files) on Feb 25, 2025
@@ -435,41 +435,51 @@ export const vertexModels = {
 	contextWindow: 200_000,
 	supportsImages: true,
 	supportsComputerUse: true,
-	supportsPromptCache: false,
+	supportsPromptCache: true,

@lupuletic commented:

Can we please enable this for Claude Sonnet 3.7 as well? It's just a few lines above:
[screenshot]

Otherwise, LGTM, and thanks a lot for implementing this one @aitoroses!

@aitoroses (Author) replied:

I've just added caching for 3.7! @mrubens / @cte

@mrubens (Collaborator) commented Feb 27, 2025

Awesome! Will look today

@@ -441,7 +441,7 @@ export const vertexModels = {
 	contextWindow: 200_000,
 	supportsImages: true,
 	supportsComputerUse: true,
-	supportsPromptCache: false,
+	supportsPromptCache: true,
 	inputPrice: 3.0,
 	outputPrice: 15.0,

@lupuletic commented:

It also needs the cache writes and cache reads prices (they are the same as for Sonnet v2).
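
For reference, a hedged sketch of what the 3.7 entry could look like once cache pricing is added (the cacheWritesPrice / cacheReadsPrice field names and the 3.75 / 0.30 per-million-token figures mirror the Sonnet v2 entry, as suggested above; confirm against src/shared/api.ts before relying on them):

// Illustrative object only; the real entry lives in the vertexModels map.
const claude37SonnetVertexEntry = {
	contextWindow: 200_000,
	supportsImages: true,
	supportsComputerUse: true,
	supportsPromptCache: true,
	inputPrice: 3.0,
	outputPrice: 15.0,
	cacheWritesPrice: 3.75,
	cacheReadsPrice: 0.3,
} as const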

@lupuletic commented:

> Awesome! Will look today

Thanks a lot -- I'm using Vertex AI quite a bit, and I'm currently running a local installation of RooCode to benefit from the caching cost savings. I'm sure this will benefit more people, though, so I'm keen to get this released!

If preferred, I am also happy to spin up a new PR with all the new changes needed

Thanks again all!

@mrubens (Collaborator) left a comment:

Code looks good to me! @lupuletic which changes do you need pulled in? I can also just merge if you want to fast follow.

@dosubot added the lgtm label (This PR has been approved by a maintainer) on Feb 27, 2025
@lupuletic commented:

> Code looks good to me! @lupuletic which changes do you need pulled in? I can also just merge if you want to fast follow.

Yep, looks good to me too! It's just the cache costs for 3.7 missing, which I added in a follow-up PR here: #1244

@lupuletic left a comment:

LGTM

@cte merged commit 02c955e into RooCodeInc:main on Feb 27, 2025
11 checks passed
@cte (Collaborator) commented Feb 27, 2025

Thanks all!

refactorthis pushed a commit to refactorthis/Roo-Code that referenced this pull request Mar 2, 2025
Labels
lgtm (This PR has been approved by a maintainer), size:L (This PR changes 100-499 lines, ignoring generated files)

4 participants