
Add Vertex AI prompt caching support for Claude models #961

Open · wants to merge 6 commits into base: main

Conversation

@aitoroses commented Feb 12, 2025

Description

This PR adds prompt caching support for the Vertex API integration. The main changes are:

  • Caching Implementation:
    • Ephemeral cache_control fields have been added to user messages and system prompts in src/api/providers/vertex.ts when caching is supported.
    • Token usage for cache writes and cache reads is now tracked and emitted during streaming.
  • Configuration Updates:
    • The configuration in src/shared/api.ts has been updated to enable prompt caching for supported models and includes pricing details for cache writes and reads.
  • Testing Enhancements:
    • Unit tests in src/api/providers/__tests__/vertex.test.ts have been updated to simulate and verify the new caching behavior.
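As a rough sketch of the first bullet, marking the last text block of a user message with an ephemeral `cache_control` field might look like the following. The types are simplified stand-ins for the Anthropic SDK message shapes, and the helper name is illustrative, not code from the PR:

```typescript
// Simplified stand-ins for the Anthropic SDK content-block types.
type TextBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type Message = { role: "user" | "assistant"; content: string | TextBlock[] };

// Mark the last text block of a user message as cacheable; assistant
// messages pass through unchanged.
function applyCacheControl(message: Message): Message {
  if (message.role !== "user") return message;
  const blocks: TextBlock[] =
    typeof message.content === "string"
      ? [{ type: "text", text: message.content }]
      : message.content;
  return {
    ...message,
    content: blocks.map((block, i) =>
      i === blocks.length - 1 && block.type === "text"
        ? { ...block, cache_control: { type: "ephemeral" } }
        : block,
    ),
  };
}

const marked = applyCacheControl({ role: "user", content: "Hello" });
console.log(JSON.stringify(marked));
```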

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Updated unit tests in vertex.test.ts simulate prompt caching scenarios.
  • Verified that token usage for cache writes and reads is correctly output.
  • Ran the full test suite locally with all tests passing.
  • Tested the extension locally
  • Verified the cost on vertex.ai

Checklist:

  • My code follows the patterns of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

Additional context

  • This feature is only active for models with supportsPromptCache enabled.
  • The changes reduce cost and latency by reusing cached prompt prefixes where possible.
  • Updated pricing details for caching are included in the shared API configuration.
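For illustration, a model entry in the shared configuration might look like the sketch below. The field names follow the PR description (a `supportsPromptCache` flag plus cache write/read pricing), but the interface shape, model ID, and dollar values are assumptions, not the PR's actual diff:

```typescript
// Illustrative model-info shape; field names and prices are assumptions
// based on the PR description, not the real src/shared/api.ts contents.
interface ModelInfo {
  maxTokens: number;
  supportsPromptCache: boolean;
  inputPrice: number;        // $ per million input tokens
  outputPrice: number;       // $ per million output tokens
  cacheWritesPrice?: number; // $ per million cache-write tokens
  cacheReadsPrice?: number;  // $ per million cache-read tokens
}

const vertexModels: Record<string, ModelInfo> = {
  "claude-3-5-sonnet-v2@20241022": {
    maxTokens: 8192,
    supportsPromptCache: true,
    inputPrice: 3.0,
    outputPrice: 15.0,
    cacheWritesPrice: 3.75, // cache writes typically cost more than plain input
    cacheReadsPrice: 0.3,   // cache reads typically cost a fraction of input
  },
};

console.log(vertexModels["claude-3-5-sonnet-v2@20241022"].supportsPromptCache);
```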

[Screenshots: Vertex AI cost verification, captured 2025-02-12 15:53 and 2025-02-13 09:10]

Related Issues

N/A


Important

Add prompt caching support for Claude models in Vertex AI, updating configurations and tests to handle caching behavior.

  • Caching Implementation:
    • Add cache_control fields to user messages and system prompts in vertex.ts.
    • Track token usage for cache writes and reads during streaming.
  • Configuration Updates:
    • Enable prompt caching in vertexModels in api.ts for supported models.
    • Include pricing details for cache writes and reads.
  • Testing Enhancements:
    • Update vertex.test.ts to simulate and verify caching behavior.
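As a hedged sketch of how cache token usage might be surfaced during streaming: the input field names follow Anthropic's usage payload (`cache_creation_input_tokens`, `cache_read_input_tokens`), but the output chunk shape and function name here are assumptions, not the PR's actual code:

```typescript
// Hypothetical usage chunk emitted into the provider's stream.
interface UsageChunk {
  type: "usage";
  inputTokens: number;
  outputTokens: number;
  cacheWriteTokens: number;
  cacheReadTokens: number;
}

// Map an Anthropic-style usage payload onto the stream chunk, defaulting
// the cache counters to zero when the model returns none.
function usageFromMessageStart(usage: {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}): UsageChunk {
  return {
    type: "usage",
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    cacheWriteTokens: usage.cache_creation_input_tokens ?? 0,
    cacheReadTokens: usage.cache_read_input_tokens ?? 0,
  };
}

console.log(usageFromMessageStart({ input_tokens: 10, output_tokens: 0, cache_read_input_tokens: 100 }));
```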

This description was created by Ellipsis for 3ddd4c9. It will automatically update as commits are pushed.

changeset-bot bot commented Feb 12, 2025

⚠️ No Changeset found

Latest commit: 0135946

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


- Keep assistant messages as plain strings
- Simplify cache control application in user messages
- Make test assertions more explicit
@aitoroses marked this pull request as ready for review on February 12, 2025 18:53
- Fix prompt caching to respect 4-block limit by only caching last text block of messages
- Add proper handling of image content blocks in messages
- Add comprehensive documentation about Vertex API caching limitations
- Improve code comments for better maintainability
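Since Anthropic's API allows only a small number of `cache_control` blocks per request (four), a fix like the one above has to choose which messages get markers. A hypothetical helper, not code from the PR, could pick the most recent user turns, leaving room for a marker on the system prompt:

```typescript
type Role = "user" | "assistant";

// Return the indices of the user messages whose last text block should
// carry cache_control, capping the number of markers so the request stays
// within the overall block limit (system prompt uses one of the four).
function cacheableUserIndices(roles: Role[], maxUserMarkers = 2): number[] {
  const userIndices = roles
    .map((role, i) => (role === "user" ? i : -1))
    .filter((i) => i >= 0);
  return userIndices.slice(-maxUserMarkers);
}

// Only the two most recent user turns are selected.
console.log(cacheableUserIndices(["user", "assistant", "user", "assistant", "user"]));
```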
@aitoroses (Author) commented:

I'm currently seeing that tests are failing on the upstream main branch, and I believe this is what's causing the unit tests to fail here.
If I check out the main branch, they fail with this error:

...
issues: [
          {
            code: 'invalid_type',
            expected: 'object',
            received: 'array',
            path: [],
            message: 'Expected object, received array'
          }
        ],
        addIssue: [Function (anonymous)],
        addIssues: [Function (anonymous)],
        errors: [
          {
            code: 'invalid_type',
            expected: 'object',
            received: 'array',
            path: [],
            message: 'Expected object, received array'
          }
        ]
...

If I run them on my local branch, they pass!
