Commit 35e3e17
feat(bedrock): add prompt caching via CachePoint markers (#1940)
Closes #1871.
## Why this is needed
Multi-turn tool-using kagent agents on Bedrock pay full input-token cost
on every Converse call, because the static prefix (system prompt + tool
definitions) is re-sent and re-billed each turn. Real measurement from a
production deployment using Claude Sonnet 4.5 via the `us.` inference
profile in us-east-1, running every 2 hours against a ~700-pod EKS
cluster:
- Per sweep: ~4 Converse calls
- Cumulative input tokens (CloudWatch InvokeModel metric): ~313k
- Cumulative output tokens: ~3k
- Per-sweep cost: ~$0.98 (input dominates ~95%)
- Per cluster/year (5 sweeps/weekday): ~$1,300
- Per cluster/year (24/7 every 2h): projected ~$4-9k
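The per-sweep and annual figures above follow from simple arithmetic. The Go sketch below reproduces them. It assumes on-demand list prices of $3 per million input tokens and $15 per million output tokens for Claude Sonnet 4.5, which you should check against current Bedrock pricing for your region. It also assumes ~260 weekdays per year.

```go
package main

import "fmt"

// Assumed on-demand prices (USD per million tokens) for Claude Sonnet 4.5.
// These are illustrative assumptions, not values taken from this PR.
const (
	inputPricePerM  = 3.00
	outputPricePerM = 15.00
)

// sweepCost returns the USD cost of one sweep and the share of that cost
// attributable to input tokens, given cumulative token counts for the sweep.
func sweepCost(inputTokens, outputTokens float64) (total, inputShare float64) {
	in := inputTokens / 1e6 * inputPricePerM
	out := outputTokens / 1e6 * outputPricePerM
	total = in + out
	return total, in / total
}

func main() {
	perSweep, share := sweepCost(313_000, 3_000)
	fmt.Printf("per sweep: $%.2f (input %.0f%%)\n", perSweep, share*100)

	// 5 sweeps per weekday, ~260 weekdays per year.
	fmt.Printf("weekday schedule: $%.0f/year\n", perSweep*5*260)

	// Every 2 hours, 24/7: 12 sweeps per day, 365 days per year.
	fmt.Printf("24/7 schedule: $%.0f/year\n", perSweep*12*365)
}
```

Under these assumptions the sketch gives roughly $0.98 per sweep, with input at about 95% of the cost. It gives about $1,280/year for the weekday schedule and about $4,300/year for the 24/7 schedule, which is the low end of the projected range.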
~30k tokens of the per-call input are identical across every call, because
the system prompt and tool definitions don't change within a single task.
Bedrock prompt caching is designed for exactly this case: a `cachePoint`
block in the Converse request marks where the cacheable prefix ends, and
subsequent calls within ~5 minutes (per region) hit the cache and bill
the prefix at a reduced rate.
The Bedrock provider builds Converse requests using
`system: [textBlock]` and `toolConfig.tools: [...]` but never appends a
`cachePoint` block to either array, so caching is never engaged.
## Why this is not redundant with the existing `spec.declarative.compaction`
kagent already has an Agent-level context-compaction feature
(`Compaction`, `CompactionInterval`, `Summarizer`, `TokenThreshold`,
etc.) that summarizes old conversation turns when the conversation
exceeds a token threshold. That solves a different problem:
- Compaction: shrinks the conversation prompt when it gets too long.
Helps with context-window pressure on long-running agents.
- Prompt caching: keeps the prompt the same size but tells Bedrock
"the first N tokens are stable across calls; cache them and bill the
cached portion at the reduced rate."
Neither replaces the other. For a tool-using agent whose conversation
stays under the context limit but whose static prefix (system prompt +
tool defs) is large, prompt caching is the right hammer; compaction does
nothing because there's nothing to compact in the static prefix.
## What this PR does
Adds a `promptCaching: bool` field to `BedrockConfig` (defaulting to
`false` to preserve existing behavior). When set, the provider appends
a `CachePoint` block:
1. To the end of the `system` content array (after the system text
block)
2. To the end of the `toolConfig.tools` array (after the last ToolSpec)
Markers use `CachePointTypeDefault`. Bedrock silently ignores cache
points on models that don't support prompt caching, so the field is
safe to enable on a heterogeneous model fleet without per-model
gating.
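The marker placement can be sketched in Go as follows. The types are hypothetical, dependency-free stand-ins for the Converse union types. The PR itself uses the aws-sdk-go-v2 `bedrockruntime/types` package and `CachePointTypeDefault`. `withCachePoints` is an illustrative name, not the function in `bedrock.go`. Skipping the tool marker when there are no tools follows the test cases described under Tests. Skipping the system marker when there is no system block is an extra assumption.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Converse union types. The real SDK models
// SystemContentBlock and Tool as interfaces with member structs; plain
// structs keep this sketch free of external dependencies.
type CachePoint struct{ Type string }

type SystemBlock struct {
	Text       string
	CachePoint *CachePoint // set only on a cache-point marker block
}

type ToolBlock struct {
	Name       string      // stands in for a full ToolSpec
	CachePoint *CachePoint // set only on a cache-point marker block
}

// Mirrors the SDK's CachePointTypeDefault ("default").
const cachePointTypeDefault = "default"

// withCachePoints appends a default cache point to each non-empty list when
// promptCaching is enabled. The caller's slices are never modified.
func withCachePoints(system []SystemBlock, tools []ToolBlock, promptCaching bool) ([]SystemBlock, []ToolBlock) {
	if !promptCaching {
		return system, tools
	}
	if len(system) > 0 {
		// Copy before appending so the caller's backing array is untouched.
		system = append(append([]SystemBlock(nil), system...),
			SystemBlock{CachePoint: &CachePoint{Type: cachePointTypeDefault}})
	}
	if len(tools) > 0 {
		// A marker with no tools in front of it would have nothing to cache.
		tools = append(append([]ToolBlock(nil), tools...),
			ToolBlock{CachePoint: &CachePoint{Type: cachePointTypeDefault}})
	}
	return system, tools
}

func main() {
	sys := []SystemBlock{{Text: "You are a Kubernetes troubleshooting agent."}}
	tools := []ToolBlock{{Name: "get_pods"}, {Name: "get_events"}}

	s, t := withCachePoints(sys, tools, true)
	fmt.Println(len(s), len(t))              // 2 3
	fmt.Println(t[len(t)-1].CachePoint.Type) // default
}
```

Appending at the end of each list, rather than inserting a marker between tools, keeps the whole static prefix in front of the cache boundary.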
Tested against `us.anthropic.claude-sonnet-4-5-20250929-v1:0`: the
second and subsequent Converse calls within the cache window drop their
input-token billing by ~70-90% on cache hits, depending on which static
portion (system vs tools vs both) is being hit.
## Implementation surface
Mirrors the change across both runtimes — Go (for `runtime: go` agents)
and Python (for `runtime: python` agents):
Go:
- `go/api/v1alpha2/modelconfig_types.go`: add `PromptCaching` to
`BedrockConfig` CRD struct with full doc + kubebuilder default.
- `go/api/adk/types.go`: add `PromptCaching` to the internal
`adk.Bedrock` serializable model so it flows through agent config JSON.
- `go/core/internal/controller/translator/agent/adk_api_translator.go`:
populate the new field when translating ModelConfig CR -> adk.Bedrock.
- `go/adk/pkg/agent/agent.go`: thread the value into
`models.BedrockConfig`.
- `go/adk/pkg/models/bedrock.go`: emit the cache point markers in the
Converse request builders.
Python:
- `python/.../adk/types.py`: add `prompt_caching: bool` to the
`Bedrock` Pydantic model and pass through to `KAgentBedrockLlm` factory.
- `python/.../adk/models/_bedrock.py`: append
`{"cachePoint": {"type": "default"}}` to `kwargs["system"]` and
`kwargs["toolConfig"]["tools"]` when enabled.
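On the Python side, the change operates on the `kwargs` dict passed to the Converse call. The Go sketch below applies the same transformation to an untyped map so that the resulting request shape on the wire is visible. `applyCachePoints` is a hypothetical name. The tool spec is trimmed, since a real one also needs a description and an input schema. Skipping markers for missing or empty lists is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyCachePoints appends a default cache-point marker to kwargs["system"]
// and kwargs["toolConfig"]["tools"], skipping lists that are missing or empty.
func applyCachePoints(kwargs map[string]any) {
	marker := map[string]any{"cachePoint": map[string]any{"type": "default"}}

	if system, ok := kwargs["system"].([]any); ok && len(system) > 0 {
		kwargs["system"] = append(system, marker)
	}
	if toolConfig, ok := kwargs["toolConfig"].(map[string]any); ok {
		if tools, ok := toolConfig["tools"].([]any); ok && len(tools) > 0 {
			toolConfig["tools"] = append(tools, marker)
		}
	}
}

func main() {
	kwargs := map[string]any{
		"system": []any{
			map[string]any{"text": "You are a Kubernetes troubleshooting agent."},
		},
		"toolConfig": map[string]any{
			"tools": []any{
				// Trimmed: a real toolSpec also carries description and inputSchema.
				map[string]any{"toolSpec": map[string]any{"name": "get_pods"}},
			},
		},
	}
	applyCachePoints(kwargs)

	out, err := json.MarshalIndent(kwargs, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```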
Regenerated CRDs via `make controller-manifests` so
`helm/kagent-crds/templates/kagent.dev_modelconfigs.yaml` reflects the
new schema field.
## Tests
`go/adk/pkg/models/bedrock_test.go`: new
`TestConvertGenaiToolsToBedrockPromptCaching` covering three cases:
disabled = no marker, enabled = marker appended at END of tool list
with default type, enabled-but-no-tools = no marker (no point in a
standalone marker).
Existing `convertGenaiToolsToBedrock` callers updated to pass the new
`promptCaching bool` argument as `false` (no behavior change).
## Backward compatibility
- `promptCaching` defaults to `false` everywhere; existing
ModelConfigs pick up no behavior change.
- Serialized `adk.Bedrock` JSON uses `omitempty` for the new field;
older agent pods deserializing newer config see an unknown field
they ignore (Pydantic + Go json decoders both lenient by default).
- The Converse API tolerates and ignores `CachePoint` markers on
models that don't support caching, so enabling on a mixed-model
setup is safe.
## Example usage
```yaml
apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: bedrock-claude
spec:
  provider: Bedrock
  model: us.anthropic.claude-sonnet-4-5-20250929-v1:0
  bedrock:
    region: us-east-1
    promptCaching: true
```
Signed-off-by: Shamil Kashmeri <shamil@viafoura.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Eitan Yarmush <eitan.yarmush@solo.io>

1 parent 9afbe4e
11 files changed
Lines changed: 400 additions & 6 deletions