feat(ai): support for server-side compaction #17746
Provider-agnostic infrastructure for server-side compaction:
- ai-core: `serverSideCompactionSupport` capability on model metadata (propagated to the frontend); `CompactionSettings` carried verbatim on the request; `resolveCompactionDefault` (global preference folded with the per-provider override) and the capability-gated `resolveServerSideCompaction`; the global `ai-features.serverSideCompaction` preference; opaque `CompactionResponsePart` / `CompactionMessage` marker types.
- ai-chat: persisted `CompactionChatResponseContent` (+ deserializer), agent stream-to-content mapping, per-session `commonSettings.compaction` copied verbatim onto the request (the agent reads no preference).
- ai-chat-ui: inline compaction marker renderer, token-usage tooltip (cumulative usage + "compacted Nx"), and the per-session tri-state control.
- ai-ide / ai-copilot / ai-ollama: tolerate the compaction marker (ignore foreign-provider markers in Chat Completions / Ollama conversion).

Signed-off-by: Christian W. Damus <cdamus@eclipsesource.com>
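For readers of this commit, the enablement layering might be sketched roughly as below. Only the two function names come from the commit message; the signatures, the `TriState` alias, the `ModelMetadataSketch` shape, and the parameter names are assumptions made for illustration and may not match the ai-core code.

```typescript
// Hypothetical signatures: only the function names appear in the commit message.
type TriState = boolean | undefined; // undefined means "no opinion, inherit"

// Fold the global preference with the per-provider override:
// the provider override wins whenever it is set.
function resolveCompactionDefault(globalPref: boolean, providerOverride: TriState): boolean {
    return providerOverride ?? globalPref;
}

// Assumed subset of the model metadata relevant to compaction.
interface ModelMetadataSketch {
    serverSideCompactionSupport: boolean;
    serverSideCompactionEnabledByDefault: boolean;
}

// Capability-gated resolution: a model without the capability never compacts;
// otherwise the per-request setting wins over the model's default.
function resolveServerSideCompaction(model: ModelMetadataSketch, requested: TriState): boolean {
    if (!model.serverSideCompactionSupport) {
        return false;
    }
    return requested ?? model.serverSideCompactionEnabledByDefault;
}
```

In this sketch an explicit per-request `false` disables compaction even when the model default is on, and a model without the capability ignores the setting entirely.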
Declare the capability (= useResponseApi), fold the global and per-provider preferences into the model's default enablement, enable `context_management` compaction on Responses requests when active, capture the streamed compaction item, and replay it via transcript prefix-drop. Chat Completions ignores it. Signed-off-by: Christian W. Damus <cdamus@eclipsesource.com>
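A minimal sketch of what replay via transcript prefix-drop could look like, assuming the latest compaction item stands in for everything before it. The `TranscriptItem` shape and the `dropCompactedPrefix` name are hypothetical and not taken from the ai-openai code.

```typescript
// Hypothetical transcript item shape; the real types in ai-openai may differ.
type TranscriptItem =
    | { kind: 'message'; role: 'user' | 'assistant'; text: string }
    | { kind: 'compaction'; payload: string }; // opaque provider payload

// Keep the latest compaction item and everything after it;
// drop the prefix that the compaction item summarizes.
function dropCompactedPrefix(items: TranscriptItem[]): TranscriptItem[] {
    let last = -1;
    items.forEach((item, index) => {
        if (item.kind === 'compaction') {
            last = index;
        }
    });
    return last < 0 ? items : items.slice(last);
}
```

A transcript without any compaction item is returned unchanged, so the default path is unaffected.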
Declare the capability (Opus/Sonnet 4.6+ heuristic), fold the global and per-provider preferences into the model's default enablement, route active requests through the Beta Messages API with the compact-2026-01-12 beta and the compact_20260112 edit, capture the streamed compaction block, and replay it while keeping surrounding history. Default path unchanged. Signed-off-by: Christian W. Damus <cdamus@eclipsesource.com>
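The "Opus/Sonnet 4.6+" heuristic could be sketched as a model-id check like the one below. The id pattern (`claude-<family>-<major>[-<minor>][-<date>]`) and the function name are assumptions for illustration; the PR does not spell out how the heuristic is implemented.

```typescript
// Assumed id pattern: claude-<family>-<major>[-<minor>][-<date>]; not confirmed by the PR.
// The minor part is limited to two digits so a date suffix is not mistaken for it.
const MODEL_ID = /^claude-(opus|sonnet)-(\d+)(?:-(\d{1,2}))?(?:-|$)/;

function supportsServerSideCompaction(modelId: string): boolean {
    const match = MODEL_ID.exec(modelId);
    if (!match) {
        return false; // unknown family or unrecognized id format
    }
    const major = Number(match[2]);
    const minor = match[3] === undefined ? 0 : Number(match[3]);
    return major > 4 || (major === 4 && minor >= 6);
}
```

Ids that do not fit the assumed pattern are treated as unsupported, which keeps the default path unchanged for them.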
eneufeld left a comment
We should add a follow-up to make the thresholds configurable.
Other than that I have some nitpicks, and then we should merge this.
public serverSideCompactionEnabledByDefault: boolean = false
) { }

get serverSideCompactionSupport(): boolean {
We should move this to `resolveMetadata` in packages/ai-anthropic/src/node/anthropic-language-models-manager-impl.ts#122.
There we might also want to use:
https://platform.claude.com/docs/en/api/typescript/beta/models/list
but reading capability support from the models endpoint can be done in a follow-up.
}
const betaParams = params as T & Anthropic.Beta.Messages.MessageCreateParams;
betaParams.betas = ['compact-2026-01-12'];
betaParams.context_management = { edits: [{ type: 'compact_20260112' }] };
The compaction threshold defaults to 150k here.
In a follow-up we should make this configurable.
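A possible shape for that follow-up, as a sketch only: the helper name and the `CompactionEdit` type are hypothetical, and the `trigger` field is an assumption about the compaction edit that should be checked against the Anthropic documentation before use.

```typescript
// Hypothetical helper; the `trigger` field shape is an assumption to verify
// against the Anthropic compaction documentation.
interface CompactionEdit {
    type: 'compact_20260112';
    trigger?: { type: 'input_tokens'; value: number };
}

// Omit the trigger to keep the provider default; set it to override the threshold.
function buildCompactionEdit(triggerTokens?: number): CompactionEdit {
    const edit: CompactionEdit = { type: 'compact_20260112' };
    if (triggerTokens !== undefined) {
        edit.trigger = { type: 'input_tokens', value: triggerTokens };
    }
    return edit;
}
```

With no argument the result matches the edit in the diff above, so existing behavior would be preserved when no threshold is configured.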
public serverSideCompactionEnabledByDefault: boolean = false
) { }

get serverSideCompactionSupport(): boolean {
Same as for Anthropic: this should be in the models-manager-impl.
export const PREFERENCE_NAME_MAX_RETRIES = 'ai-features.modelSettings.maxRetries';
export const PREFERENCE_NAME_DEFAULT_NOTIFICATION_TYPE = 'ai-features.notifications.default';
export const PREFERENCE_NAME_SKILL_DIRECTORIES = 'ai-features.skills.skillDirectories';
export const PREFERENCE_NAME_SERVER_SIDE_COMPACTION = 'ai-features.chat.serverSideCompaction';
This preference is added to the schema below but not to `AICoreConfiguration`, so unlike its siblings it can't be read through the `AICorePreferences` proxy. Add `[PREFERENCE_NAME_SERVER_SIDE_COMPACTION]: boolean | undefined;` to that interface so access stays type-safe and consistent.
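A reduced sketch of the suggested change, for illustration. `AICoreConfigurationSketch` is a stand-in for the real interface (which declares many more keys), and `isCompactionEnabled` with its `false` fallback is a hypothetical helper, not code from the PR.

```typescript
const PREFERENCE_NAME_SERVER_SIDE_COMPACTION = 'ai-features.chat.serverSideCompaction';

// Reduced stand-in for AICoreConfiguration; the real interface has more entries.
interface AICoreConfigurationSketch {
    [PREFERENCE_NAME_SERVER_SIDE_COMPACTION]: boolean | undefined;
}

// Typed read, roughly what access through a preference proxy would allow.
// Falling back to false when the preference is unset is an assumption.
function isCompactionEnabled(prefs: AICoreConfigurationSketch): boolean {
    return prefs[PREFERENCE_NAME_SERVER_SIDE_COMPACTION] ?? false;
}
```

With the key declared on the interface, a misspelled preference name or a non-boolean value would be rejected at compile time.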
What it does
Fixes #17636.
Adds provider-native (server-side) compaction for AI chat sessions. When a conversation grows past a supporting model's context limit, the provider summarizes older turns on its side so the session keeps working, instead of failing or forcing the user to start over or trim history manually.
Activation is layered, from broad to specific: a global preference, a per-provider override, and a per-session tri-state control.
Whether compaction is available is a model capability: it is currently honored by Anthropic and by OpenAI (Responses API), and providers or models without the capability simply ignore the setting. When compaction occurs it is shown inline in the chat and summarized in the token-usage tooltip (cumulative usage plus a "compacted N×" count), persisted with the session, and replayed on subsequent requests.
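To illustrate the tooltip summary, a minimal sketch follows. The `UsageSummary` shape, the function name, and the exact wording of the string are assumptions; only the cumulative usage and the "compacted N×" count come from the description above.

```typescript
// Hypothetical per-session usage summary; field names are assumptions.
interface UsageSummary {
    inputTokens: number;
    outputTokens: number;
    compactions: number; // how many times the provider compacted this session
}

// Cumulative usage, with a "compacted N×" suffix once compaction has occurred.
function formatUsageTooltip(usage: UsageSummary): string {
    const base = `${usage.inputTokens + usage.outputTokens} tokens used`;
    return usage.compactions > 0 ? `${base} (compacted ${usage.compactions}×)` : base;
}
```

A session that was never compacted shows only the cumulative usage, so the tooltip stays unchanged for providers without the capability.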
How to test
Follow-ups
Breaking changes
Attribution
Review checklist
`nls` service (for details, please see the Internationalization/Localization section in the Coding Guidelines)
Reminder for reviewers