Summary
CachePlatform (the ai.platform.cache.* decorator, symfony/ai-cache-platform) provides "free idempotency" for re-running platform invocations. But it can only build a cache key for string, array, and MessageBag inputs — it throws on any other input type, including the first-party ContentInterface content objects (Audio, Document, DocumentUrl, Image, ImageUrl, …).
That means it cannot decorate any platform task whose top-level input is a single content object, which is the normal shape for audio transcription, OCR, and single-image vision — exactly the slow/expensive calls most worth caching.
The gap
CachePlatform::invoke() (src/platform/src/Bridge/Cache/CachePlatform.php):
$normalizedInput = match (true) {
    \is_string($input) => md5($input),
    \is_array($input) => json_encode($input),
    $input instanceof MessageBag => $input->getId()->toString(),
    default => throw new InvalidArgumentException(\sprintf('Unsupported input type: %s', get_debug_type($input))),
};
When prompt_cache_key is set, a ContentInterface instance falls through to default and throws. Without prompt_cache_key, the call silently passes through uncached.
Reproduction
Two call shapes hit this: the first already exists in the codebase today, the second comes from the bridge proposed in #2072:
Whisper (audio transcription) — examples/openai/audio-transcript.php invokes with an Audio content object directly:
$file = Audio::fromFile(__DIR__.'/audio.mp3');
$platform->invoke('whisper-1', $file); // <-- $file is a ContentInterface
Wrap that platform in CachePlatform and pass prompt_cache_key → InvalidArgumentException: Unsupported input type: Symfony\AI\Platform\Message\Content\Audio.
Mistral OCR (the bridge proposed in #2072) — same shape with DocumentUrl/Document:
$platform->invoke('mistral-ocr-latest', new DocumentUrl('https://…/document.pdf'), ['prompt_cache_key' => 'ocr']);
// InvalidArgumentException: Unsupported input type: …\Content\DocumentUrl
Confirmed by direct test against CachePlatform (string input caches fine; a DocumentUrl input throws).
Proposed fix (surgical)
Content objects currently have no identity — ContentInterface is an empty marker. Introduce a small opt-in interface so an input can advertise a stable cache key, and have CachePlatform honor it:
interface CacheableInputInterface
{
    public function getCacheKey(): string;
}

// CachePlatform::invoke()
$normalizedInput = match (true) {
    \is_string($input) => md5($input),
    \is_array($input) => json_encode($input),
    $input instanceof MessageBag => $input->getId()->toString(),
    $input instanceof CacheableInputInterface => $input->getCacheKey(),
    default => throw new InvalidArgumentException(...),
};
Then implement it on the URL/file content classes:
DocumentUrl / ImageUrl → return the URL (natural content identity).
Document / File → hash('xxh128', $this->asBinary()) (content hash; File already exposes asBinary() and __serialize()).
This generalizes to every content-object task rather than special-casing OCR, and is additive (no BC break — CacheableInputInterface is opt-in).
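A minimal, self-contained sketch of how this could fit together, using stand-in classes rather than the real Symfony AI ones. SketchDocumentUrl, SketchFile, and normalizeInput() are hypothetical names for illustration only; just CacheableInputInterface::getCacheKey() comes from the proposal. The MessageBag branch is left out so the snippet has no dependencies, and the xxh128 algorithm assumes PHP 8.1 or later.

```php
<?php

declare(strict_types=1);

// Opt-in interface from the proposal.
interface CacheableInputInterface
{
    public function getCacheKey(): string;
}

// Hypothetical stand-in for a URL-based content class (not the real DocumentUrl).
final class SketchDocumentUrl implements CacheableInputInterface
{
    public function __construct(private readonly string $url)
    {
    }

    public function getCacheKey(): string
    {
        // The URL is the natural content identity.
        return $this->url;
    }
}

// Hypothetical stand-in for a binary content class (not the real File).
final class SketchFile implements CacheableInputInterface
{
    public function __construct(private readonly string $bytes)
    {
    }

    public function asBinary(): string
    {
        return $this->bytes;
    }

    public function getCacheKey(): string
    {
        // Content hash: identical bytes always yield the same key.
        return hash('xxh128', $this->asBinary());
    }
}

// Hypothetical helper mirroring the match in CachePlatform::invoke(),
// minus the MessageBag branch. JSON_THROW_ON_ERROR is an addition of this
// sketch so the helper can keep a strict string return type.
function normalizeInput(mixed $input): string
{
    return match (true) {
        \is_string($input) => md5($input),
        \is_array($input) => json_encode($input, \JSON_THROW_ON_ERROR),
        $input instanceof CacheableInputInterface => $input->getCacheKey(),
        default => throw new \InvalidArgumentException(\sprintf('Unsupported input type: %s', get_debug_type($input))),
    };
}
```

With this shape, two SketchFile instances built from the same bytes normalize to the same key, and any object that does not opt in still hits the default branch and throws, so existing behavior is unchanged for inputs that do not implement the interface.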
Note on identity semantics
MessageBag keys off getId(), a Uuid::v7() generated per instance, i.e. instance identity (same content, new instance → cache miss). The content classes above would instead use content identity (same URL/bytes → hit), which is what you want for idempotent re-runs of a pipeline over a document/audio archive. Worth a sentence in the docs so the two behaviors aren't surprising; happy to align with whatever the maintainers prefer.
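The difference between the two identity models can be illustrated with a small stand-alone sketch. SketchBag is a hypothetical stand-in for MessageBag, and random_bytes() stands in for Uuid::v7() purely to get a per-instance unique value without pulling in symfony/uid.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for MessageBag: the id is generated per instance.
final class SketchBag
{
    private readonly string $id;

    public function __construct(public readonly string $text)
    {
        // Stand-in for Uuid::v7(); any per-instance unique value shows the effect.
        $this->id = bin2hex(random_bytes(16));
    }

    public function getId(): string
    {
        return $this->id;
    }
}

// Instance identity: equal content, but each instance carries its own id,
// so a key derived from getId() differs between the two bags.
$first = new SketchBag('same prompt');
$second = new SketchBag('same prompt');
$instanceKeysMatch = $first->getId() === $second->getId();

// Content identity: a key derived from the bytes is the same on every run.
$contentKeyA = hash('xxh128', 'same bytes');
$contentKeyB = hash('xxh128', 'same bytes');
$contentKeysMatch = $contentKeyA === $contentKeyB;
```

Under these assumptions, a pipeline that rebuilds its inputs on every run would miss the cache with instance identity and hit it with content identity.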
Scope
Surgical and standalone — independent of the OCR bridge in #2072 (the gap already affects Whisper). I'm happy to send a PR if the direction is accepted.