Is there an existing issue for this?
Kong version ($ kong version)
Kong 3.13.x (ai-proxy-advanced plugin)
Current Behavior
When sending a request to the llm/v1/chat route with provider: gemini, the cachedContent field in the request body is silently dropped during the OpenAI-to-Gemini transformation. The request succeeds, but the Vertex AI context cache is not used.
The response shows prompt_tokens reflecting only the user message (e.g. 8 tokens), with no cachedContentTokenCount — confirming the cache was ignored.
Expected Behavior
The cachedContent field should be preserved through the OpenAI-to-Gemini transformation and included in the outgoing request to Vertex AI, so that users can leverage Vertex AI context caching from the OpenAI-compatible /v1/chat/completions route.
The response should include cachedContentTokenCount in usageMetadata, and promptTokenCount should reflect the cached tokens plus the user message.
Steps To Reproduce
- Create a Vertex AI context cache via the cachedContents API with a large system prompt (32,768+ tokens)
- Configure ai-proxy-advanced with provider: gemini and route_type: llm/v1/chat
- Send a request to /v1/chat/completions with cachedContent in the body (e.g. via extra_body in the OpenAI SDK):
{
"model": "gemini-2.0-flash-001",
"cachedContent": "projects/123456789/locations/us-central1/cachedContents/987654321",
"messages": [
{"role": "user", "content": "What company are you an expert on?"}
]
}
- Observe that the response usage.prompt_tokens only reflects the user message size (e.g. 8), not the cached content. The cache is not being used.
- For comparison, send the same request via the native generateContent endpoint (with llm_format: gemini, which bypasses transformation) — this works correctly and returns cachedContentTokenCount in the response.
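The OpenAI SDK's extra_body simply merges extra keys into the top-level JSON request body, which is how cachedContent reaches the plugin. A minimal sketch of the payload the SDK would send (field names taken from the repro above; the project and cache IDs are placeholders):

```python
import json

# The SDK builds this for chat.completions.create(..., extra_body=...):
# extra_body keys are merged into the top-level JSON request body.
base_body = {
    "model": "gemini-2.0-flash-001",
    "messages": [
        {"role": "user", "content": "What company are you an expert on?"}
    ],
}
extra_body = {
    "cachedContent": "projects/123456789/locations/us-central1/cachedContents/987654321",
}
payload = {**base_body, **extra_body}

print(json.dumps(payload, indent=2))
```

This is the body Kong receives on /v1/chat/completions; the bug is that cachedContent does not survive the transformation to the Gemini request format.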
Anything else?
Root cause is in kong/llm/drivers/gemini.lua lines 272-383. The to_gemini_chat_openai function builds a new request body from scratch:
local function to_gemini_chat_openai(request_table, model_info, route_type)
local new_r = {}
-- only populates: new_r.contents, new_r.systemInstruction,
-- new_r.generationConfig, new_r.tools, new_r.tool_config
return new_r, "application/json", nil
end
request_table.cachedContent is never read or assigned to new_r.
Suggested fix — add one line before the return:
new_r.cachedContent = request_table.cachedContent
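To illustrate the fix in isolation, here is a Python model of the transformation (not the actual Lua driver, which handles many more fields): because the transformer builds a fresh body from scratch, any input field it does not explicitly copy is dropped, and forwarding cachedContent is a plain passthrough.

```python
def to_gemini_chat_openai(request_table: dict) -> dict:
    """Simplified model of the OpenAI-to-Gemini transform: it builds a
    new body from scratch, so input fields it never copies are dropped."""
    new_r = {
        "contents": [
            {"role": "user", "parts": [{"text": m["content"]}]}
            for m in request_table.get("messages", [])
            if m["role"] == "user"
        ],
    }
    # Proposed fix: pass the Vertex AI context-cache reference through.
    if request_table.get("cachedContent"):
        new_r["cachedContent"] = request_table["cachedContent"]
    return new_r

openai_body = {
    "model": "gemini-2.0-flash-001",
    "cachedContent": "projects/123456789/locations/us-central1/cachedContents/987654321",
    "messages": [{"role": "user", "content": "What company are you an expert on?"}],
}
gemini_body = to_gemini_chat_openai(openai_body)
```

Without the guarded assignment, gemini_body has no cachedContent key, which mirrors the current behavior of the driver.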
Context caching is a first-class Vertex AI feature for reducing latency and cost on large static prompts. Supporting it on the OpenAI-compatible route would let users access it without switching to the native Gemini API format.