
Commit 48a66d9

Authored by Malte Heuser (mltheuser)
KG-596 Fix thought signature on tool call with gemini 3.0 (#1317)
## Motivation and Context

Related to: [KG-596](https://youtrack.jetbrains.com/projects/KG/issues/KG-596)

Gemini 3.0 models enforce stricter validation of `thoughtSignature` for function calls. When the model returns parallel tool calls, only the *first* call in a turn receives a `thoughtSignature`. On subsequent turns, the API expects this signature to be echoed back exactly. Without proper handling, multi-turn agentic conversations with parallel tools fail with cryptic API errors.

**The Problem:** The `GoogleLLMClient` wasn't preserving `thoughtSignature` across turns, and wasn't correctly re-grouping parallel tool calls/results when constructing requests, leading to malformed conversation structures that newer Gemini models reject.

## How It's Solved

1. **Preserve `thoughtSignature`:** When processing model responses, we now extract the `thoughtSignature` from `GooglePart.FunctionCall` and store it in `Message.Tool.Call.metaInfo.metadata`. When building subsequent requests, we restore it.
2. **Correctly batch parallel tool calls/results:** The `createGoogleRequest` function now uses a buffering strategy to re-group interleaved messages. The key insight: if a tool call has a `thoughtSignature`, it starts a new turn; if it doesn't, it's a parallel call in the same turn. This lets us batch calls into a single `model`-role `GoogleContent` and results into a single `user`-role `GoogleContent`, as the API requires.
3. **Clean, idiomatic implementation:** The buffering logic uses a `when` expression to make the three states explicit (new turn, starting fresh, parallel call), keeping the code readable and maintainable.

## Breaking Changes

None. This is a backward-compatible fix that enables correct behavior with Gemini 3.0+ while remaining compatible with earlier models.

---

#### Type of the changes

- [ ] New feature (non-breaking change which adds functionality)
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Tests improvement
- [ ] Refactoring

#### Checklist

- [x] The pull request has a description of the proposed change
- [x] I read the [Contributing Guidelines](https://github.com/JetBrains/koog/blob/main/CONTRIBUTING.md) before opening the pull request
- [x] The pull request uses **`develop`** as the base branch
- [x] Tests for the changes have been added
- [x] All new and existing tests passed

##### Additional steps for pull requests adding a new feature

- [x] An issue describing the proposed change exists
- [x] The pull request includes a link to the issue
- [ ] The change was discussed and approved in the issue
- [ ] Docs have been added / updated

---------

Co-authored-by: Malte Heuser <[email protected]>
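The turn-grouping rule from point 2 above ("a signature starts a new turn") can be sketched as a small standalone snippet. `ToolCall` and `groupIntoTurns` here are hypothetical simplified stand-ins for illustration, not the actual Koog types:

```kotlin
// Hypothetical simplified sketch of the turn-grouping rule: a tool call that
// carries a thoughtSignature opens a new turn; a signature-less call is a
// parallel call belonging to the current turn.
data class ToolCall(val name: String, val thoughtSignature: String? = null)

fun groupIntoTurns(calls: List<ToolCall>): List<List<ToolCall>> {
    val turns = mutableListOf<MutableList<ToolCall>>()
    for (call in calls) {
        if (call.thoughtSignature != null || turns.isEmpty()) {
            // Signature present (or nothing buffered yet): start a new turn
            turns += mutableListOf(call)
        } else {
            // No signature: parallel call in the same turn
            turns.last() += call
        }
    }
    return turns
}

fun main() {
    val calls = listOf(
        ToolCall("search", thoughtSignature = "sig-A"), // turn 1 starts
        ToolCall("fetch"),                              // parallel in turn 1
        ToolCall("summarize", thoughtSignature = "sig-B") // turn 2 starts
    )
    println(groupIntoTurns(calls).map { turn -> turn.map { it.name } })
    // [[search, fetch], [summarize]]
}
```

Each inner list then maps to one `model`-role content in the request, with the signature attached only to the first call of its turn.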
1 parent 6a25db1 commit 48a66d9

File tree (3 files changed: +237 −45)

  • integration-tests/src/jvmTest/kotlin/ai/koog/integration/tests/utils
  • prompt/prompt-executor/prompt-executor-clients/prompt-executor-google-client/src
integration-tests/src/jvmTest/kotlin/ai/koog/integration/tests/utils/Models.kt

Lines changed: 1 addition & 0 deletions

```diff
@@ -35,6 +35,7 @@ object Models {
     @JvmStatic
     fun googleModels(): Stream<LLModel> {
         return Stream.of(
+            GoogleModels.Gemini3_Pro_Preview,
             GoogleModels.Gemini2_5Pro,
             GoogleModels.Gemini2_5Flash,
         )
```

prompt/prompt-executor/prompt-executor-clients/prompt-executor-google-client/src/commonMain/kotlin/ai/koog/prompt/executor/clients/google/GoogleLLMClient.kt

Lines changed: 72 additions & 45 deletions

```diff
@@ -287,6 +287,8 @@ public open class GoogleLLMClient(
         val systemMessageParts = mutableListOf<GooglePart.Text>()
         val contents = mutableListOf<GoogleContent>()
         val pendingCalls = mutableListOf<GooglePart.FunctionCall>()
+        val pendingResults = mutableListOf<GooglePart.FunctionResponse>()
+        var lastSignature: String? = null

         fun flushCalls() {
             if (pendingCalls.isNotEmpty()) {
@@ -295,20 +297,32 @@
             }
         }

+        fun flushResults() {
+            if (pendingResults.isNotEmpty()) {
+                contents += GoogleContent(role = "user", parts = pendingResults.toList())
+                pendingResults.clear()
+            }
+        }
+
+        fun flushAll() {
+            flushCalls()
+            flushResults()
+        }
+
         for (message in prompt.messages) {
             when (message) {
                 is Message.System -> {
                     systemMessageParts.add(GooglePart.Text(message.content))
                 }

                 is Message.User -> {
-                    flushCalls()
+                    flushAll()
                     // User messages become 'user' role content
                     contents.add(message.toGoogleContent(model))
                 }

                 is Message.Assistant -> {
-                    flushCalls()
+                    flushAll()
                     contents.add(
                         GoogleContent(
                             role = "model",
@@ -318,51 +332,64 @@
                 }

                 is Message.Reasoning -> {
-                    flushCalls()
-                    contents.add(
-                        GoogleContent(
-                            role = "model",
-                            parts = listOf(
-                                GooglePart.Text(
-                                    text = message.content,
-                                    thoughtSignature = message.encrypted,
-                                    thought = true,
+                    // Reasoning indicates a new step - flush previous step
+                    flushAll()
+
+                    if (message.content.isNotBlank()) {
+                        // If content is present, it's a "Thought Summary" -> Convert to Text part with thought=true
+                        contents.add(
+                            GoogleContent(
+                                role = "model",
+                                parts = listOf(
+                                    GooglePart.Text(
+                                        text = message.content,
+                                        thought = true,
+                                        thoughtSignature = message.encrypted
+                                    )
                                 )
                             )
                         )
-                    )
+                    } else {
+                        // If content is empty/blank, it's strictly a signature carrier for the next Tool.Call
+                        lastSignature = message.encrypted
+                    }
                 }

                 is Message.Tool.Result -> {
-                    flushCalls()
-                    contents.add(
-                        GoogleContent(
-                            role = "user",
-                            parts = listOf(
-                                GooglePart.FunctionResponse(
-                                    functionResponse = GoogleData.FunctionResponse(
-                                        id = message.id,
-                                        name = message.tool,
-                                        response = buildJsonObject { put("result", message.content) }
-                                    )
-                                )
+                    // Just buffer results. We only flush when we know the current tool turn is complete.
+                    pendingResults.add(
+                        GooglePart.FunctionResponse(
+                            functionResponse = GoogleData.FunctionResponse(
+                                id = message.id,
+                                name = message.tool,
+                                response = buildJsonObject { put("result", message.content) }
                             )
                         )
                     )
                 }

                 is Message.Tool.Call -> {
+                    // First call in step needs to flush stale results
+                    if (pendingCalls.isEmpty()) {
+                        flushResults()
+                    }
+
+                    // Use signature from preceding Reasoning message
+                    val signature = lastSignature
+                    lastSignature = null // Consume: only first call gets the signature
+
                     pendingCalls += GooglePart.FunctionCall(
                         functionCall = GoogleData.FunctionCall(
                             id = message.id,
                             name = message.tool,
                             args = json.decodeFromString(message.content)
-                        )
+                        ),
+                        thoughtSignature = signature
                     )
                 }
             }
         }
-        flushCalls()
+        flushAll()

         val googleTools = tools
             .map { tool ->
@@ -599,23 +626,21 @@
         val responses = mutableListOf<Message.Response>()
         with(responses) {
             parts.forEach { part ->
-                if (part.thoughtSignature != null && part.thought == false) {
-                    add(
-                        Message.Reasoning(
-                            encrypted = part.thoughtSignature,
-                            content = "",
-                            metaInfo = metaInfo
-                        )
-                    )
+                // Create Reasoning for any part with signature (signature carrier),
+                // unless the part itself is a thought (in which case it carries the signature)
+                val signature = part.thoughtSignature
+                val isThought = part.thought == true
+                if (signature != null && !isThought) {
+                    add(Message.Reasoning(encrypted = signature, content = "", metaInfo = metaInfo))
                 }

                 when (part) {
                     is GooglePart.Text -> {
-                        if (part.thought ?: false) {
+                        if (isThought) {
                             add(
                                 Message.Reasoning(
-                                    encrypted = part.thoughtSignature,
                                     content = part.text,
+                                    encrypted = signature,
                                     metaInfo = metaInfo
                                 )
                             )
@@ -630,14 +655,16 @@
                         }
                     }

-                    is GooglePart.FunctionCall -> add(
-                        Message.Tool.Call(
-                            id = Uuid.random().toString(),
-                            tool = part.functionCall.name,
-                            content = part.functionCall.args.toString(),
-                            metaInfo = metaInfo
+                    is GooglePart.FunctionCall -> {
+                        add(
+                            Message.Tool.Call(
+                                id = Uuid.random().toString(),
+                                tool = part.functionCall.name,
+                                content = part.functionCall.args.toString(),
+                                metaInfo = metaInfo
+                            )
                         )
-                    )
+                    }

                     is GooglePart.InlineData -> {
                         val inlineData = part.inlineData
@@ -669,8 +696,8 @@
         }

         return when {
-            // Fix the situation when the model decides to both call tools and talk
-            responses.any { it is Message.Tool.Call } -> responses.filterIsInstance<Message.Tool.Call>()
+            // When the model calls tools, keep Reasoning (for signature) and Tool.Call, filter out Assistant text
+            responses.any { it is Message.Tool.Call } -> responses.filter { it is Message.Reasoning || it is Message.Tool.Call }
             // If no messages where returned, return an empty message and check finishReason
             responses.isEmpty() -> listOf(
                 Message.Assistant(
```
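As a rough illustration of the buffering strategy in the hunks above, here is a self-contained sketch with hypothetical simplified types (`Msg`, `Content`, and friends stand in for the real `Message` and `GoogleContent` classes): tool calls accumulate until a non-call message arrives and then flush as one `model` content, while their results flush together as one `user` content.

```kotlin
// Hypothetical simplified sketch of the buffering strategy, not the real
// Koog/Google client types.
sealed interface Msg
data class UserMsg(val text: String) : Msg
data class Call(val tool: String) : Msg
data class Result(val tool: String) : Msg

data class Content(val role: String, val parts: List<String>)

fun toContents(messages: List<Msg>): List<Content> {
    val contents = mutableListOf<Content>()
    val pendingCalls = mutableListOf<String>()
    val pendingResults = mutableListOf<String>()

    // Flush buffered parallel calls as a single "model" content
    fun flushCalls() {
        if (pendingCalls.isNotEmpty()) {
            contents += Content("model", pendingCalls.toList())
            pendingCalls.clear()
        }
    }

    // Flush buffered results as a single "user" content
    fun flushResults() {
        if (pendingResults.isNotEmpty()) {
            contents += Content("user", pendingResults.toList())
            pendingResults.clear()
        }
    }

    for (m in messages) when (m) {
        is UserMsg -> {
            flushCalls(); flushResults()
            contents += Content("user", listOf(m.text))
        }
        is Call -> {
            // First call of a new turn flushes stale results from the previous turn
            if (pendingCalls.isEmpty()) flushResults()
            pendingCalls += "call:${m.tool}"
        }
        is Result -> pendingResults += "result:${m.tool}"
    }
    flushCalls(); flushResults()
    return contents
}

fun main() {
    val contents = toContents(
        listOf(UserMsg("query"), Call("t1"), Call("t2"), Result("t1"), Result("t2"))
    )
    // Parallel calls share one "model" content; their results share one "user" content
    println(contents.map { it.role })
    // [user, model, user]
}
```

This mirrors the structure asserted in the `createGoogleRequest groups parallel Tool Results into single content` test below.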

prompt/prompt-executor/prompt-executor-clients/prompt-executor-google-client/src/jvmTest/kotlin/ai/koog/prompt/executor/clients/google/GoogleLLMClientTest.kt

Lines changed: 164 additions & 0 deletions

```diff
@@ -13,14 +13,17 @@ import ai.koog.prompt.executor.clients.google.models.GoogleThinkingConfig
 import ai.koog.prompt.message.AttachmentContent
 import ai.koog.prompt.message.ContentPart
 import ai.koog.prompt.message.Message
+import ai.koog.prompt.message.RequestMetaInfo
 import ai.koog.prompt.message.ResponseMetaInfo
 import ai.koog.prompt.params.LLMParams
 import io.kotest.matchers.collections.shouldContain
 import io.kotest.matchers.collections.shouldHaveSize
 import io.kotest.matchers.shouldBe
 import io.kotest.matchers.shouldNotBe
+import io.kotest.matchers.types.shouldBeInstanceOf
 import kotlinx.serialization.json.JsonObject
 import kotlinx.serialization.json.JsonPrimitive
+import kotlinx.serialization.json.buildJsonObject
 import kotlinx.serialization.json.jsonArray
 import kotlinx.serialization.json.jsonObject
 import kotlinx.serialization.json.jsonPrimitive
@@ -384,4 +387,165 @@ class GoogleLLMClientTest {
         filePart.mimeType shouldBe "application/pdf"
         (filePart.content as AttachmentContent.Binary.Bytes).asBytes() shouldBe fileData
     }
+
+    @Test
+    fun `createGoogleRequest groups parallel Tool Results into single content`() {
+        val client = GoogleLLMClient(apiKey = "test")
+        val request = client.createGoogleRequest(
+            Prompt(
+                messages = listOf(
+                    Message.User("query", RequestMetaInfo.Empty),
+                    Message.Reasoning(encrypted = "sig", content = "", metaInfo = ResponseMetaInfo.Empty),
+                    Message.Tool.Call(id = "1", tool = "t1", content = "{}", metaInfo = ResponseMetaInfo.Empty),
+                    Message.Tool.Call(id = "2", tool = "t2", content = "{}", metaInfo = ResponseMetaInfo.Empty),
+                    Message.Tool.Result(id = "1", tool = "t1", content = "r1", metaInfo = RequestMetaInfo.Empty),
+                    Message.Tool.Result(id = "2", tool = "t2", content = "r2", metaInfo = RequestMetaInfo.Empty),
+                ),
+                id = "id"
+            ),
+            GoogleModels.Gemini3_Pro_Preview,
+            emptyList()
+        )
+
+        // Structure: User, FunctionCalls(grouped), FunctionResponses(grouped)
+        request.contents shouldHaveSize 3
+        request.contents[0].role shouldBe "user"
+        request.contents[1].role shouldBe "model"
+        request.contents[2].role shouldBe "user"
+
+        // FunctionResponses are grouped
+        val responsesParts = request.contents[2].parts!!
+        responsesParts shouldHaveSize 2
+        responsesParts.forEach { it.shouldBeInstanceOf<GooglePart.FunctionResponse>() }
+    }
+
+    @Test
+    fun `createGoogleRequest attaches signature from Reasoning to first call only`() {
+        val client = GoogleLLMClient(apiKey = "test")
+        val request = client.createGoogleRequest(
+            Prompt(
+                messages = listOf(
+                    Message.User("query", RequestMetaInfo.Empty),
+                    Message.Reasoning(encrypted = "my-sig", content = "", metaInfo = ResponseMetaInfo.Empty),
+                    Message.Tool.Call(id = "1", tool = "t1", content = "{}", metaInfo = ResponseMetaInfo.Empty),
+                    Message.Tool.Call(id = "2", tool = "t2", content = "{}", metaInfo = ResponseMetaInfo.Empty),
+                ),
+                id = "id"
+            ),
+            GoogleModels.Gemini3_Pro_Preview,
+            emptyList()
+        )
+
+        val callsParts = request.contents[1].parts!!
+        callsParts shouldHaveSize 2
+
+        val fc1 = callsParts[0] as GooglePart.FunctionCall
+        val fc2 = callsParts[1] as GooglePart.FunctionCall
+
+        fc1.thoughtSignature shouldBe "my-sig" // First gets signature
+        fc2.thoughtSignature shouldBe null // Second doesn't
+    }
+
+    @Test
+    fun `processGoogleCandidate creates Reasoning before FunctionCall with signature`() {
+        val client = GoogleLLMClient(apiKey = "test")
+        val candidate = GoogleCandidate(
+            content = GoogleContent(
+                role = "model",
+                parts = listOf(
+                    GooglePart.FunctionCall(
+                        functionCall = GoogleData.FunctionCall(name = "tool", args = buildJsonObject {}),
+                        thoughtSignature = "sig-123"
+                    )
+                )
+            ),
+            finishReason = "STOP"
+        )
+
+        val responses = client.processGoogleCandidate(candidate, ResponseMetaInfo.Empty)
+
+        responses shouldHaveSize 2
+        responses[0].shouldBeInstanceOf<Message.Reasoning>()
+        responses[1].shouldBeInstanceOf<Message.Tool.Call>()
+        (responses[0] as Message.Reasoning).encrypted shouldBe "sig-123"
+        (responses[0] as Message.Reasoning).content shouldBe ""
+    }
+
+    @Test
+    fun `processGoogleCandidate creates Reasoning from Text with thought=true`() {
+        val client = GoogleLLMClient(apiKey = "test")
+        val candidate = GoogleCandidate(
+            content = GoogleContent(
+                role = "model",
+                parts = listOf(
+                    GooglePart.Text(
+                        text = "I am thinking...",
+                        thought = true,
+                        thoughtSignature = "thought-sig"
+                    )
+                )
+            ),
+            finishReason = "STOP"
+        )
+
+        val responses = client.processGoogleCandidate(candidate, ResponseMetaInfo.Empty)
+
+        responses shouldHaveSize 1
+        responses[0].shouldBeInstanceOf<Message.Reasoning>()
+        val reasoning = responses[0] as Message.Reasoning
+        reasoning.content shouldBe "I am thinking..."
+        reasoning.encrypted shouldBe "thought-sig"
+    }
+
+    @Test
+    fun `createGoogleRequest includes Reasoning as Text part with thought=true`() {
+        val client = GoogleLLMClient(apiKey = "test")
+        val request = client.createGoogleRequest(
+            Prompt(
+                messages = listOf(
+                    Message.User("query", RequestMetaInfo.Empty),
+                    Message.Reasoning(content = "Previous thought", encrypted = "prev-sig", metaInfo = ResponseMetaInfo.Empty)
+                ),
+                id = "id"
+            ),
+            GoogleModels.Gemini3_Pro_Preview,
+            emptyList()
+        )
+
+        request.contents shouldHaveSize 2
+        val thoughtContent = request.contents[1]
+        thoughtContent.role shouldBe "model"
+        thoughtContent.parts!!.single().shouldBeInstanceOf<GooglePart.Text>()
+        val textPart = thoughtContent.parts!!.single() as GooglePart.Text
+        textPart.text shouldBe "Previous thought"
+        textPart.thought shouldBe true
+        textPart.thoughtSignature shouldBe "prev-sig"
+    }
+
+    @Test
+    fun `processGoogleCandidate creates Reasoning for InlineData with signature`() {
+        val client = GoogleLLMClient(apiKey = "test")
+        val candidate = GoogleCandidate(
+            content = GoogleContent(
+                role = "model",
+                parts = listOf(
+                    GooglePart.InlineData(
+                        inlineData = GoogleData.Blob("image/png", "png-bytes".encodeToByteArray()),
+                        thoughtSignature = "image-sig"
+                    )
+                )
+            ),
+            finishReason = "STOP"
+        )
+
+        val responses = client.processGoogleCandidate(candidate, ResponseMetaInfo.Empty)
+
+        responses shouldHaveSize 2
+        responses[0].shouldBeInstanceOf<Message.Reasoning>()
+        (responses[0] as Message.Reasoning).encrypted shouldBe "image-sig"
+
+        responses[1].shouldBeInstanceOf<Message.Assistant>()
+        val filePart = (responses[1] as Message.Assistant).parts.single() as ContentPart.Image
+        filePart.format shouldBe "png"
+    }
 }
```
