Commit fd0675a
fix: prepend text message to content blocks in multimodal agent loop

When a user sends a message with image attachments via the upload API,
the agent loop receives both `user_message` (text) and
`user_content_blocks` (images). Previously, when content blocks were
present, only the blocks were pushed to the session and the text message
was silently dropped. The LLM received the images but not the user's
question or context.

This fix prepends the text message to the blocks vector as a
`ContentBlock::Text` before pushing to the session, so the LLM sees
both the user's text AND any attached images in a single turn.
Both the non-streaming and streaming agent loop paths are fixed.
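
The change can be sketched roughly as follows. This is a minimal Rust sketch under stated assumptions, not the code from the commit: `ContentBlock`'s variant shapes (`Text { text }`, `Image { media_type, data }`), `Message`, `Session`, `build_user_content`, and `build_user_content_old` are all hypothetical stand-ins. It contrasts the old behavior (blocks replace the text) with the fixed one (text inserted at index 0, ahead of the attachments).

```rust
// Sketch only: every type and function name here is a hypothetical
// stand-in, not the agent's real definition.

/// One piece of multimodal message content (assumed shape).
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text { text: String },
    Image { media_type: String, data: String },
}

impl ContentBlock {
    /// Short human-readable description, used for printing in main().
    fn summary(&self) -> String {
        match self {
            ContentBlock::Text { text } => format!("text: {text:?}"),
            ContentBlock::Image { media_type, data } => {
                format!("image: {media_type} ({} base64 chars)", data.len())
            }
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: Vec<ContentBlock>,
}

#[derive(Debug, Default)]
struct Session {
    messages: Vec<Message>,
}

impl Session {
    /// Append one user turn containing all of its content blocks.
    fn push_user(&mut self, content: Vec<ContentBlock>) {
        self.messages.push(Message { role: "user".to_string(), content });
    }
}

/// Old behavior: when blocks are present, they replace the text entirely.
fn build_user_content_old(user_message: &str, blocks: Option<Vec<ContentBlock>>) -> Vec<ContentBlock> {
    match blocks {
        Some(blocks) if !blocks.is_empty() => blocks,
        _ => vec![ContentBlock::Text { text: user_message.to_string() }],
    }
}

/// Fixed behavior: the text message goes first, followed by any attachments.
fn build_user_content(user_message: &str, blocks: Option<Vec<ContentBlock>>) -> Vec<ContentBlock> {
    let mut content = blocks.unwrap_or_default();
    // Assumption: a blank message adds no block, so an image-only upload
    // does not send an empty text part.
    if !user_message.trim().is_empty() {
        content.insert(0, ContentBlock::Text { text: user_message.to_string() });
    }
    content
}

fn main() {
    let image = ContentBlock::Image {
        media_type: "image/png".to_string(),
        data: "iVBORw0KGgo=".to_string(), // placeholder, not a real image
    };
    let question = "What color is this?";

    let before = build_user_content_old(question, Some(vec![image.clone()]));
    let after = build_user_content(question, Some(vec![image]));

    let mut session = Session::default();
    session.push_user(after);

    println!("before: {:?}", before.iter().map(ContentBlock::summary).collect::<Vec<_>>());
    for msg in &session.messages {
        for block in &msg.content {
            println!("{}: {}", msg.role, block.summary());
        }
    }
}
```

The diff stats suggest the commit patches the streaming and non-streaming sites inline (two hunks of identical size); routing both paths through one shared helper like `build_user_content` would be one way to keep them from drifting apart, though that is a design suggestion rather than what the commit does.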

Before:
  User: "What color is this?" + [image of blue square]
  LLM receives: [image only, no text]
  Response: "I can't see the image directly"

After:
  User: "What color is this?" + [image of blue square]
  LLM receives: [text: "What color is this?", image: blue square]
  Response: "Blue"

Tested with Qwen 3.5 Plus and Gemini 2.5 Flash via OpenRouter.
Images up to 1.3 MB were confirmed working through the full pipeline.

Signed-off-by: Cairn-2001 <Cairn-2001@smoothcurves.nexus>

1 parent a26f762
1 file changed, 24 insertions(+), 2 deletions(-)

@@ -279,7 +279,18 @@  (1 line removed, 12 lines added)
@@ -1448,7 +1459,18 @@  (1 line removed, 12 lines added)