Commit fd0675a
Cairn-2001 authored and Axiom committed
fix: prepend text message to content blocks in multimodal agent loop
When a user sends a message with image attachments via the upload API, the agent loop receives both `user_message` (text) and `user_content_blocks` (images). Previously, when content blocks were present, only the blocks were pushed to the session; the text message was silently dropped. The LLM received the images but not the user's question or context.

This fix prepends the text message as a `ContentBlock::Text` into the blocks vector before pushing to the session, so the LLM sees both the user's text AND any attached images in a single turn. Both the non-streaming and streaming agent loop paths are fixed.

Before:
  User: "What color is this?" + [image of blue square]
  LLM receives: [image only, no text]
  Response: "I can't see the image directly"

After:
  User: "What color is this?" + [image of blue square]
  LLM receives: [text: "What color is this?", image: blue square]
  Response: "Blue"

Tested with Qwen 3.5 Plus and Gemini 2.5 Flash via OpenRouter. Images up to 1.3MB confirmed working through the full pipeline.

Signed-off-by: Cairn-2001 <Cairn-2001@smoothcurves.nexus>
1 parent a26f762 commit fd0675a

1 file changed, +24 -2 lines changed

crates/openfang-runtime/src/agent_loop.rs

Lines changed: 24 additions & 2 deletions
@@ -279,7 +279,18 @@ pub async fn run_agent_loop(
     // Add the user message to session history.
     // When content blocks are provided (e.g. text + image from a channel),
     // use multimodal message format so the LLM receives the image for vision.
-    if let Some(blocks) = user_content_blocks {
+    // The text message is prepended to the blocks so the LLM sees both the
+    // user's question AND any attached images in a single turn.
+    if let Some(mut blocks) = user_content_blocks {
+        if !user_message.is_empty() {
+            blocks.insert(
+                0,
+                ContentBlock::Text {
+                    text: user_message.to_string(),
+                    provider_metadata: None,
+                },
+            );
+        }
         session.messages.push(Message::user_with_blocks(blocks));
     } else {
         session.messages.push(Message::user(user_message));
@@ -1448,7 +1459,18 @@ pub async fn run_agent_loop_streaming(
     // Add the user message to session history.
     // When content blocks are provided (e.g. text + image from a channel),
     // use multimodal message format so the LLM receives the image for vision.
-    if let Some(blocks) = user_content_blocks {
+    // The text message is prepended to the blocks so the LLM sees both the
+    // user's question AND any attached images in a single turn.
+    if let Some(mut blocks) = user_content_blocks {
+        if !user_message.is_empty() {
+            blocks.insert(
+                0,
+                ContentBlock::Text {
+                    text: user_message.to_string(),
+                    provider_metadata: None,
+                },
+            );
+        }
         session.messages.push(Message::user_with_blocks(blocks));
     } else {
         session.messages.push(Message::user(user_message));
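
The prepend logic in both hunks is identical, so it can be exercised in isolation. The following is a minimal sketch, not code from the crate: the `ContentBlock` enum here is a hypothetical stand-in (the `Image` variant, its fields, and the `Option<String>` type of `provider_metadata` are assumptions), and `build_user_blocks` is an illustrative helper name that does not appear in the commit.

```rust
// Hypothetical stand-in for the crate's ContentBlock; only the Text shape
// is taken from the diff. The Image variant and field types are assumed.
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text {
        text: String,
        provider_metadata: Option<String>,
    },
    Image {
        media_type: String,
        data: String,
    },
}

/// Illustrative helper mirroring the diff: when blocks are present and the
/// text is non-empty, insert the text as the first block. Returns None when
/// no blocks were supplied (the caller would fall back to a plain text message).
fn build_user_blocks(
    user_message: &str,
    user_content_blocks: Option<Vec<ContentBlock>>,
) -> Option<Vec<ContentBlock>> {
    let mut blocks = user_content_blocks?;
    if !user_message.is_empty() {
        blocks.insert(
            0,
            ContentBlock::Text {
                text: user_message.to_string(),
                provider_metadata: None,
            },
        );
    }
    Some(blocks)
}

fn main() {
    let image = ContentBlock::Image {
        media_type: "image/png".to_string(),
        data: "<base64 data>".to_string(),
    };

    // Text plus image: the text block lands first, the image second.
    let blocks = build_user_blocks("What color is this?", Some(vec![image.clone()]))
        .expect("blocks were provided");
    assert_eq!(blocks.len(), 2);
    assert!(matches!(&blocks[0], ContentBlock::Text { text, .. } if text == "What color is this?"));
    assert_eq!(blocks[1], image);

    // Empty text: the blocks pass through unchanged.
    let blocks = build_user_blocks("", Some(vec![image.clone()])).expect("blocks were provided");
    assert_eq!(blocks, vec![image]);
}
```

Factoring the check into one helper like this would keep the streaming and non-streaming paths from drifting apart, though the commit itself keeps the logic inline in both functions.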
