Commit c677a91

feat(cpp): VLM image support in C++ SDK (#858)
Closes #785

## Changes

- Adds `gaia::Image` with `fromBytes` / `fromFile` factories, RFC 4648 base64 encoding, magic-byte MIME detection (PNG/JPEG/GIF/WebP/BMP), a 20 MiB size cap, an `O_NOFOLLOW` + post-open `fstat` TOCTOU guard on POSIX, and a whitelist enforcing only the five supported MIME types
- Adds `gaia::ContentPart` (text / image_url parts, with `toJson()` producing the OpenAI vision wire format)
- Extends `gaia::Message` with an additive `std::optional<std::vector<ContentPart>> parts` field; `toJson()` dispatches to array or string accordingly, fully backward-compatible with existing aggregate-init sites
- Adds two new `processQuery` overloads (`string + vector<Image>` ergonomic; `vector<Message>` caller-composed), unified through a private `processQueryInternal` that is the sole writer of `conversationHistory_`
- Strips image parts from history at end-of-turn (base64 is never retained across calls)
- Adds an RAII `InFlightGuard` via `std::atomic<bool> inFlight_` and `compare_exchange_strong`; concurrent `processQuery` calls on the same Agent throw `std::runtime_error`
- Runs empty-input validation **before** `ensureModelLoaded`, so no `/load` fires on invalid input
- Lifts `cpp/benchmarks/mock_llm_server.h` to `cpp/tests/support/mock_llm_server.h` and extends it with `receivedBodies()`, `loadRequestCount()`, `holdNextResponse()`, and a `reportModelLoaded` constructor flag; the benchmark header is now a thin shim
- Adds `cpp/examples/vlm_agent.cpp` end-to-end demo
- Adds `cpp/tests/integration/test_integration_vlm.cpp` (3 tests: live Lemonade VLM smoke, messages-list overload, ctx-overflow error surface) with a Lemonade version-pin probe via `GAIA_PINNED_LEMONADE_VERSION`
- Updates `docs/cpp/api-reference.mdx`, `docs/cpp/overview.mdx`, and `docs/cpp/quickstart.mdx` with a VLM section, the new overloads, a thread-safety update, and an example invocation

## Test coverage

| Layer | Tests | Status |
|-------|-------|--------|
| Unit — MIME / base64 / Image | 20 tests in `test_image.cpp` | ✅ 331/331 pass |
| Unit — ContentPart / Message | 15 new tests in `test_types.cpp` | ✅ |
| Agent-level (mock HTTP) | 13 tests in `test_agent_vlm.cpp` | ✅ |
| Integration (live Lemonade) | 3 tests in `test_integration_vlm.cpp` | gated, opt-in |

## Reviewer notes

- The `Message::parts` field is **additive**: all existing code compiles unchanged, but consumers linked against a prebuilt `gaia_core` must rebuild (noted in docs)
- `detectImageMimeType` returns `""` (empty string) for ≥ 12-byte buffers with unrecognized magic; it returns `"image/png"` only for null/short buffers (the AC-15e safe-fallback contract)
- Integration tests require `-DGAIA_BUILD_INTEGRATION_TESTS=ON` and a live Lemonade server with `Qwen3-VL-4B-Instruct-GGUF`; they are non-blocking in CI
- The pre-existing `errorCount` unused-variable warning at `agent.cpp:692` is not introduced by this PR
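The safe-fallback contract in the reviewer notes can be sketched as a standalone function. This is an illustrative re-implementation based only on the contract described above (the exact signature of the SDK's `detectImageMimeType` is an assumption):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical sketch of magic-byte MIME detection following the contract
// described in the commit message: PNG/JPEG/GIF/WebP/BMP via magic bytes,
// "" for long-enough buffers with unknown magic, and the "image/png"
// safe fallback for null or short inputs (AC-15e).
std::string detectImageMimeType(const std::uint8_t* data, std::size_t len) {
    // Safe fallback: null or too-short buffers default to PNG.
    if (data == nullptr || len < 12) return "image/png";

    static const std::uint8_t kPng[] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    if (std::memcmp(data, kPng, 8) == 0) return "image/png";
    if (data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF) return "image/jpeg";
    if (std::memcmp(data, "GIF87a", 6) == 0 || std::memcmp(data, "GIF89a", 6) == 0)
        return "image/gif";
    // WebP: "RIFF" at offset 0, "WEBP" at offset 8.
    if (std::memcmp(data, "RIFF", 4) == 0 && std::memcmp(data + 8, "WEBP", 4) == 0)
        return "image/webp";
    if (data[0] == 'B' && data[1] == 'M') return "image/bmp";

    return "";  // long enough but unknown magic: caller must reject
}
```

Returning an empty string for unknown magic (rather than guessing) keeps the five-type whitelist enforceable at the call site.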
1 parent: def8adb

20 files changed

Lines changed: 1898 additions & 164 deletions

.gitattributes

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Binary test fixtures — prevent LF/CRLF mangling
+cpp/tests/fixtures/*.png binary
+cpp/tests/fixtures/*.jpg binary

cpp/CMakeLists.txt

Lines changed: 28 additions & 0 deletions
@@ -93,6 +93,7 @@ endif()
 # ---------------------------------------------------------------------------
 add_library(gaia_core
     src/types.cpp
+    src/image.cpp
     src/tool_registry.cpp
     src/console.cpp
     src/clean_console.cpp
@@ -191,6 +192,9 @@ endif()
 if(GAIA_BUILD_EXAMPLES)
     add_executable(security_demo examples/security_demo.cpp)
     target_link_libraries(security_demo PRIVATE gaia::gaia_core)
+
+    add_executable(vlm_agent examples/vlm_agent.cpp)
+    target_link_libraries(vlm_agent PRIVATE gaia::gaia_core)
 endif()

 # ---------------------------------------------------------------------------
@@ -201,9 +205,11 @@ if(GAIA_BUILD_TESTS)

     add_executable(tests_mock
         tests/test_types.cpp
+        tests/test_image.cpp
         tests/test_tool_registry.cpp
         tests/test_json_utils.cpp
         tests/test_agent.cpp
+        tests/test_agent_vlm.cpp
         tests/test_mcp_client.cpp
         tests/test_console.cpp
         tests/test_lemonade_client.cpp
@@ -219,6 +225,23 @@ if(GAIA_BUILD_TESTS)
         GTest::gtest_main
     )

+    # VLM tests need httplib (mock LLM server) and the fixtures directory.
+    if(httplib_FOUND)
+        target_link_libraries(tests_mock PRIVATE httplib::httplib)
+    else()
+        target_include_directories(tests_mock SYSTEM PRIVATE
+            $<TARGET_PROPERTY:httplib::httplib,INTERFACE_INCLUDE_DIRECTORIES>)
+    endif()
+    if(OpenSSL_FOUND)
+        target_compile_definitions(tests_mock PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)
+        target_link_libraries(tests_mock PRIVATE OpenSSL::SSL OpenSSL::Crypto)
+    endif()
+
+    target_compile_definitions(tests_mock PRIVATE
+        GAIA_TEST_FIXTURES_DIR="${CMAKE_CURRENT_SOURCE_DIR}/tests/fixtures"
+    )
+    target_include_directories(tests_mock PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/tests)
+
     include(GoogleTest)
     gtest_discover_tests(tests_mock)
 endif()
@@ -236,13 +259,18 @@ if(GAIA_BUILD_INTEGRATION_TESTS)
         tests/integration/test_integration_mcp.cpp
         tests/integration/test_integration_wifi.cpp
         tests/integration/test_integration_health.cpp
+        tests/integration/test_integration_vlm.cpp
     )

     target_link_libraries(tests_integration PRIVATE
         gaia::gaia_core
         GTest::gtest
     )

+    target_compile_definitions(tests_integration PRIVATE
+        GAIA_TEST_FIXTURES_DIR="${CMAKE_CURRENT_SOURCE_DIR}/tests/fixtures"
+    )
+
     include(GoogleTest)
     gtest_discover_tests(tests_integration
         PROPERTIES TIMEOUT 300

cpp/benchmarks/mock_llm_server.h

Lines changed: 4 additions & 148 deletions
@@ -1,154 +1,10 @@
 // Copyright(C) 2025-2026 Advanced Micro Devices, Inc. All rights reserved.
 // SPDX-License-Identifier: MIT
 //
-// In-process mock HTTP server mimicking the Lemonade Server API.
-// Used by benchmarks to avoid requiring a real LLM backend.
+// The canonical mock server now lives under cpp/tests/support/. This header
+// remains as a thin re-include so existing benchmark sources compile
+// unchanged. Do not add new contents here.

 #pragma once

-#include <atomic>
-#include <chrono>
-#include <deque>
-#include <mutex>
-#include <stdexcept>
-#include <string>
-#include <thread>
-
-#include <httplib.h>
-
-namespace bench {
-
-// Default chat completion response — agent returns a final answer immediately.
-static const std::string kDefaultAnswer = R"({"choices":[{"message":{"content":"{\"thought\":\"done\",\"goal\":\"complete\",\"answer\":\"benchmark result\"}"}}]})";
-
-// Tool-call response — agent calls the echo tool first.
-static const std::string kToolCall = R"({"choices":[{"message":{"content":"{\"thought\":\"calling tool\",\"goal\":\"test\",\"tool\":\"echo\",\"tool_args\":{\"message\":\"bench\"}}"}}]})";
-
-// Health response — reports mock-model as already loaded so ensureModelLoaded() skips /load.
-static const std::string kHealthOk = R"({"status":"ok","all_models_loaded":[{"model_name":"mock-model","recipe_options":{"ctx_size":16384}}]})";
-
-// Models list response
-static const std::string kModelsList = R"({"data":[{"id":"mock-model"}]})";
-
-// Load response
-static const std::string kLoadOk = R"({"status":"ok"})";
-
-class MockLlmServer {
-public:
-    /// Start server on an OS-assigned port.
-    /// Constructor blocks until the server is accepting connections.
-    MockLlmServer() : server_(std::make_unique<httplib::Server>()) {
-        registerHandlers();
-
-        // bind_to_any_port returns the OS-assigned port (avoids CI port conflicts)
-        port_ = server_->bind_to_any_port("127.0.0.1");
-        if (port_ <= 0) {
-            throw std::runtime_error("MockLlmServer: failed to bind to any port");
-        }
-
-        thread_ = std::thread([this]() { server_->listen_after_bind(); });
-
-        waitUntilReady();
-    }
-
-    ~MockLlmServer() {
-        server_->stop();
-        if (thread_.joinable()) {
-            thread_.join();
-        }
-    }
-
-    // Non-copyable, non-movable
-    MockLlmServer(const MockLlmServer&) = delete;
-    MockLlmServer& operator=(const MockLlmServer&) = delete;
-
-    /// The port the server is listening on.
-    int port() const { return port_; }
-
-    /// Base URL suitable for AgentConfig::baseUrl (without /api/v1 — LemonadeClient adds it).
-    std::string baseUrl() const { return "http://127.0.0.1:" + std::to_string(port_); }
-
-    /// Push a response to return for the next POST /chat/completions call.
-    /// When the queue is empty the default answer response is returned.
-    void pushResponse(const std::string& body) {
-        std::lock_guard<std::mutex> lk(mu_);
-        responseQueue_.push_back(body);
-    }
-
-    /// Push N copies of a response.
-    void pushResponses(const std::string& body, int n) {
-        std::lock_guard<std::mutex> lk(mu_);
-        for (int i = 0; i < n; ++i) {
-            responseQueue_.push_back(body);
-        }
-    }
-
-    /// Clear pending queued responses.
-    void clearQueue() {
-        std::lock_guard<std::mutex> lk(mu_);
-        responseQueue_.clear();
-    }
-
-    /// Number of chat completion requests handled so far.
-    int requestCount() const { return requestCount_.load(); }
-
-private:
-    void registerHandlers() {
-        // Health check — always reports mock-model loaded
-        server_->Get("/api/v1/health", [](const httplib::Request&, httplib::Response& res) {
-            res.set_content(kHealthOk, "application/json");
-        });
-
-        // Load model — no-op safety fallback
-        server_->Post("/api/v1/load", [](const httplib::Request&, httplib::Response& res) {
-            res.set_content(kLoadOk, "application/json");
-        });
-
-        // Models list
-        server_->Get("/api/v1/models", [](const httplib::Request&, httplib::Response& res) {
-            res.set_content(kModelsList, "application/json");
-        });
-
-        // Chat completions — dequeue a pre-loaded response or return default answer
-        server_->Post("/api/v1/chat/completions",
-                      [this](const httplib::Request&, httplib::Response& res) {
-            ++requestCount_;
-            std::string body;
-            {
-                std::lock_guard<std::mutex> lk(mu_);
-                if (!responseQueue_.empty()) {
-                    body = responseQueue_.front();
-                    responseQueue_.pop_front();
-                } else {
-                    body = kDefaultAnswer;
-                }
-            }
-            res.set_content(body, "application/json");
-        });
-    }
-
-    void waitUntilReady() {
-        // Poll health endpoint until the server responds
-        httplib::Client cli("127.0.0.1", port_);
-        cli.set_connection_timeout(1);
-        cli.set_read_timeout(1);
-
-        for (int attempt = 0; attempt < 50; ++attempt) {
-            auto res = cli.Get("/api/v1/health");
-            if (res && res->status == 200) {
-                return;
-            }
-            std::this_thread::sleep_for(std::chrono::milliseconds(20));
-        }
-        throw std::runtime_error("MockLlmServer: server did not become ready");
-    }
-
-    std::unique_ptr<httplib::Server> server_;
-    std::thread thread_;
-    int port_ = 0;
-    std::mutex mu_;
-    std::deque<std::string> responseQueue_;
-    std::atomic<int> requestCount_{0};
-};
-
-} // namespace bench
+#include "../tests/support/mock_llm_server.h"

cpp/examples/vlm_agent.cpp

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+// Copyright(C) 2025-2026 Advanced Micro Devices, Inc. All rights reserved.
+// SPDX-License-Identifier: MIT
+//
+// Minimal VLM example: loads an image from disk and asks a vision model
+// about it via the OpenAI-compatible /chat/completions endpoint.
+//
+// Usage: vlm_agent <image_path> [prompt]
+//
+// Requires a Lemonade server running with a VLM model loaded.
+// Environment:
+//   LEMONADE_BASE_URL (default: http://localhost:8000/api/v1)
+//   GAIA_MODEL_ID     (default: Qwen3-VL-4B-Instruct-GGUF)
+
+#include <cstdlib>
+#include <iostream>
+#include <string>
+#include <vector>
+
+#include <gaia/agent.h>
+#include <gaia/types.h>
+
+int main(int argc, char** argv) {
+    if (argc < 2) {
+        std::cerr << "Usage: " << (argc > 0 ? argv[0] : "vlm_agent")
+                  << " <image_path> [prompt]\n";
+        return 2;
+    }
+    std::string imagePath = argv[1];
+    std::string prompt = (argc >= 3) ? argv[2] : "Describe this image.";
+
+    try {
+        gaia::Image img = gaia::Image::fromFile(imagePath);
+        std::cout << "Loaded " << img.size() << " bytes, MIME: "
+                  << img.mimeType() << "\n";
+
+        gaia::AgentConfig cfg;
+        cfg.modelId = gaia::getEnvVar("GAIA_MODEL_ID", "Qwen3-VL-4B-Instruct-GGUF");
+        cfg.contextSize = 32768;  // VLM-recommended
+        cfg.maxSteps = 3;
+        cfg.silentMode = false;
+
+        gaia::Agent agent(cfg);
+        gaia::json result = agent.processQuery(prompt, {img});
+
+        std::cout << "\n== Answer ==\n"
+                  << result.value("result", "<no result>") << "\n";
+        return 0;
+    } catch (const std::exception& e) {
+        std::cerr << "Error: " << e.what() << "\n";
+        return 1;
+    }
+}
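On the wire, each `gaia::Image` in this example travels as an RFC 4648 base64 data-URI inside an `image_url` content part. A minimal standalone encoder sketch follows; the helper names `base64Encode` and `toDataUri` are illustrative, not SDK API:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Minimal RFC 4648 base64 encoder sketch. Illustrates how image bytes
// become the payload of a data-URI like "data:image/png;base64,...".
// Not the SDK's actual implementation.
std::string base64Encode(const std::vector<std::uint8_t>& in) {
    static const char* tbl =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((in.size() + 2) / 3) * 4);
    for (std::size_t i = 0; i < in.size(); i += 3) {
        // Pack up to 3 input bytes into a 24-bit group.
        std::uint32_t n = static_cast<std::uint32_t>(in[i]) << 16;
        if (i + 1 < in.size()) n |= static_cast<std::uint32_t>(in[i + 1]) << 8;
        if (i + 2 < in.size()) n |= in[i + 2];
        // Emit four 6-bit symbols; pad missing input bytes with '='.
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += (i + 1 < in.size()) ? tbl[(n >> 6) & 63] : '=';
        out += (i + 2 < in.size()) ? tbl[n & 63] : '=';
    }
    return out;
}

// Assemble the data-URI the OpenAI vision format expects in image_url.url.
std::string toDataUri(const std::string& mime, const std::vector<std::uint8_t>& bytes) {
    return "data:" + mime + ";base64," + base64Encode(bytes);
}
```

This also makes the 20 MiB cap concrete: base64 inflates the payload by 4/3, so a capped image stays well within typical request-body limits.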

cpp/include/gaia/agent.h

Lines changed: 24 additions & 0 deletions
@@ -57,6 +57,20 @@ class GAIA_API Agent {
     /// @return JSON result with "result" key containing the final answer
     json processQuery(const std::string& userInput, int maxSteps = 0);

+    /// VLM convenience overload: text + images in a single user turn.
+    /// Images are sent as base64 data-URIs inside an OpenAI-compatible
+    /// image_url content part. Stateful and symmetric with the string
+    /// overload: history is appended with text-only stripped messages.
+    json processQuery(const std::string& userInput,
+                      const std::vector<Image>& images,
+                      int maxSteps = 0);
+
+    /// Low-level overload: caller composes the turn as a vector of
+    /// Messages (which may include pre-set `parts` for mixed content).
+    /// The messages are appended to conversationHistory_ (stripped of
+    /// image parts on store). Throws std::invalid_argument on empty input.
+    json processQuery(const std::vector<Message>& messages, int maxSteps = 0);
+
     /// Connect to an MCP server and register its tools.
     /// Mirrors Python MCPClientMixin.connect_mcp_server().
     ///
@@ -140,6 +154,12 @@ class GAIA_API Agent {
     virtual std::string getSystemPrompt() const { return ""; }

 private:
+    /// Unified entry point for all processQuery overloads. Owns the full
+    /// conversation turn: concurrency guard, empty-input validation,
+    /// ensureModelLoaded, history prepend, LLM loop, and end-of-turn
+    /// history write (text-only; image parts stripped).
+    json processQueryInternal(const std::vector<Message>& userMessages, int maxSteps);
+
     // ---- LLM Communication ----

     /// Send messages to the LLM and get a response.
@@ -172,6 +192,10 @@ class GAIA_API Agent {
     LemonadeClient lemonade_;
     std::atomic<bool> modelEnsured_{false};

+    // Concurrency guard — Agent is NOT re-entrant. A second processQuery
+    // call on the same Agent (from any thread) throws std::runtime_error.
+    std::atomic<bool> inFlight_{false};
+
     AgentState executionState_ = AgentState::PLANNING;
     json currentPlan_;
     int currentStep_ = 0;
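The `inFlight_` member pairs with the RAII `InFlightGuard` described in the commit message. A minimal sketch of that pattern (illustrative, not the SDK's actual class):

```cpp
#include <atomic>
#include <stdexcept>

// Sketch of the RAII in-flight guard: the first caller flips the flag
// with compare_exchange_strong; any overlapping call observes 'true'
// and throws. The destructor releases the flag even when the guarded
// call exits via exception. Illustrative only.
class InFlightGuard {
public:
    explicit InFlightGuard(std::atomic<bool>& flag) : flag_(flag) {
        bool expected = false;
        if (!flag_.compare_exchange_strong(expected, true)) {
            throw std::runtime_error("processQuery already in flight on this Agent");
        }
    }
    ~InFlightGuard() { flag_.store(false); }

    // Non-copyable: exactly one guard owns the flag per turn.
    InFlightGuard(const InFlightGuard&) = delete;
    InFlightGuard& operator=(const InFlightGuard&) = delete;

private:
    std::atomic<bool>& flag_;
};
```

Constructing the guard at the top of a `processQueryInternal`-style function is all that is needed: every return path, including thrown exceptions, runs the destructor and clears the flag.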
