feat(framework:cc): Update C++ SDK for Flower 1.27.0 inflatable objects protocol#6913
feat(framework:cc): Update C++ SDK for Flower 1.27.0 inflatable objects protocol#6913uncleDecart wants to merge 1 commit intoflwrlabs:mainfrom
Conversation
…ts protocol The C++ SDK was stuck on the pre-1.13 protocol and could not communicate with modern Flower SuperLinks. This updates it to support the 1.27.0 "inflatable objects" protocol where message payloads are stored in an ObjectStore and transferred via PullObject/PushObject RPCs. Key changes: - Regenerate proto stubs from current .proto definitions (adds message, recorddict, heartbeat, run, fab protos; removes stale recordset/task) - Implement inflate/deflate in serde.cc for bottom-up object reconstruction and top-down serialization with SHA-256 object IDs - Rewrite communicator.cc to use PullMessages/PushMessages/PullObject/ PushObject/ConfirmMessageReceived Fleet API RPCs - Rewrite grpc_rere.cc with ECDSA node authentication and new RPC stubs - Update typing.h with Message/RecordDict/Array types matching Python SDK - Add Docker quickstart (Dockerfile, docker-compose.yml, pyproject.toml) for easy testing with SuperLink + 2 C++ clients + Python ServerApp - Update README with Docker instructions
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8ccb6830a4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| confirm_req.set_run_id(msg.metadata.run_id); | ||
| confirm_req.set_message_object_id(msg.metadata.message_id); | ||
| communicator->send_confirm_message_received(confirm_req, &confirm_resp); |
There was a problem hiding this comment.
Confirm receipt only after all message objects are available
This acknowledges the message even when object pulls were incomplete (object_found/object_available can be false earlier in receive), but ConfirmMessageReceived deletes the message object server-side. In the common race where the message arrives before all preregistered objects are uploaded, the client can confirm too early, lose the message permanently, and proceed with missing/empty content instead of retrying pulls.
Useful? React with 👍 / 👎.
| auto q3 = arr_parsed.body.find('"', q2 + 1); | ||
| auto q4 = arr_parsed.body.find('"', q3 + 1); | ||
| if (q3 != std::string::npos && q4 != std::string::npos) | ||
| stype = arr_parsed.body.substr(q3 + 1, q4 - q3 - 1); |
There was a problem hiding this comment.
Parse the
stype value instead of the next JSON key
The quote-walking logic extracts the substring between q3 and q4, which corresponds to the next key name (for example arraychunk_ids) rather than the stype value. As a result, inflated arrays carry a corrupted stype, and downstream conversion to legacy Parameters propagates an incorrect tensor_type, which can break clients that select deserialization based on that field.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR updates the Flower C++ SDK and quickstart example to interoperate with modern Flower SuperLinks (Flower 1.27.0), including the “inflatable objects” protocol and the Fleet API RPC set.
Changes:
- Regenerates/updates C++ protobuf and gRPC stubs for newer protocol definitions (message/recorddict/heartbeat/run/fab, updated node/transport).
- Reworks the C++ client runtime to use Fleet API node lifecycle, message pull/push, object pull/push, and message receipt confirmation.
- Updates the C++ quickstart example with Docker Compose-based end-to-end setup and a minimal Python app scaffold.
Reviewed changes
Copilot reviewed 40 out of 50 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| framework/cc/flwr/src/start.cc | Client startup flow updated (register/activate/heartbeat thread/message loop/cleanup). |
| framework/cc/flwr/src/message_handler.cc | Switch from TaskIns/TaskRes handling to Message/RecordDict-based handling. |
| framework/cc/flwr/src/grpc_rere.cc | Implements Fleet RPCs and node auth metadata signing. |
| framework/cc/flwr/src/communicator.cc | Implements PullMessages/PushMessages and inflatable object pull/push/inflate/deflate flow. |
| framework/cc/flwr/include/typing.h | Updates SDK typing to RecordDict/Message/Metadata and new record value variants. |
| framework/cc/flwr/include/start.h | Removes legacy transport gRPC include and aligns includes with new flow. |
| framework/cc/flwr/include/serde.h | Adds RecordDict/Message serde APIs + inflatable object helpers. |
| framework/cc/flwr/include/message_handler.h | Updates handler API to handle_message(Message). |
| framework/cc/flwr/include/grpc_rere.h | Updates communicator interface to Fleet RPCs + node auth support. |
| framework/cc/flwr/include/flwr/proto/transport.pb.h | Regenerated transport proto (adds Scalar uint64 oneof arm). |
| framework/cc/flwr/include/flwr/proto/transport.pb.cc | Regenerated transport proto implementation for Scalar changes. |
| framework/cc/flwr/include/flwr/proto/run.grpc.pb.h | Adds run gRPC stub header (regenerated). |
| framework/cc/flwr/include/flwr/proto/run.grpc.pb.cc | Adds run gRPC stub implementation (regenerated). |
| framework/cc/flwr/include/flwr/proto/recorddict.grpc.pb.h | Adds recorddict gRPC stub header (regenerated). |
| framework/cc/flwr/include/flwr/proto/recorddict.grpc.pb.cc | Adds recorddict gRPC stub implementation (regenerated). |
| framework/cc/flwr/include/flwr/proto/node.pb.h | Regenerated node proto (uint64 node_id, adds NodeInfo). |
| framework/cc/flwr/include/flwr/proto/node.pb.cc | Regenerated node proto implementation (adds NodeInfo). |
| framework/cc/flwr/include/flwr/proto/message.grpc.pb.h | Adds message gRPC stub header (regenerated). |
| framework/cc/flwr/include/flwr/proto/message.grpc.pb.cc | Adds message gRPC stub implementation (regenerated). |
| framework/cc/flwr/include/flwr/proto/heartbeat.pb.h | Adds heartbeat proto header (regenerated). |
| framework/cc/flwr/include/flwr/proto/heartbeat.pb.cc | Adds heartbeat proto implementation (regenerated). |
| framework/cc/flwr/include/flwr/proto/heartbeat.grpc.pb.h | Adds heartbeat gRPC stub header (regenerated). |
| framework/cc/flwr/include/flwr/proto/heartbeat.grpc.pb.cc | Fixes heartbeat gRPC stub source/header includes (regenerated). |
| framework/cc/flwr/include/flwr/proto/fab.pb.cc | Adds fab proto implementation (regenerated). |
| framework/cc/flwr/include/flwr/proto/fab.grpc.pb.h | Adds fab gRPC stub header (regenerated). |
| framework/cc/flwr/include/flwr/proto/fab.grpc.pb.cc | Adds fab gRPC stub implementation (regenerated). |
| framework/cc/flwr/include/communicator.h | Updates Communicator interface to Fleet API and Message-based exchange. |
| framework/cc/flwr/CMakeLists.txt | Adds new protos to generation and links ssl/crypto for node auth. |
| examples/quickstart-cpp/src/main.cc | Updates CLI help text and binary name. |
| examples/quickstart-cpp/README.md | Updates instructions (Docker Compose + local run) and binary naming. |
| examples/quickstart-cpp/pyproject.toml | Adds Python project metadata and flwr run app configuration. |
| examples/quickstart-cpp/Dockerfile | Adds Docker image to build C++ client + Python deps for quickstart. |
| examples/quickstart-cpp/docker-compose.yml | Adds Compose setup for SuperLink + two C++ clients + runner. |
| examples/quickstart-cpp/CMakeLists.txt | Updates build to link ssl/crypto and renames executable. |
| examples/quickstart-cpp/client.py | Adds placeholder Python ClientApp to satisfy app config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """ | ||
| import flwr as fl | ||
|
|
||
| app = fl.client.ClientApp(client_fn=lambda context: fl.client.NumPyClient().to_client()) |
There was a problem hiding this comment.
NumPyClient() is an abstract base class and cannot be instantiated; this placeholder ClientApp will crash when imported/executed. Replace this with a minimal concrete Client implementation (or a ClientApp that raises a clear error if started) so flwr run doesn’t fail when it tries to launch/validate the client component.
| app = fl.client.ClientApp(client_fn=lambda context: fl.client.NumPyClient().to_client()) | |
| def client_fn(context) -> fl.client.Client: | |
| """Placeholder client factory for the C++ quickstart. | |
| The actual clients for this example are implemented in C++ and connect to | |
| the server via gRPC. This Python client placeholder exists only to satisfy | |
| the Flower app configuration requirements and must not be started. | |
| """ | |
| raise RuntimeError( | |
| "This Python ClientApp is a placeholder for the C++ quickstart. " | |
| "Use the C++ SuperNode clients instead of starting this client." | |
| ) | |
| app = fl.client.ClientApp(client_fn=client_fn) |
| // Confirm message received | ||
| flwr::proto::ConfirmMessageReceivedRequest confirm_req; | ||
| flwr::proto::ConfirmMessageReceivedResponse confirm_resp; | ||
| auto *confirm_node = new flwr::proto::Node(); | ||
| confirm_node->set_node_id(node_id); |
There was a problem hiding this comment.
Message receipt is confirmed unconditionally after attempting to pull/inflate objects. Since ConfirmMessageReceived deletes the message object on the server, confirming when objects are missing/unavailable (or inflation failed) can permanently drop the message and prevent retries. Only confirm after all required objects are successfully retrieved and the message inflated successfully; otherwise return nullopt/retry without confirming.
| if (communicator->send_pull_object(pull_req, &pull_resp)) { | ||
| std::cerr << "[DEBUG recv] obj " << obj_id.substr(0,16) | ||
| << " found=" << pull_resp.object_found() | ||
| << " avail=" << pull_resp.object_available() | ||
| << " size=" << pull_resp.object_content().size() << std::endl; |
There was a problem hiding this comment.
When PullObjectResponse returns object_found=false or object_available=false, the code just skips adding the object and continues. This can lead to partial objects maps and subsequent inflation failures. Consider treating missing/unavailable objects as a hard failure for this message (don’t confirm receipt, and retry pulling later) instead of proceeding with an incomplete object set.
| if (msg_type == "reconnect") { | ||
| keep_going = false; | ||
| } else if (msg_type == "get_parameters") { | ||
| result_content = _get_parameters(client); |
There was a problem hiding this comment.
Reconnect handling is incomplete: on msg_type == "reconnect" the code immediately sets keep_going=false without parsing the reconnect instructions from message.content (e.g., config.seconds) and without producing the expected DisconnectRes/ACK reply content. This will prevent server-directed reconnect/backoff from working correctly.
| } else if (msg_type == "train") { | ||
| if (message.content) { | ||
| result_content = _fit(client, *message.content); | ||
| } | ||
| } else if (msg_type == "evaluate") { |
There was a problem hiding this comment.
For train/evaluate, missing message.content is silently ignored and results in sending an effectively empty reply. This can produce malformed responses that look successful but omit required fields. Prefer treating missing content as an error (throw or set reply.error) so invalid replies aren’t sent.
| BIGNUM *priv_bn = BN_bin2bn(key_bytes, sizeof(key_bytes), nullptr); | ||
| EC_KEY_set_private_key(ec_key, priv_bn); | ||
|
|
||
| // Derive public key from private key scalar. | ||
| EC_POINT *pub_pt = EC_POINT_new(group); |
There was a problem hiding this comment.
The EC private key is set from raw bytes via BN_bin2bn/EC_KEY_set_private_key without reducing modulo the curve order or checking for zero/out-of-range values. This can generate invalid keys and cause intermittent authentication/signature failures. Prefer EVP/EC keygen APIs (or reduce into [1, order-1] and validate return codes).
| FILE *f = fopen("/dev/urandom", "rb"); | ||
| if (f) { | ||
| fread(key_bytes, 1, sizeof(key_bytes), f); | ||
| fclose(f); | ||
| } |
There was a problem hiding this comment.
This code reads entropy from /dev/urandom and includes <unistd.h>, which won’t compile/work on Windows. If cross-platform support is desired, consider using portable entropy sources (e.g., std::random_device) or adding platform-specific alternatives behind #ifdefs.
| EVP_MD_CTX *md = EVP_MD_CTX_new(); | ||
| EVP_DigestSignInit(md, nullptr, EVP_sha256(), nullptr, pkey_); | ||
| EVP_DigestSignUpdate(md, timestamp.data(), timestamp.size()); | ||
| size_t sig_len = 0; | ||
| EVP_DigestSignFinal(md, nullptr, &sig_len); |
There was a problem hiding this comment.
Crypto API return values are not checked (e.g., EVP_MD_CTX_new, EVP_DigestSignInit/Update/Final). If any call fails, this can lead to null dereferences or invalid signatures being sent. Please check return codes and fail the RPC deterministically with a clear error when signing fails.
|
|
||
| target_include_directories(flwr PUBLIC | ||
| $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include> | ||
| $<BUILD_INTERFACE:${grpc_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include> |
There was a problem hiding this comment.
grpc_SOURCE_DIR is likely not defined by FetchContent_MakeAvailable(gRPC) (FetchContent sets gRPC_SOURCE_DIR/gRPC_BINARY_DIR using the declared name). As a result, this include path may be empty/incorrect and <openssl/...> headers from gRPC’s BoringSSL may not be found. Consider switching to ${gRPC_SOURCE_DIR} (or, better, rely on the ssl/crypto targets’ exported include directories).
| $<BUILD_INTERFACE:${grpc_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include> | |
| $<BUILD_INTERFACE:${gRPC_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include> |
|
|
||
| target_include_directories(flwr PUBLIC | ||
| ${FLWR_INCLUDE_DIR} | ||
| ${grpc_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include |
There was a problem hiding this comment.
grpc_SOURCE_DIR is likely not defined by FetchContent_MakeAvailable(gRPC) (FetchContent typically sets gRPC_SOURCE_DIR). If this variable is empty, the added BoringSSL include path won’t take effect and <openssl/...> includes may fail. Consider using ${gRPC_SOURCE_DIR} or inheriting include dirs from the linked ssl/crypto targets.
| ${grpc_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include | |
| ${gRPC_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include |
Description
The C++ SDK was stuck on the pre-1.13 protocol and could not communicate with modern Flower SuperLinks. This updates it to support the 1.27.0 "inflatable objects" protocol where message payloads are stored in an ObjectStore and transferred via PullObject/PushObject RPCs.
Key changes:
Running example seems to work,
cd examples/quickstart-cpp docker compose up --buildNotes
Related issues/PRs
Checklist
#contributions)