Skip to content

feat(framework:cc): Update C++ SDK for Flower 1.27.0 inflatable objects protocol#6913

Open
uncleDecart wants to merge 1 commit intoflwrlabs:mainfrom
uncleDecart:cpp-update
Open

feat(framework:cc): Update C++ SDK for Flower 1.27.0 inflatable objects protocol#6913
uncleDecart wants to merge 1 commit intoflwrlabs:mainfrom
uncleDecart:cpp-update

Conversation

@uncleDecart
Copy link
Copy Markdown

Description

The C++ SDK was stuck on the pre-1.13 protocol and could not communicate with modern Flower SuperLinks. This updates it to support the 1.27.0 "inflatable objects" protocol where message payloads are stored in an ObjectStore and transferred via PullObject/PushObject RPCs.

Key changes:

  • Regenerate proto stubs from current .proto definitions (adds message, recorddict, heartbeat, run, fab protos; removes stale recordset/task)
  • Implement inflate/deflate in serde.cc for bottom-up object reconstruction and top-down serialization with SHA-256 object IDs
  • Rewrite communicator.cc to use PullMessages/PushMessages/PullObject/ PushObject/ConfirmMessageReceived Fleet API RPCs
  • Rewrite grpc_rere.cc with ECDSA node authentication and new RPC stubs
  • Update typing.h with Message/RecordDict/Array types matching Python SDK
  • Add Docker quickstart (Dockerfile, docker-compose.yml, pyproject.toml) for easy testing with SuperLink + 2 C++ clients + Python ServerApp
  • Update README with Docker instructions

Running example seems to work,

cd examples/quickstart-cpp
docker compose up --build

Notes

  1. I'm yet to try more than a toy example (ggml or something from MLIR), I did the changes to my best understanding of the framework.
  2. I know that there's a migration to hub/apps (refactor(examples): Migrate examples to hub/apps & update links #6866) so not sure if I need to create app on flower hub, changes to C++ libraries are still useful in framework.

Related issues/PRs

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Address LLM-reviewer comments, if applicable (e.g., GitHub Copilot)
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

…ts protocol

The C++ SDK was stuck on the pre-1.13 protocol and could not communicate
with modern Flower SuperLinks. This updates it to support the 1.27.0
"inflatable objects" protocol where message payloads are stored in an
ObjectStore and transferred via PullObject/PushObject RPCs.

Key changes:
- Regenerate proto stubs from current .proto definitions (adds message,
  recorddict, heartbeat, run, fab protos; removes stale recordset/task)
- Implement inflate/deflate in serde.cc for bottom-up object reconstruction
  and top-down serialization with SHA-256 object IDs
- Rewrite communicator.cc to use PullMessages/PushMessages/PullObject/
  PushObject/ConfirmMessageReceived Fleet API RPCs
- Rewrite grpc_rere.cc with ECDSA node authentication and new RPC stubs
- Update typing.h with Message/RecordDict/Array types matching Python SDK
- Add Docker quickstart (Dockerfile, docker-compose.yml, pyproject.toml)
  for easy testing with SuperLink + 2 C++ clients + Python ServerApp
- Update README with Docker instructions
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ccb6830a4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +180 to +182
confirm_req.set_run_id(msg.metadata.run_id);
confirm_req.set_message_object_id(msg.metadata.message_id);
communicator->send_confirm_message_received(confirm_req, &confirm_resp);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Confirm receipt only after all message objects are available

This acknowledges the message even when object pulls were incomplete (object_found/object_available can be false earlier in receive), but ConfirmMessageReceived deletes the message object server-side. In the common race where the message arrives before all preregistered objects are uploaded, the client can confirm too early, lose the message permanently, and proceed with missing/empty content instead of retrying pulls.

Useful? React with 👍 / 👎.

Comment on lines +910 to +913
auto q3 = arr_parsed.body.find('"', q2 + 1);
auto q4 = arr_parsed.body.find('"', q3 + 1);
if (q3 != std::string::npos && q4 != std::string::npos)
stype = arr_parsed.body.substr(q3 + 1, q4 - q3 - 1);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse the stype value instead of the next JSON key

The quote-walking logic extracts the substring between q3 and q4, which corresponds to the next key name (for example arraychunk_ids) rather than the stype value. As a result, inflated arrays carry a corrupted stype, and downstream conversion to legacy Parameters propagates an incorrect tensor_type, which can break clients that select deserialization based on that field.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Flower C++ SDK and quickstart example to interoperate with modern Flower SuperLinks (Flower 1.27.0), including the “inflatable objects” protocol and the Fleet API RPC set.

Changes:

  • Regenerates/updates C++ protobuf and gRPC stubs for newer protocol definitions (message/recorddict/heartbeat/run/fab, updated node/transport).
  • Reworks the C++ client runtime to use Fleet API node lifecycle, message pull/push, object pull/push, and message receipt confirmation.
  • Updates the C++ quickstart example with Docker Compose-based end-to-end setup and a minimal Python app scaffold.

Reviewed changes

Copilot reviewed 40 out of 50 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
framework/cc/flwr/src/start.cc Client startup flow updated (register/activate/heartbeat thread/message loop/cleanup).
framework/cc/flwr/src/message_handler.cc Switch from TaskIns/TaskRes handling to Message/RecordDict-based handling.
framework/cc/flwr/src/grpc_rere.cc Implements Fleet RPCs and node auth metadata signing.
framework/cc/flwr/src/communicator.cc Implements PullMessages/PushMessages and inflatable object pull/push/inflate/deflate flow.
framework/cc/flwr/include/typing.h Updates SDK typing to RecordDict/Message/Metadata and new record value variants.
framework/cc/flwr/include/start.h Removes legacy transport gRPC include and aligns includes with new flow.
framework/cc/flwr/include/serde.h Adds RecordDict/Message serde APIs + inflatable object helpers.
framework/cc/flwr/include/message_handler.h Updates handler API to handle_message(Message).
framework/cc/flwr/include/grpc_rere.h Updates communicator interface to Fleet RPCs + node auth support.
framework/cc/flwr/include/flwr/proto/transport.pb.h Regenerated transport proto (adds Scalar uint64 oneof arm).
framework/cc/flwr/include/flwr/proto/transport.pb.cc Regenerated transport proto implementation for Scalar changes.
framework/cc/flwr/include/flwr/proto/run.grpc.pb.h Adds run gRPC stub header (regenerated).
framework/cc/flwr/include/flwr/proto/run.grpc.pb.cc Adds run gRPC stub implementation (regenerated).
framework/cc/flwr/include/flwr/proto/recorddict.grpc.pb.h Adds recorddict gRPC stub header (regenerated).
framework/cc/flwr/include/flwr/proto/recorddict.grpc.pb.cc Adds recorddict gRPC stub implementation (regenerated).
framework/cc/flwr/include/flwr/proto/node.pb.h Regenerated node proto (uint64 node_id, adds NodeInfo).
framework/cc/flwr/include/flwr/proto/node.pb.cc Regenerated node proto implementation (adds NodeInfo).
framework/cc/flwr/include/flwr/proto/message.grpc.pb.h Adds message gRPC stub header (regenerated).
framework/cc/flwr/include/flwr/proto/message.grpc.pb.cc Adds message gRPC stub implementation (regenerated).
framework/cc/flwr/include/flwr/proto/heartbeat.pb.h Adds heartbeat proto header (regenerated).
framework/cc/flwr/include/flwr/proto/heartbeat.pb.cc Adds heartbeat proto implementation (regenerated).
framework/cc/flwr/include/flwr/proto/heartbeat.grpc.pb.h Adds heartbeat gRPC stub header (regenerated).
framework/cc/flwr/include/flwr/proto/heartbeat.grpc.pb.cc Fixes heartbeat gRPC stub source/header includes (regenerated).
framework/cc/flwr/include/flwr/proto/fab.pb.cc Adds fab proto implementation (regenerated).
framework/cc/flwr/include/flwr/proto/fab.grpc.pb.h Adds fab gRPC stub header (regenerated).
framework/cc/flwr/include/flwr/proto/fab.grpc.pb.cc Adds fab gRPC stub implementation (regenerated).
framework/cc/flwr/include/communicator.h Updates Communicator interface to Fleet API and Message-based exchange.
framework/cc/flwr/CMakeLists.txt Adds new protos to generation and links ssl/crypto for node auth.
examples/quickstart-cpp/src/main.cc Updates CLI help text and binary name.
examples/quickstart-cpp/README.md Updates instructions (Docker Compose + local run) and binary naming.
examples/quickstart-cpp/pyproject.toml Adds Python project metadata and flwr run app configuration.
examples/quickstart-cpp/Dockerfile Adds Docker image to build C++ client + Python deps for quickstart.
examples/quickstart-cpp/docker-compose.yml Adds Compose setup for SuperLink + two C++ clients + runner.
examples/quickstart-cpp/CMakeLists.txt Updates build to link ssl/crypto and renames executable.
examples/quickstart-cpp/client.py Adds placeholder Python ClientApp to satisfy app config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"""
import flwr as fl

app = fl.client.ClientApp(client_fn=lambda context: fl.client.NumPyClient().to_client())
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumPyClient() is an abstract base class and cannot be instantiated; this placeholder ClientApp will crash when imported/executed. Replace this with a minimal concrete Client implementation (or a ClientApp that raises a clear error if started) so flwr run doesn’t fail when it tries to launch/validate the client component.

Suggested change
app = fl.client.ClientApp(client_fn=lambda context: fl.client.NumPyClient().to_client())
def client_fn(context) -> fl.client.Client:
"""Placeholder client factory for the C++ quickstart.
The actual clients for this example are implemented in C++ and connect to
the server via gRPC. This Python client placeholder exists only to satisfy
the Flower app configuration requirements and must not be started.
"""
raise RuntimeError(
"This Python ClientApp is a placeholder for the C++ quickstart. "
"Use the C++ SuperNode clients instead of starting this client."
)
app = fl.client.ClientApp(client_fn=client_fn)

Copilot uses AI. Check for mistakes.
Comment on lines +174 to +178
// Confirm message received
flwr::proto::ConfirmMessageReceivedRequest confirm_req;
flwr::proto::ConfirmMessageReceivedResponse confirm_resp;
auto *confirm_node = new flwr::proto::Node();
confirm_node->set_node_id(node_id);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Message receipt is confirmed unconditionally after attempting to pull/inflate objects. Since ConfirmMessageReceived deletes the message object on the server, confirming when objects are missing/unavailable (or inflation failed) can permanently drop the message and prevent retries. Only confirm after all required objects are successfully retrieved and the message inflated successfully; otherwise return nullopt/retry without confirming.

Copilot uses AI. Check for mistakes.
Comment on lines +149 to +153
if (communicator->send_pull_object(pull_req, &pull_resp)) {
std::cerr << "[DEBUG recv] obj " << obj_id.substr(0,16)
<< " found=" << pull_resp.object_found()
<< " avail=" << pull_resp.object_available()
<< " size=" << pull_resp.object_content().size() << std::endl;
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When PullObjectResponse returns object_found=false or object_available=false, the code just skips adding the object and continues. This can lead to partial objects maps and subsequent inflation failures. Consider treating missing/unavailable objects as a hard failure for this message (don’t confirm receipt, and retry pulling later) instead of proceeding with an incomplete object set.

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +36
if (msg_type == "reconnect") {
keep_going = false;
} else if (msg_type == "get_parameters") {
result_content = _get_parameters(client);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconnect handling is incomplete: on msg_type == "reconnect" the code immediately sets keep_going=false without parsing the reconnect instructions from message.content (e.g., config.seconds) and without producing the expected DisconnectRes/ACK reply content. This will prevent server-directed reconnect/backoff from working correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +37 to +41
} else if (msg_type == "train") {
if (message.content) {
result_content = _fit(client, *message.content);
}
} else if (msg_type == "evaluate") {
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For train/evaluate, missing message.content is silently ignored and results in sending an effectively empty reply. This can produce malformed responses that look successful but omit required fields. Prefer treating missing content as an error (throw or set reply.error) so invalid replies aren’t sent.

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +67
BIGNUM *priv_bn = BN_bin2bn(key_bytes, sizeof(key_bytes), nullptr);
EC_KEY_set_private_key(ec_key, priv_bn);

// Derive public key from private key scalar.
EC_POINT *pub_pt = EC_POINT_new(group);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EC private key is set from raw bytes via BN_bin2bn/EC_KEY_set_private_key without reducing modulo the curve order or checking for zero/out-of-range values. This can generate invalid keys and cause intermittent authentication/signature failures. Prefer EVP/EC keygen APIs (or reduce into [1, order-1] and validate return codes).

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +40
FILE *f = fopen("/dev/urandom", "rb");
if (f) {
fread(key_bytes, 1, sizeof(key_bytes), f);
fclose(f);
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code reads entropy from /dev/urandom and includes <unistd.h>, which won’t compile/work on Windows. If cross-platform support is desired, consider using portable entropy sources (e.g., std::random_device) or adding platform-specific alternatives behind #ifdefs.

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +118
EVP_MD_CTX *md = EVP_MD_CTX_new();
EVP_DigestSignInit(md, nullptr, EVP_sha256(), nullptr, pkey_);
EVP_DigestSignUpdate(md, timestamp.data(), timestamp.size());
size_t sig_len = 0;
EVP_DigestSignFinal(md, nullptr, &sig_len);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crypto API return values are not checked (e.g., EVP_MD_CTX_new, EVP_DigestSignInit/Update/Final). If any call fails, this can lead to null dereferences or invalid signatures being sent. Please check return codes and fail the RPC deterministically with a clear error when signing fails.

Copilot uses AI. Check for mistakes.

target_include_directories(flwr PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
$<BUILD_INTERFACE:${grpc_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include>
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpc_SOURCE_DIR is likely not defined by FetchContent_MakeAvailable(gRPC) (FetchContent sets gRPC_SOURCE_DIR/gRPC_BINARY_DIR using the declared name). As a result, this include path may be empty/incorrect and <openssl/...> headers from gRPC’s BoringSSL may not be found. Consider switching to ${gRPC_SOURCE_DIR} (or, better, rely on the ssl/crypto targets’ exported include directories).

Suggested change
$<BUILD_INTERFACE:${grpc_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include>
$<BUILD_INTERFACE:${gRPC_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include>

Copilot uses AI. Check for mistakes.

target_include_directories(flwr PUBLIC
${FLWR_INCLUDE_DIR}
${grpc_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpc_SOURCE_DIR is likely not defined by FetchContent_MakeAvailable(gRPC) (FetchContent typically sets gRPC_SOURCE_DIR). If this variable is empty, the added BoringSSL include path won’t take effect and <openssl/...> includes may fail. Consider using ${gRPC_SOURCE_DIR} or inheriting include dirs from the linked ssl/crypto targets.

Suggested change
${grpc_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include
${gRPC_SOURCE_DIR}/third_party/boringssl-with-bazel/src/include

Copilot uses AI. Check for mistakes.
@github-actions github-actions bot added the Contributor Used to determine what PRs (mainly) come from external contributors. label Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Contributor Used to determine what PRs (mainly) come from external contributors.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants