
Feature: add rkyv support #1669

Open
Nicholas-Ball wants to merge 3 commits into databendlabs:main from Nicholas-Ball:main
Conversation

@Nicholas-Ball Nicholas-Ball commented Feb 28, 2026

I know this feature has been attempted a few times, so I am hoping this PR finally pushes it across the metaphorical finish line.

This approach to resolving #316 is very similar to #879, with a few differences. First, rkyv no longer uses archive(check_bytes). It now relies on bytecheck, which is implemented automatically for types that derive rkyv::Archive when the bytecheck feature is enabled. The bytecheck feature is enabled by default.

rkyv doesn't like enums without variants, so I implemented the rkyv traits manually for NoForward and Infallible.

I used the serde feature as a guide for what also needs to support rkyv, so it is possible I implemented rkyv on something that doesn't need it. I have double-checked, though, and I am fairly certain nothing was included by accident.

I added tests and examples of how to use rkyv with openraft. These tests and examples are mostly 1-1 copies of the existing serde tests with serde replaced.

I also updated the docs to include the new feature.

Let me know if there is anything I missed.

Checklist

  • Updated guide with pertinent info (may not always apply).
  • Squash down commits to one or two logical commits which clearly describe the work you've done.
  • Unittest is a friend:)

Member

@drmingdrmer drmingdrmer left a comment

Thanks for contributing the rkyv feature!

Usually I do not accept a totally AI-generated patch. Even with the most advanced model, an AI-generated patch is very likely not of production quality.

There are several issues with this PR:

  • The rkyv feature is not actually used in the added example.
  • Tests covering the added rkyv features are lacking.
  • Not all of the types need rkyv support.

The following is AI generated analysis about what types need rkyv derive:

Serialization Feature Flag Design

Background

The current codebase has a single serde feature flag that gates serde::Serialize/serde::Deserialize
derives on all types uniformly. The new rkyv feature flag was added the same way —
applied to every type that had serde.

But not every type serves the same purpose. Openraft types fall into distinct categories
with different serialization needs:

  • Storage types: persisted to log store / snapshot store.
    Need both serde (for JSON/bincode storage backends) and rkyv (for zero-copy storage backends).

  • Transport types: sent between Raft nodes over the network.
    Need both serde (for JSON/protobuf/bincode transport) and rkyv (for zero-copy network transport).

  • Error types: returned from API calls and RPCs, never persisted or sent as raw bytes.
    Need serde (for JSON error responses, logging), but NOT rkyv.

  • Config/Runtime types: runtime configuration, metrics, ephemeral state.
    Need serde (for config files, metrics export), but NOT rkyv.

The serde and rkyv feature flags should be split along these boundaries:

Feature flag | Storage | Transport | Error | Config/Runtime
serde (current) | yes | yes | yes | yes
Proposed: serde-storage | yes | - | - | -
Proposed: serde-transport | - | yes | - | -
Proposed: serde-error | - | - | yes | -
Proposed: serde-config | - | - | - | yes
rkyv | yes | yes | no | no

The current serde flag could remain as a convenience that enables all serde sub-features.
The rkyv flag only applies to storage + transport types — error and config types never need rkyv.
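
A minimal sketch of how such a split could look at the type level, assuming the proposed feature names are adopted. `LogKey` and `ApiError` are hypothetical stand-ins rather than openraft types, and the Cargo feature wiring in the comment is an assumption. With no feature enabled, the `cfg_attr` lines expand to nothing, so the snippet builds without serde or rkyv.

```rust
// Sketch only. Feature names follow the proposal above and are hypothetical;
// `LogKey` and `ApiError` are illustrative stand-ins, not openraft types.
//
// Assumed Cargo.toml wiring:
//   [features]
//   serde-storage = ["dep:serde"]
//   serde-error   = ["dep:serde"]
//   serde         = ["serde-storage", "serde-error"]  # convenience alias
//   rkyv          = ["dep:rkyv"]

/// Storage type: eligible for both serde and rkyv derives.
#[derive(Debug, Clone, PartialEq)]
#[cfg_attr(feature = "serde-storage", derive(serde::Serialize, serde::Deserialize))]
#[cfg_attr(feature = "rkyv", derive(rkyv::Archive, rkyv::Serialize, rkyv::Deserialize))]
pub struct LogKey {
    pub term: u64,
    pub index: u64,
}

/// Error type: serde only, deliberately no rkyv derive.
#[derive(Debug, Clone, PartialEq)]
#[cfg_attr(feature = "serde-error", derive(serde::Serialize, serde::Deserialize))]
pub enum ApiError {
    NotLeader,
    Timeout,
}

fn main() {
    let key = LogKey { term: 1, index: 7 };
    println!("{:?} {:?} {:?}", key, ApiError::NotLeader, ApiError::Timeout);
}
```

A user who only needs storage serialization would then enable `serde-storage` or `rkyv` and never compile serde impls for the error types.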


Chapter 1: Types That Need rkyv

1.1 Storage Types

Persisted to log storage, snapshot, or state machine.
These are types appearing in RaftLogStorage / RaftStateMachine trait signatures
and their transitive field dependencies.

Type | File | Reason
LogId<C> | log_id/mod.rs | Core storage key, every log entry has one
LeaderId<C> (adv) | vote/leader_id/leader_id_adv.rs | Field inside LogId
LeaderId<C> (std) | vote/leader_id/leader_id_std.rs | Field inside LogId
CommittedLeaderId<C> | vote/leader_id/leader_id_std.rs | Field inside LogId (std mode)
Vote<C> | vote/vote.rs | Persisted by save_vote()
Entry<C> | entry/entry.rs | The log entry itself
EntryPayload<C> | entry/payload.rs | Field of Entry
Membership<C> | membership/membership.rs | Stored inside entries and StoredMembership
StoredMembership<C> | membership/stored_membership.rs | Returned by applied_state(), stored in snapshot
SnapshotMeta<C> | storage/snapshot_meta.rs | Snapshot metadata, persisted alongside snapshot data
SnapshotSignature<C> | storage/snapshot_signature.rs | Derived from SnapshotMeta, identifies snapshot
EmptyNode | node.rs | Node type stored inside Membership
BasicNode | node.rs | Node type stored inside Membership

1.2 Transport/RPC Types

Network messages sent between Raft nodes via RaftNetworkV2 trait.
rkyv enables zero-copy deserialization on the hot path (AppendEntries).

Type | File | Notes
VoteRequest<C> | raft/message/vote.rs | Election RPC
VoteResponse<C> | raft/message/vote.rs | Election reply
AppendEntriesRequest<C> | raft/message/append_entries_request.rs | Hot path. Replication RPC. Contains Vec<Entry>
AppendEntriesResponse<C> | raft/message/append_entries_response.rs | Replication reply
InstallSnapshotRequest<C> | raft/message/install_snapshot.rs | Snapshot transfer. Contains SnapshotMeta, Vec<u8>
InstallSnapshotResponse<C> | raft/message/install_snapshot.rs | Snapshot reply
SnapshotResponse<C> | raft/message/install_snapshot.rs | Alternative snapshot response
TransferLeaderRequest<C> | raft/message/transfer_leader.rs | Leadership transfer
StreamAppendError<C> | raft/message/stream_append_error.rs | Sent back over the wire during streaming
ClientWriteResponse<C> | raft/message/client_write.rs | Returned to clients; could be forwarded between nodes
WriteResponse<C> | raft/message/write.rs | Simplified version of ClientWriteResponse
SnapshotSegmentId | raft_types.rs | Used in snapshot streaming protocol

Chapter 2: Types That Have rkyv But Should Not

These types were given #[cfg_attr(feature = "rkyv", derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize))]
in commit addce54c but do not need it. They should have the rkyv derive removed.

Similarly, the serde derive on these types could be gated under more specific feature flags
(e.g., serde-error, serde-config) instead of the blanket serde flag,
so users who only need storage/transport serialization don't pull in serde impls for 30+ error types.

2.1 Error Types

Never persisted or sent as rkyv-serialized data.
Errors flow through local API calls and are serialized (when needed) as JSON for logging or HTTP responses.
Some errors appear as enum variants inside transport response types
(e.g., HigherVote is a variant of AppendEntriesResponse),
but those are part of the response type's own rkyv derive — the standalone error types are not serialized separately.

Action: remove rkyv derive. Consider gating serde under serde-error or similar.

Type | File
Fatal<C> | errors/fatal.rs
StorageError<C> | errors/storage_error.rs
ErrorSubject<C> | errors/storage_error.rs
ErrorVerb | errors/storage_error.rs
RaftError<C, E> | errors/raft_error.rs
ClientWriteError<C> | errors/mod.rs
ChangeMembershipError<C> | errors/mod.rs
InitializeError<C> | errors/mod.rs
RPCError<C, E> | errors/mod.rs
RemoteError<C, T> | errors/mod.rs
NetworkError<C> | errors/mod.rs
Unreachable<C> | errors/mod.rs
Timeout<C> | errors/mod.rs
ForwardToLeader<C> | errors/mod.rs
ConflictingLogId<C> | errors/conflicting_log_id.rs
HigherVote<C> | errors/higher_vote.rs
RejectVote<C> | errors/reject_vote.rs
LinearizableReadError<C> | errors/linearizable_read_error.rs
AllowNextRevertError<C> | errors/allow_next_revert_error.rs
MembershipError<C> | errors/membership_error.rs
NodeNotFound<C> | errors/node_not_found.rs
ReplicationClosed | errors/replication_closed.rs
StreamingError<C> | errors/streaming_error.rs
InstallSnapshotError | errors/mod.rs
SnapshotMismatch | errors/mod.rs
QuorumNotEnough<C> | errors/mod.rs
InProgress<C> | errors/mod.rs
LearnerNotFound<C> | errors/mod.rs
NotAllowed<C> | errors/mod.rs
NotInMembers<C> | errors/mod.rs
EmptyMembership | errors/mod.rs
Operation | errors/operation.rs
NoForward | errors/mod.rs
Infallible | errors/mod.rs
BoxedErrorSource | impls/boxed_error_source.rs

2.2 Config/Runtime/API-Input Types

Runtime state — never persisted as raft data, never sent over RPC as raw bytes.

Action: remove rkyv derive. Consider gating serde under serde-config or similar.

Type | File | Why no rkyv
Config | config/config.rs | Runtime configuration, loaded from file/env
SnapshotPolicy | config/config.rs | Field of Config
RaftMetrics<C> | metrics/raft_metrics.rs | Observable runtime metrics, never stored
RaftDataMetrics<C> | metrics/raft_metrics.rs | Metrics subset
RaftServerMetrics<C> | metrics/raft_metrics.rs | Metrics subset
ServerState | core/server_state.rs | Ephemeral runtime state (Leader/Follower/etc.)
ChangeMembers<C> | change_members.rs | API input to change_membership()
WaitError | metrics/wait.rs | Metrics wait error

2.3 Internal/Utility Types

These are pub(crate) or serde-specific utilities. Not part of any public storage or transport interface.

Action: remove rkyv derive.

Type | File | Why no rkyv
CommittedVote<C> | vote/committed.rs | pub(crate) internal wrapper. Not in any storage/transport trait signature.
UncommittedVote<C> | vote/non_committed.rs | pub(crate) internal wrapper.
Joint<...> | quorum/joint.rs | pub(crate). Not a serialized field of Membership; constructed at runtime from configs: Vec<BTreeSet<NodeId>>.
Compat<From, To> | compat/upgrade.rs | Serde-specific migration helper (#[serde(untagged)]). rkyv has a different archive format; this type is meaningless for rkyv.
RPCTypes | network/rpc_type.rs | Enum naming RPC types. Used in error context fields, not serialized on the wire.
SerdeInstant | metrics/serde_instant.rs | Only used in RaftMetrics. No stored or transport type contains it. Manual rkyv impl can be kept as convenience but is not required.

2.4 Test-Only Types

Action: remove rkyv derive.

Type | File
UTConfig<N> | engine/testing.rs
TickUTConfig | core/tick.rs
TestNode | node.rs

Summary

Category | Count | rkyv | serde flag
Storage types | 13 | yes | serde-storage
Transport/RPC types | 12 | yes | serde-transport
Error types | 35 | no | serde-error
Config/Runtime/API | 8 | no | serde-config
Internal/Utility | 6 | no | varies
Test-only | 3 | no | n/a

Of the ~70 types that received rkyv derives in commit addce54c, ~25 need it (storage + transport).
The remaining ~45 should have rkyv removed (errors, config, runtime, internal, test).

The existing serde feature flag can be kept as a convenience alias that enables all sub-features.

@drmingdrmer reviewed 38 files and all commit messages, and made 2 comments.
Reviewable status: 38 of 78 files reviewed, 1 unresolved discussion (waiting on Nicholas-Ball).


examples/types-kv-rkyv/src/lib.rs line 52 at r1 (raw file):

        D = Request,
        R = Response,
);

types-kv is a crate that contains the request/response struct definitions. To support rkyv, adding rkyv derive attributes to the types-kv crate would be better than creating a duplicate crate.

Unless there is something I missed that prohibits using rkyv in types-kv.

And this declared TypeConfig does not seem to be used anywhere.

Code quote:

openraft::declare_raft_types!(
    pub TypeConfig:
        D = Request,
        R = Response,
);

@Nicholas-Ball
Author

I see. Would you like me to add a rkyv-storage and a rkyv-transport feature flag, with a rkyv feature that enables both, or do you want me to just do the rkyv feature for now?

@drmingdrmer
Member

I see. Would you like me to add a rkyv-storage and a rkyv-transport feature flag, with a rkyv feature that enables both, or do you want me to just do the rkyv feature for now?

Just rkyv-storage looks good. I am not sure rkyv-transport is necessary, since rkyv is not designed for transport, and AFAIK there are a few compatibility issues with transmitting rkyv messages.

@Nicholas-Ball
Author

@drmingdrmer I made those changes. Sorry for the delay. The rkyv example uses tokio TCP streams and is based on the gRPC example. I did end up feature creeping a bit and adding the rkyv-transport option. While adding the tokio TCP stream based network, I realized a serde-based solution didn't 100% make sense in that case, so I just YOLOed it and went for the full transport+storage implementation. The idea was to use tokio TCP to reflect how a more performance-focused setup, where rkyv might be used a lot, could look. My exact implementation is...rough, to say the least, and I can polish it more. I at least wanted to get this back to you for feedback. I understand that using tokio may be a bit cursed as well, given this library uses futures for most things. I can switch the example to futures and a different TCP stream crate; I just used tokio because I was the most familiar with it.

@Nicholas-Ball
Author

Nicholas-Ball commented Mar 5, 2026

The issue you mentioned with using rkyv for networking is that it isn't versioned, unlike protobuf. So if an end user wanted to use rkyv for networking, they would basically need to restart the whole raft network whenever the network protocol changes. I think a quick and easy fix on this end would be to add a versioning abstraction for the transport data types, or to have a literal separation by name, similar to how you handle the v1/legacy and v2 network.
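
One possible shape for such a versioning abstraction is a version prefix in front of an opaque payload, sketched below with std only. `WIRE_VERSION`, `encode_frame`, and `decode_frame` are hypothetical names for illustration; nothing here is an existing openraft or rkyv API.

```rust
// Hypothetical sketch: a version prefix in front of an opaque payload.
// `WIRE_VERSION`, `encode_frame`, and `decode_frame` are illustrative names,
// not part of openraft or rkyv.

const WIRE_VERSION: u16 = 2;

#[derive(Debug, PartialEq)]
enum FrameError {
    TooShort,
    UnsupportedVersion(u16),
}

/// Prepend the sender's wire version (little-endian u16) to the payload.
fn encode_frame(version: u16, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(2 + payload.len());
    frame.extend_from_slice(&version.to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Check the version before touching the payload, so a layout mismatch is
/// rejected up front instead of being misread.
fn decode_frame(frame: &[u8]) -> Result<&[u8], FrameError> {
    if frame.len() < 2 {
        return Err(FrameError::TooShort);
    }
    let version = u16::from_le_bytes([frame[0], frame[1]]);
    if version != WIRE_VERSION {
        return Err(FrameError::UnsupportedVersion(version));
    }
    Ok(&frame[2..])
}

fn main() {
    let current = encode_frame(WIRE_VERSION, b"append-entries");
    assert_eq!(decode_frame(&current), Ok(&b"append-entries"[..]));

    let old = encode_frame(1, b"append-entries");
    assert_eq!(decode_frame(&old), Err(FrameError::UnsupportedVersion(1)));
}
```

A prefix like this only detects a mismatch; it does not make two different layouts compatible.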

However, this might be worth the extra effort for some end users, which is why I think we should just have it available. I can write a short blurb in the README about potential versioning issues when using rkyv for transport with raft if you want.

Follow up:
I just remembered serde has the same issues with versioning, so I guess rkyv would be in the same boat as serde-based implementations. It's just that rkyv would be more sensitive, as it is not self-describing.

If we want only self-describing schemas or versioned non-self-describing schemas, we may want to specify that somewhere.

@drmingdrmer
Member

Thanks for the update! A few thoughts:

Please keep this PR scoped to rkyv-storage only.

I'd like to keep rkyv-transport out of this PR. rkyv is a fixed memory layout format with zero schema metadata — any field addition, removal, or reordering breaks deserialization. This makes it fundamentally unsuitable for transport in a distributed system where nodes upgrade independently. The Raft protocol types may change between versions, and transport needs to tolerate that.
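
To make the failure mode concrete, here is a toy model of a fixed-layout encoding written with std only. It is not rkyv's actual layout or API, and `VoteV1`, `VoteV2`, and the helper functions are hypothetical. It only illustrates what happens when a reader built for one layout receives bytes written with another.

```rust
// Toy model of a fixed-layout format. This is NOT rkyv's real layout or API;
// `VoteV1`, `VoteV2`, and the helpers are hypothetical names for this sketch.

#[derive(Debug, PartialEq)]
struct VoteV1 {
    term: u64,
    node_id: u64,
}

// A later release inserts `committed` between the two existing fields.
struct VoteV2 {
    term: u64,
    committed: u64,
    node_id: u64,
}

/// Write the fields back to back, with no names or tags.
fn encode_v2(v: &VoteV2) -> Vec<u8> {
    let mut out = Vec::with_capacity(24);
    for field in [v.term, v.committed, v.node_id] {
        out.extend_from_slice(&field.to_le_bytes());
    }
    out
}

fn read_u64(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

/// Old reader without a check: trusts its own offsets (term at 0, node_id at 8).
/// Panics if `bytes` is shorter than 16 bytes.
fn decode_v1_unchecked(bytes: &[u8]) -> VoteV1 {
    VoteV1 { term: read_u64(bytes, 0), node_id: read_u64(bytes, 8) }
}

/// Old reader with a size check: rejects anything that is not exactly 16 bytes.
fn decode_v1_checked(bytes: &[u8]) -> Option<VoteV1> {
    if bytes.len() != 16 {
        return None;
    }
    Some(decode_v1_unchecked(bytes))
}

fn main() {
    let new_bytes = encode_v2(&VoteV2 { term: 5, committed: 1, node_id: 3 });
    // The checked reader refuses the new layout outright.
    assert_eq!(decode_v1_checked(&new_bytes), None);
    // The unchecked reader silently reads `committed` as `node_id`.
    assert_eq!(decode_v1_unchecked(&new_bytes), VoteV1 { term: 5, node_id: 1 });
}
```

Either outcome is a problem during a rolling upgrade: the old node either cannot read the message at all or reads the wrong value.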

The comparison with serde isn't quite accurate — serde's common formats (JSON, MessagePack) are self-describing: field names travel with the payload, unknown fields can be skipped, and new optional fields can be added with #[serde(default)]. rkyv has no equivalent. They're not in the same boat.
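
The tolerance described here can be illustrated with a toy self-describing format, again std only. The `key=value;...` encoding and the `Vote` struct are assumptions made up for this sketch; it is not serde, JSON, or MessagePack, but it shows the same two properties: unknown fields are skipped and missing fields fall back to a default.

```rust
use std::collections::HashMap;

// Toy self-describing format: "key=value;key=value". Field names travel with
// the payload. This is an illustration only, not serde or any real format.

#[derive(Debug, PartialEq)]
struct Vote {
    term: u64,
    node_id: u64,
    committed: bool, // added in a later version
}

/// Split the payload into name/value pairs; malformed pairs are ignored.
fn parse_fields(input: &str) -> HashMap<&str, &str> {
    input
        .split(';')
        .filter_map(|pair| pair.split_once('='))
        .collect()
}

fn decode_vote(input: &str) -> Option<Vote> {
    let fields = parse_fields(input);
    Some(Vote {
        term: fields.get("term")?.parse().ok()?,
        node_id: fields.get("node_id")?.parse().ok()?,
        // A missing field falls back to a default, similar in spirit to
        // `#[serde(default)]`. Unknown keys in `fields` are never looked at.
        committed: match fields.get("committed") {
            Some(value) => value.parse().ok()?,
            None => false,
        },
    })
}

fn main() {
    // Payload from an older writer: no `committed` field.
    let old = decode_vote("term=5;node_id=3");
    assert_eq!(old, Some(Vote { term: 5, node_id: 3, committed: false }));

    // Payload from a newer writer: extra `lease` field the reader does not know.
    let new = decode_vote("term=5;node_id=3;committed=true;lease=10");
    assert_eq!(new, Some(Vote { term: 5, node_id: 3, committed: true }));
}
```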

Making rkyv work safely for transport would require version negotiation at connection time, frozen type snapshots per version, and N/N-1 support during rolling upgrades. That's significant infrastructure, not something to YOLO into a storage PR.
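
As a rough idea of the smallest piece of that infrastructure, the sketch below negotiates a protocol version at connection time, assuming each node speaks its own version N and the previous one, N-1. `supported` and `negotiate` are hypothetical helpers and versions are assumed to start at 1. The frozen per-version type snapshots are not shown.

```rust
use std::ops::RangeInclusive;

/// Versions a node can speak: its own version N and the previous one, N-1.
/// Assumes versions start at 1.
fn supported(current: u32) -> RangeInclusive<u32> {
    current.saturating_sub(1).max(1)..=current
}

/// Pick the highest protocol version both peers support, or `None` if the
/// two ranges do not overlap.
fn negotiate(local: &RangeInclusive<u32>, remote: &RangeInclusive<u32>) -> Option<u32> {
    let low = (*local.start()).max(*remote.start());
    let high = (*local.end()).min(*remote.end());
    if low <= high {
        Some(high)
    } else {
        None
    }
}

fn main() {
    // Rolling upgrade: a node on version 3 talks to an upgraded node on 4.
    assert_eq!(negotiate(&supported(3), &supported(4)), Some(3));
    // Two nodes on the same version use that version.
    assert_eq!(negotiate(&supported(4), &supported(4)), Some(4));
    // A node two versions behind cannot connect.
    assert_eq!(negotiate(&supported(2), &supported(4)), None);
}
```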

The example needs work.

The current rkyv example doesn't actually use any rkyv feature in the storage implementation — it's a pure copy of the in-memory storage. The example should demonstrate what rkyv brings to the table for storage (zero-copy deserialization of log entries, snapshots, etc.), otherwise it doesn't justify its existence.
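
For readers unfamiliar with the term, the sketch below shows the general idea of zero-copy access to a stored log entry using std only. It is a hand-rolled stand-in, not rkyv: `EntryView`, `encode_entry`, and the byte layout are assumptions for illustration, whereas a real rkyv-based store would use the archived types generated by the derives and would validate the bytes before access.

```rust
// Hand-rolled stand-in for zero-copy access (NOT rkyv): fields are read in
// place from the stored bytes, without building an owned entry first.
// Assumed layout for this sketch: term u64 LE | index u64 LE | payload bytes.

struct EntryView<'a> {
    buf: &'a [u8],
}

impl<'a> EntryView<'a> {
    const HEADER: usize = 16;

    /// Wrap stored bytes; only checks that the fixed header is present.
    fn new(buf: &'a [u8]) -> Option<Self> {
        if buf.len() < Self::HEADER {
            None
        } else {
            Some(EntryView { buf })
        }
    }

    fn term(&self) -> u64 {
        read_u64(self.buf, 0)
    }

    fn index(&self) -> u64 {
        read_u64(self.buf, 8)
    }

    /// Borrow the payload straight from the underlying buffer (no copy).
    fn payload(&self) -> &'a [u8] {
        &self.buf[Self::HEADER..]
    }
}

fn read_u64(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

/// Write an entry in the assumed layout, as a log store might on append.
fn encode_entry(term: u64, index: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(EntryView::HEADER + payload.len());
    out.extend_from_slice(&term.to_le_bytes());
    out.extend_from_slice(&index.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

fn main() {
    let stored = encode_entry(2, 9, b"set x=1");
    let view = EntryView::new(&stored).expect("header present");
    assert_eq!((view.term(), view.index()), (2, 9));
    assert_eq!(view.payload(), b"set x=1");
    // The payload slice points into `stored`; nothing was copied.
    assert!(std::ptr::eq(view.payload().as_ptr(), stored[16..].as_ptr()));
}
```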

For the network layer in the example, plain serde over TCP is good enough. It keeps the example simple and avoids giving users the impression that rkyv is a recommended choice for transport.

If you'd like to explore rkyv-transport in the future, let's do that as a separate PR with a proper versioning design.

