Three days of debugging (and rewriting) have led me to write this. We should create a troubleshooting guide with some common (and uncommon) pitfalls when dealing with Gorums, protobuf, and gRPC.
Here is a first topic for such a guide:
If you observe on the client side:
2025/03/06 22:58:24 ClientHandle quorum call error: quorum call error: incomplete call (errors: 2, replies: 0)
node errors:
node 2942784576: rpc error: code = Unavailable desc = stream is down
node 2993117433: rpc error: code = Unavailable desc = stream is down
and the server receives the message but fails to unmarshal it in the NodeStream.RecvMsg loop, the cause may be the one described here. Logging the error that causes the loop to break and shut down the stream revealed little information:
rpc error: code = Internal desc = grpc: failed to unmarshal the received message: proto: not found
Further digging into the unmarshaling logic in encoding.go revealed that this line returned an error:
desc, err := protoregistry.GlobalFiles.FindDescriptorByName(protoreflect.FullName(msg.Metadata.GetMethod()))
Hence, the method descriptor for ClientHandle had not been registered. Once I understood this, it was easy enough to figure out that I had forgotten to generate a .pb.go file for the .proto file defining ClientHandle, which contained only a single Gorums quorum call and imported its message types from elsewhere. I had assumed that I didn't need a .pb.go file for .proto files that don't define any messages. You do!!
In conclusion: it is not enough to generate _gorums.pb.go files if your .proto file only has service methods. You also need the .pb.go files generated by protoc-gen-go. These .pb.go files take care of registering method descriptors in the GlobalFiles proto registry.
I will open a separate issue to address the poor error messages from encoding.go and the server-side issue of the NodeStream failing silently.