[META] Leverage protobuf for serializing select node-to-node objects #15308
Comments
Having spent way too much time diving into the byte-level (or bit-level if you consider 7-bit Vints) details of these protocols, I want to make sure we're focusing on the correct things here. Protobuf isn't some magical thing that brings about 30% improvement the way the POC implemented it, and I don't think the existing POCs tested it fully vs. the existing capabilities of the transport protocol if used properly. In summary:
Protobuf's primary benefit is in backwards compatibility. There are probably additional significant benefits of gRPC, but this meta issue doesn't seem to be about changing the entire protocol, so I don't think those should be considered part of the proposed benefits. The performance impacts are very situation-specific and shouldn't be assumed; it's possible that changing the stream encoding to variable-length integers/longs (VInt/ZInt) could achieve similar gains.
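To make the VInt point concrete, here is a minimal sketch of a generic 7-bit variable-length integer encoding compared with a fixed 4-byte int. The class and method names (VIntSketch, encodeVInt, decodeVInt) are hypothetical and this is not a copy of the OpenSearch transport code; it only illustrates why small values get cheaper on the wire regardless of whether protobuf is involved.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class, not part of OpenSearch.
final class VIntSketch {
    // Encode an int 7 bits at a time, low-order group first.
    // The high bit of each byte signals that another byte follows.
    static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reverse of encodeVInt: accumulate 7-bit groups until a byte
    // without the continuation bit is seen.
    static int decodeVInt(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] samples = {1, 127, 128, 16_383, 16_384, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.printf("%d -> %d byte(s) as VInt, 4 bytes fixed%n",
                    v, encodeVInt(v).length);
        }
    }
}
```

Values below 128 take one byte instead of four, while values needing more than 28 bits take five, so the gain depends entirely on the distribution of values being written.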
Hi @dbwiddis, thanks for the feedback!
I am noticing that the protobuf objects I implement write more bytes to the stream than the native "writeTo" serialization. Small protobuf objects are about 20% larger, but the difference shrinks as request content grows. So far I've been focusing on FetchSearchResult and its members. If I'm understanding correctly, the best use case for protobuf (only considering performance) would be something like a …
As I learn more about protobuf, I'm understanding that any potential performance improvement just from changing serialization is going to be speculative, and I'll update this issue with benchmarks as I run them. Currently the concrete expected benefits would be out-of-the-box backwards compatibility as well as providing a stepping stone for gRPC support.
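One plausible source of the extra bytes is the per-field key that the protobuf wire format writes before every value, computed as (field_number << 3) | wire_type. The sketch below hand-rolls both layouts with no protobuf library; the class name, field numbers, and the three example fields are assumptions for illustration and do not reflect FetchSearchResult's actual layout.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of OpenSearch or protobuf.
final class TagOverheadSketch {
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Positional layout: values only, the reader must know the field order.
    static byte[] positional(int id, int score, String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, id);
        writeVarint(out, score);
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    // Tagged layout: each value is preceded by a protobuf-style key.
    // Wire type 0 is varint, wire type 2 is length-delimited.
    static byte[] tagged(int id, int score, String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (1 << 3) | 0);
        writeVarint(out, id);
        writeVarint(out, (2 << 3) | 0);
        writeVarint(out, score);
        writeVarint(out, (3 << 3) | 2);
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String small = "hit";
        String large = "x".repeat(1000);
        System.out.printf("small: positional=%d tagged=%d%n",
                positional(1, 2, small).length, tagged(1, 2, small).length);
        System.out.printf("large: positional=%d tagged=%d%n",
                positional(1, 2, large).length, tagged(1, 2, large).length);
    }
}
```

In this sketch the tagged form always costs three extra bytes (one key per field), which is a large fraction of a tiny message and a negligible fraction of a large one. That matches the direction of the trend described above, though the measured 20% figure comes from the real objects, not from this toy layout.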
Initial numbers for protobuf vs native serializers. OSB Big 5 2.16 min distribution
FetchSearchResults as protobuf - Branch
Notes
Let's keep in mind that while we may not see as much benefit node-to-node, switching client-to-server communication from JSON to protobuf should show more. There's also a significant improvement in developer experience if we can replace the native implementation with protobuf.
Some additional benchmarks for a vector search workload. 5 data nodes (r5.xlarge) per cluster. 10000 queries against the sift-128-euclidean.hdf5 data set.
FetchSearchResults as Protobuf
2.16 Native serialization
@finnegancarroll can we run the vector search workload with client-to-server switching? Thanks.
Please describe the end goal of this project
Provide a framework and migrate some key node-to-node requests to serialize via protobuf rather than the current native stream writing (i.e. writeTo(StreamOutput out)). Starting with node-to-node transport messages maintains API compatibility while introducing protobuf implementations, which have several benefits:
Supporting References
Issues
Related component
Search:Performance