Description
Is your feature request related to a problem?
With the addition of the transport-grpc module OpenSearch now supports a gRPC/protobuf alternative to the REST API which some users may want to take advantage of. While long term this endpoint expects to keep parity with the REST API, support is currently still limited and slowly expanding.
To enable users looking to onboard to gRPC/protobuf, opensearchpy
may want to provide out of the box support for gRPC/protobuf.
What solution would you like?
There are a few different approaches for providing python client support which come to mind.
1. Leave opensearch-protobufs as a standalone library.
opensearch-protobufs is the python library compiled directly from the protobuf schema found here and published to PyPI here. This library provides python types corresponding to the strongly typed schema described in the .proto files of the opensearch-protobufs repo. A simple example of python usage is given in the opensearch-protobufs repo here.
As gRPC emulates the structure of a regular function call in code with well defined types, it may be redundant to include it alongside other python clients as the usage is quite similar.
Configuration of the gRPC connection to the server is provided by the grpc package. Some examples of configuring ssl settings are listed here.
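As an illustration, a minimal sketch of opening a TLS-secured channel with the grpc package (the certificate path, port, and stub naming are assumptions, not the published API):

import grpc

# Load the CA certificate used to verify the server (path is an assumption)
with open("root-ca.pem", "rb") as f:
    credentials = grpc.ssl_channel_credentials(root_certificates=f.read())

# Open a secure channel to the gRPC transport port (9400 is an assumption)
channel = grpc.secure_channel("localhost:9400", credentials)
# A service stub compiled from the opensearch-protobufs schema would then be
# constructed from this channel, e.g. a hypothetical DocumentServiceStub(channel)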
If there is little functionality covered by the opensearchpy client which is not available in opensearch-protobufs, and the usage is similar, it may be most appropriate as a standalone library.
2. Add opensearch-protobufs as an unused dependency of opensearchpy.
To enable opensearchpy users to take advantage of new protobuf/gRPC endpoints when updating their opensearchpy version, this package could be included within the python client, but remain unused/unintegrated with high level constructs.
Adding to setup.py:
# License: Apache 2.0
# gRPC & proto deps
"opensearch-protobufs==0.19.0"
This solution could add visibility for the new gRPC endpoint and avoid the minor hurdle of requiring users to add/vet a new package in their project. It could additionally make sense to publish the packages together since gRPC/protobuf API support is limited and the REST API will be needed for operations which are not yet implemented.
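To illustrate what publishing the packages together might look like in practice, a sketch of mixed usage (the REST calls follow the existing opensearchpy API; the gRPC port and the assumption that users wire up channels/stubs themselves are illustrative only):

import grpc
from opensearchpy import OpenSearch

# The REST client continues to serve operations not yet available over gRPC
rest_client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
print(rest_client.cluster.health())

# The bundled opensearch-protobufs dependency is present but unintegrated;
# users construct their own channel and service stubs for supported endpoints
channel = grpc.insecure_channel("localhost:9400")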
3. Allow users to send their existing high/low level client requests over gRPC simply by enabling a flag.
Alternatively, we could integrate fully with the opensearchpy low level client such that users only have to pass an additional flag to turn their request into a gRPC request. One possible downside to this approach is the potential overhead of deserializing request fields from the traditional opensearchpy format and re-serializing them into gRPC/protobuf format on each request.
Consider an example bulk index operation from the low level client docs:
movies = '{ "index" : { "_index" : "my-dsl-index", "_id" : "2" } } \n { "title" : "Interstellar", "director" : "Christopher Nolan", "year" : "2014"} \n ... '
client.bulk(body=movies)
Protobuf builds this request according to the schema defined here compiled into python:
import json

proto_request = document_pb2.BulkRequest()
proto_request.index = "my-dsl-index"

doc = {"title": "Interstellar", "director": "Christopher Nolan", "year": "2014"}

request_body = document_pb2.BulkRequestBody()
request_body.id = "2"
# Documents travel as UTF-8 encoded JSON bytes
request_body.object = json.dumps(doc).encode("utf-8")
# Mark this entry as an index operation (message fields are set via CopyFrom)
request_body.operation_container.CopyFrom(document_pb2.IndexOperation())
proto_request.request_body.append(request_body)
# ... append an additional BulkRequestBody for each document

grpc_client.Bulk(proto_request)
In the case of the REST client, we identify fields with "index", "_index", and "_id", and these field identifiers, as well as the field values, are serialized into JSON and UTF-8 encoded. In contrast, protobuf identifies fields in code (request_body.id = "2" instead of the "_id" key). Over the wire the request is encoded into a binary format where each field is delineated only by the bytes necessary to uniquely identify it, and the field values themselves benefit from binary encoding. For more on how protobuf is encoded see: https://protobuf.dev/programming-guides/encoding/
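As a rough illustration of the difference (reusing proto_request from the sketch above; exact sizes depend on the schema and payload):

import json

doc_with_metadata = {"_index": "my-dsl-index", "_id": "2",
                     "title": "Interstellar", "director": "Christopher Nolan", "year": "2014"}
# JSON carries every field name on the wire as UTF-8 text
json_bytes = json.dumps(doc_with_metadata).encode("utf-8")

# Protobuf identifies fields by compact numeric tags in a binary encoding
proto_bytes = proto_request.SerializeToString()

print(len(json_bytes), len(proto_bytes))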
This option would allow users to simply pass a flag on their existing REST requests, such as:
movies = '{ "index" : { "_index" : "my-dsl-index", "_id" : "2" } } \n { "title" : "Interstellar", "director" : "Christopher Nolan", "year" : "2014"} \n ... '
client.bulk(body=movies, grpc=True)
With this approach, each request would work as follows:
- The client application builds the above "movies" body and provides it to the client for bulk ingestion.
- The python client deserializes this body and re-serializes it into its protobuf representation.
- The client sends a protobuf request and receives a protobuf response over gRPC.
- The client deserializes the binary encoded protobuf response back into a python dictionary bulk response.
This could potentially impact performance, as the low level client interface needs to be translated to and from protobuf for each request/response; a rough sketch of that translation step follows.
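A minimal sketch of what that translation step might look like inside the client (everything here is hypothetical; opensearchpy does not currently expose a grpc flag, and the message/field names reuse the example above):

import json

def bulk_via_grpc(stub, ndjson_body):
    # Hypothetical helper: translate an NDJSON bulk body into a protobuf BulkRequest
    proto_request = document_pb2.BulkRequest()
    lines = [json.loads(line) for line in ndjson_body.strip().split("\n")]
    # NDJSON alternates action metadata lines and document lines
    for action, doc in zip(lines[0::2], lines[1::2]):
        request_body = document_pb2.BulkRequestBody()
        request_body.id = action["index"]["_id"]
        request_body.object = json.dumps(doc).encode("utf-8")
        request_body.operation_container.CopyFrom(document_pb2.IndexOperation())
        proto_request.request_body.append(request_body)
    # Send over gRPC; the protobuf response would then be translated back into
    # the dict shape client.bulk() returns today
    return stub.Bulk(proto_request)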
Do you have any additional context?
Additional examples of gRPC/protobuf client usage can be found in the documentation here (Python docs are not yet published, but as seen above, usage will look very similar):
https://docs.opensearch.org/latest/api-reference/grpc-apis/bulk/#java-grpc-client-example