-
Notifications
You must be signed in to change notification settings - Fork 472
New protocol and generator options for asynchronous mutating traversal of messages #1762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, I'm really not sure this is a good idea. In general having async traversal of a message type strikes me as being a code smell: can you elaborate on why this is the right solution?
Yeah I can give you a practical example. I have a library that talks with a gRPC service. Many messages to that service contain byte fields. For security purposes those byte fields can be encoded or encrypted. Since those byte fields can be arbitrarily nested it is easier and safer to implement this with a visitor than to hard code it for every single message type. Can you elaborate why you would consider this a code smell? I personally think this is a classic "what color is your function problem?" Of course most folks should use the synchronous visitor APIs where possible but it feels overly restrictive to not allow asynchronous visiting. |
cf84018
to
688179f
Compare
62ff381
to
35c49f9
Compare
…l of messages ## Motivation Our current `Visitor` protocol and generated `traverse` methods allow synchronous throwing traversal of the message fields. This works great for how our current serializers are working; however, often one wants to implement validators or transformers that mutate the fields of message. Sometimes this mutation is even asynchronous. ## Modification This PR adds a few things: - A new `AsyncVisitor` protocol that allows to asynchronously traverse and mutate the fields of a message - A code generator option to generate the new async traverse method - Tests that validate that mutating async traversal is working ## Result We can now implement async visitors that mutate message fields.
35c49f9
to
8fbe023
Compare
The sense in which it feels like a code smell is that it has taken a domain modelling problem and pushed it down into the serialization layer, rather than letting the serialization layer handle serialization and having your application use separate data model objects. One of my pieces of supporting evidence here is that this visitor differs from the other not only in having |
I get your point but I would argue the current visitor pattern is overly focused on just catering for serialization needs whereas a visitor pattern is something generally useful. I agree that if you implement serialization with the visitor pattern this should be both non-mutating and synchronous. However, there are many more useful things to do with visitors such as applying common transformations to a whole message hierarchy. I don't think we should over index on what the current visitor pattern since that is incredibly tailored to serialization. I tried to implement all of this outside of this library but to properly support it I need code generation and it felt right to me to expand the current visitor pattern with asynchronous and mutating visitors. |
I'm also somewhat hesitant to add new large swaths of user-facing API until we have a better idea of what a more performant swift-protobuf in the future would look like. I've been desperately waiting for the const/const-initialized proposals in the language to make progress, because one of my dreams is to replace the existing generated serialization code with something that's entirely table-driven and handled internally by the runtime (something that is infeasible with the language/compiler today). In that world, and if we start looking at taking advantage of other features that have landed in the language recently, the visitor pattern as it's designed today may not hold its weight, or we may wish to design it a different way. For example, the visitor pattern also falls flat for other tasks; we can't implement equality with it because it can't traverse the fields of two messages simultaneously and compare state between them. Most other language's proto implementations don't have a similar pattern to our visitors; they usually use descriptors and reflection instead. It's strictly a better and more general API, because it gives the caller the flexibility to do whatever they need, instead of tying them to a special notion of what it means to traverse a message. We've generally avoided generating descriptors in Swift (and Objective-C) because of code size concerns for mobile users since they're not typically used. But this change would also increase the amount of generated code for each message, so if we're considering changes that would do that and we're committing to new APIs, we should probably consider those like descriptors that serve a lot more use cases. |
I understand the desire to do this and I would love for us to explore this but realistically this requires a new major of I would like us to take a step back and discuss the thing I'm trying to achieve here. I need a way to go through each field of a message and make sure any recursively nested field of a certain message type is transformed. This transformation is asynchronous in nature. Can we all agree that this is something sensible for a user to do?
Since we don't have proper runtime reflection APIs in the language the only option that I saw was the visitor approach. @allevato do you think with descriptors we could achieve the same?
I just want to note that these new APIs are not generated unless the generator option is turned on. I would suggest we do the same if we want to generate the descriptors so we leave the option to the users. |
Even with it requiring a major version bump giving us freedom to break things, we still have to consider that with every new API, we either have to (1) drop it, requiring folks to do a possibly painful migration if it has become load-bearing for them, or (2) continue to support it, which may not be ideal, straightforward, or performant for a new underlying implementation. So I'm not suggesting that we should have a blanket ban on new features because of that, but we do need to carefully weigh which new features we take in. An async visitor feels a bit like it's on the quite specialized end of the spectrum, especially if there are other options to explore.
My first question would be, do you actually need to be able to do this for any arbitrary message/field, or is the affected surface area small enough that you could just manually implement the algorithm you're describing using fixed fields/messages? That would definitely perform better than a generic approach. If that's not possible, then descriptors and a reflection API built around them would allow you to do this. A common form this API takes (in C++, for example) is to iterate over the fields in the descriptor and use a reflection API to get the ones you're interested in (by checking what the field number or field type is, then using the relevant reflection accessor). So you could implement your logic in a simple
I think historically we've tried to keep generator options that impact presence of public API as the exception rather than the rule, because of the possibility that different subprojects could generate protos with different options and thus be incompatible for certain operations. That's not to say that we haven't considered different flavors of this, though. One thing we've been talking about for a while is generating the text/JSON functionality into separate source files so that they could be dropped by clients who don't need them. If we were able to define a compact enough descriptor format, then perhaps a better way to slice things would be for the baseline to be "no descriptors, no reflection, binary serialization only" for the ultra-light performance-sensitive cases, and then adding descriptors gets you everything else built on top of those. |
Yes for my particular use-case this is infeasible. The API I'm working with has 100s of messages defined and making sure the right fields are properly encrypted/decrypted without a generic approach is prone for missing something.
That sounds amazing for languages that provide reflection APIs; however, Swift does not. So without reflection how do we expect users to solve this problem? In general, I'm sympathetic to all the points you brought up here and open to take any other approach but I don't see any with the current language features. The only way around this that I see is implement my own protoc generator plugin which would massively blow up the scope of what I'm trying to achieve here. |
I mean that we would provide our own reflection API specifically for protos, not that we'd use language-based reflection. For example, C++ provides APIs on messages that take descriptors and read or mutate the corresponding field. |
That's interesting. Thanks for sharing. This looks like a really useful API but also way more complex to implement. I have to look more into this. |
I'm definitely more enthusiastic about a descriptors-based approach for this, as it feels much closer to the "right" shape. grpc-swift added out-of-tree support for the grpc reflection service, which is also built on descriptor functionality, so it's clear there's a generalised ask. We just need a way to generate it for the use-cases that Franz has. |
After having read through what other language implementations do I agree that ProtoDescriptors and a ProtoReflection API are the way to go. This requires a lot more new APIs and code generator changes so I'm going to close this PR for now and file an issue for the feature. |
Motivation
Our current
Visitor
protocol and generatedtraverse
methods allow synchronous throwing traversal of the message fields. This works great for how our current serializers are working; however, often one wants to implement validators or transformers that mutate the fields of message. Sometimes this mutation is even asynchronous.Modification
This PR adds a few things:
AsyncVisitor
protocol that allows to asynchronously traverse and mutate the fields of a messageResult
We can now implement async visitors that mutate message fields.