Add `flatten` attribute to derive SerializeRow #1144
0de5bb0 to dc2a19e
I apologize for the multiple force pushes yesterday. Since the PR had not been reviewed yet, I kept chewing on it and decided to move some things around to make it clearer that all the new structs are internal to the macro implementation only (by moving them into that sub-module). That should all be done now. I have also started work on adding the flatten attribute for the
I have finished the work to also support flattening when serializing with
@wprzytula would you mind taking a look at this PR and telling me whether you would like me to keep it as-is or add the flatten support to the
c03bf5b to 4184a7b
@Lorak-mmk, could you take a look and let me know if there are any concerns holding up this PR?
@nrxus We're sorry for the poor responsiveness on our side. We're busy with next year's planning; we'll be able to look at your PR later.
This is a significant but breaking change, so we most likely won't be able to attend to it before releasing 1.0. We are quite busy with other work :(
Are you sure?
I made especially sure not to change any existing API, and to hide all of the new types/traits in the existing internal module so as not to increase the public API surface area other than the new attribute.
In that case, we will technically be able to release it in, say, 1.1, once we find time to review and accept this after we release 1.0. Does that sound OK to you, @nrxus?
Yep, sounds good! I'll just keep pointing to my branch for now. I also have a branch that adds this same support when serializing with order enforced. Should I just merge it here so you all only have to review it as one complete feature? It'd make the overall size of the PR bigger, which is why I had kept it separate.
IMO let's have it in a single PR, separate commits.
I made a typo. I definitely meant that this is NOT a breaking change, and thus we will not prioritise the review before releasing 1.0.
236ab1a to 031f226
670fa33 to b773680
@wprzytula are there any other changes necessary for this PR?
Sorry for the delay, I've had a large number of reviews recently. I'm going to review it today or tomorrow.
b773680 to 7401bb1
7401bb1 to d4a5e4a
Once again, thanks a lot! ✨
@wprzytula is there anything needed for merging?
I'd like to also review this. I'll try to get to it shortly.
Looks quite good! Thank you for this contribution.
The only bigger change I'd like to see is getting rid of the HashSet.
Before, we used local vars for flags and a counter. Those should be easily convertible to partial struct fields.
Unless there is some good reason for this change, we should stick to flags and counters instead of a HashSet.
If there is a good reason, it should be included in the commit message.
The error produced if these two flavors are matched will be at compile time but it may not be the clearest error since it would be about the struct not implementing some doc hidden trait they wouldn't be able to see in the docs.
You can use https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-diagnosticon_unimplemented-attribute to make the error message better. It requires Rust 1.78 I think, but we can bump our MSRV to this version. In fact in the other unmerged PR I bump it to 1.80, you can cherry pick the relevant commits from there: #1296
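A minimal sketch of how that attribute could be applied, assuming Rust 1.78 or newer. The trait below is a hypothetical stand-in for the doc-hidden trait, and the message, label and note texts are illustrative only, not the wording used by the driver:

```rust
// Hypothetical stand-in for the doc-hidden trait; the diagnostic texts are
// illustrative assumptions, not the driver's actual wording.
#[diagnostic::on_unimplemented(
    message = "`{Self}` cannot be flattened into a struct that is serialized by name",
    label = "this type is not serialized with the `match_by_name` flavor",
    note = "a flattened struct must use the same flavor as the struct it is flattened into"
)]
pub trait SerializeRowByName {
    fn column_names(&self) -> Vec<&'static str>;
}

pub struct Inner;

impl SerializeRowByName for Inner {
    fn column_names(&self) -> Vec<&'static str> {
        vec!["a", "b"]
    }
}

/// Accepts only types that implement the trait. Passing any other type makes
/// rustc report the custom message above in place of its generic
/// "trait bound is not satisfied" text.
pub fn flatten_by_name<T: SerializeRowByName>(value: &T) -> Vec<&'static str> {
    value.column_names()
}
```

Calling `flatten_by_name(&Inner)` compiles as usual; the attribute only changes what the user reads when the bound is not met.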
I have only added this attribute to SerializeRow because it was easier than DeserializeRow but I also want to add it to that macro in a future PR next chance I get to dig into this code.
Great! We should add it to value ser/deser too in the future, not only row.
Maybe in the future those could be made public but it felt too early to know if all the signatures were exactly how we wanted to expose them or not.
I don't see a good reason to make them public, now or in the future.
For context, I am currently dealing with an issue where, if I have different insert queries where one sets N columns and another one sets the same N and one extra, then I have to make two structs with N repeated fields. With this PR I'd be able to instead flatten the struct with N fields inside the other struct to make my code more maintainable.
That's a very reasonable use case for that feature. I did not yet read the changes, so it may already be done, but this is something that should definitely be put as an example in the docs (and maybe in the examples/ folder too).
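A rough sketch of that use case, modeled in plain Rust without the driver so that it stays self-contained. The `NamedColumns` trait and the struct names are hypothetical; with the derive macro, the `base` field would carry the `flatten` attribute and no hand-written impl would be needed:

```rust
/// Hypothetical stand-in for "a struct that can produce named columns".
pub trait NamedColumns {
    fn columns(&self) -> Vec<(&'static str, String)>;
}

/// The N columns shared by both insert queries.
pub struct UserBase {
    pub id: i32,
    pub name: String,
}

/// The second query sets the same columns plus one extra.
pub struct UserWithEmail {
    // With the derive macro this field would be marked as flattened.
    pub base: UserBase,
    pub email: String,
}

impl NamedColumns for UserBase {
    fn columns(&self) -> Vec<(&'static str, String)> {
        vec![("id", self.id.to_string()), ("name", self.name.clone())]
    }
}

impl NamedColumns for UserWithEmail {
    fn columns(&self) -> Vec<(&'static str, String)> {
        // Flattening: the nested struct's columns become columns of the parent.
        let mut cols = self.base.columns();
        cols.push(("email", self.email.clone()));
        cols
    }
}
```

The shared fields live in one place (`UserBase`), and the wider struct reuses them instead of repeating them.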
/// How to serialize a row column-by-column
///
/// For now this trait is an implementation detail of `#[derive(SerializeRow)]` when
/// serializing by name
pub trait PartialSerializeRowByName {
    /// Tries to serialize a single column in the row according to the information in the given
    /// context.
    ///
    /// It returns whether the column finished the serialization of the struct, did it partially,
    /// was not used at all, or errored.
    fn serialize_field(
        &mut self,
        spec: &ColumnSpec,
        writer: &mut RowWriter<'_>,
    ) -> Result<self::ser::row::FieldStatus, SerializationError>;

    /// Checks if there are any missing columns to finish the serialization
    fn check_missing(self) -> Result<(), SerializationError>;
}
❓ I'd just like to check if I understand this well:
`check_missing` will return `Ok` if either:
- some previous call to `serialize_field` returned `Ok(Done)`
- struct is empty
Otherwise it will return an error.
If that is not correct, please describe the relation between the methods a bit more so that the reader can build some intuition.
If it is correct, the reasons for `check_missing` are:
- Empty structs
- Generating error messages
because those cases are the only things not covered by remembering if `Ok(Done)` was returned before.
Am I right, or do I misunderstand the purpose of this?
That is correct. It is in charge of checking that there are no unfinished fields in the serialized struct after every column in `RowSerializationContext` has been serialized. If there are some, it generates the error message. It is also called by a "parent" struct on its flattened struct if the parent struct detected that the flattened struct was missing columns.
If we are worried about the performance of that call, we could perhaps call it only when we didn't get an `Ok(Done)` as the last response from the partial struct, but the cost should be pretty negligible in the happy case.
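To build some intuition for how the two methods relate, here is a simplified, hand-written model of a partial struct. It is only a sketch under stated assumptions: a column spec is reduced to a `&str` name, the writer to a `Vec<String>`, errors to `String`, and the `NotDone` variant name is assumed for the "used but not finished" state:

```rust
/// Simplified model of the status enum; `NotDone` is an assumed name.
#[derive(Debug, PartialEq)]
pub enum FieldStatus {
    Done,    // the column was used and the struct is now fully serialized
    NotDone, // the column was used but other fields are still pending
    NotUsed, // the column does not belong to this struct
}

pub struct Person {
    pub name: String,
    pub age: u8,
}

/// Hand-written equivalent of what a generated partial could look like:
/// one reference and one flag per field, plus a counter of pending fields.
pub struct PersonPartial<'a> {
    name: &'a String,
    age: &'a u8,
    visited_name: bool,
    visited_age: bool,
    remaining: usize,
}

impl Person {
    pub fn partial(&self) -> PersonPartial<'_> {
        PersonPartial {
            name: &self.name,
            age: &self.age,
            visited_name: false,
            visited_age: false,
            remaining: 2,
        }
    }
}

impl PersonPartial<'_> {
    pub fn serialize_field(
        &mut self,
        column: &str,
        out: &mut Vec<String>,
    ) -> Result<FieldStatus, String> {
        match column {
            "name" if !self.visited_name => {
                out.push(self.name.clone());
                self.visited_name = true;
            }
            "age" if !self.visited_age => {
                out.push(self.age.to_string());
                self.visited_age = true;
            }
            // Unknown (or already used) column: let the caller decide.
            _ => return Ok(FieldStatus::NotUsed),
        }
        self.remaining -= 1;
        Ok(if self.remaining == 0 {
            FieldStatus::Done
        } else {
            FieldStatus::NotDone
        })
    }

    /// Only has work to do when `Done` was never returned: it names the
    /// first field that did not receive a column.
    pub fn check_missing(self) -> Result<(), String> {
        if !self.visited_name {
            return Err("missing column: name".to_string());
        }
        if !self.visited_age {
            return Err("missing column: age".to_string());
        }
        Ok(())
    }
}
```

In this model `check_missing` succeeds exactly when every flag is set, which is also when the last successful `serialize_field` call returned `Done`; its remaining job is producing the error message.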
/// Wrapper around a struct that can be serialized by name for a whole row
///
/// Implementation detail of `#[derive(SerializeRow)]` when serializing by name
pub struct ByName<'t, T: SerializeRowByName>(pub &'t T);

impl<T: SerializeRowByName> ByName<'_, T> {
    #[inline]
    /// Serializes all the fields/columns by name
    pub fn serialize(
        self,
        ctx: &RowSerializationContext,
        writer: &mut RowWriter<'_>,
    ) -> Result<(), SerializationError> {
        // 1. create the partial view of the row we are serializing. A partial contains
        // references to each serializable field and tracks which fields have already been
        // serialized and which ones are missing
        let mut partial = self.0.partial();

        for spec in ctx.columns() {
            // 2. For each column attempt to serialize it using the partial view
            let serialized = partial.serialize_field(spec, writer)?;

            // 3. If the field was not used that means the column doesn't belong to this
            // struct and thus cannot be serialized. Return error.
            if matches!(serialized, FieldStatus::NotUsed) {
                return Err(mk_typck_err::<Self>(
                    BuiltinTypeCheckErrorKind::NoColumnWithName {
                        name: spec.name().to_owned(),
                    },
                ));
            }
        }

        // 4. After all the fields are serialized, check that the partial doesn't have any
        // fields left to serialize - return an error otherwise as we are missing columns
        partial.check_missing()
    }
}
Ok, here is where my question above starts to matter. Are there other reasons besides empty structs why we call `check_missing` instead of remembering if `Done` was returned after the last column?
Yes but no.
The first call to `check_missing` should only happen if we didn't detect a `Done`, but that call to `check_missing` may call the flattened struct's `check_missing` if the field that it detected as incomplete was a flattened struct.
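The delegation described above can be sketched with a simplified model. Everything here is an assumption made for illustration: the `Partial` trait, the `&str` column names, the `Vec<String>` writer, the `String` errors, and the `User`/`Address` shapes are not the driver's types:

```rust
#[derive(Debug, PartialEq)]
pub enum FieldStatus {
    Done,
    NotDone,
    NotUsed,
}

/// Simplified model of the partial trait.
pub trait Partial {
    fn serialize_field(&mut self, column: &str, out: &mut Vec<String>) -> FieldStatus;
    fn check_missing(self) -> Result<(), String>;
}

/// Partial of a flattened struct with a single column `city`.
pub struct AddressPartial<'a> {
    pub city: &'a str,
    pub visited_city: bool,
}

impl Partial for AddressPartial<'_> {
    fn serialize_field(&mut self, column: &str, out: &mut Vec<String>) -> FieldStatus {
        if column == "city" && !self.visited_city {
            out.push(self.city.to_string());
            self.visited_city = true;
            FieldStatus::Done
        } else {
            FieldStatus::NotUsed
        }
    }

    fn check_missing(self) -> Result<(), String> {
        if self.visited_city {
            Ok(())
        } else {
            Err("Address is missing column `city`".to_string())
        }
    }
}

/// Partial of a parent with its own column `name` and a flattened address.
pub struct UserPartial<'a> {
    pub name: &'a str,
    pub visited_name: bool,
    pub address: AddressPartial<'a>,
    pub address_done: bool,
    pub remaining: usize, // 2: `name` plus the flattened field
}

impl Partial for UserPartial<'_> {
    fn serialize_field(&mut self, column: &str, out: &mut Vec<String>) -> FieldStatus {
        if column == "name" && !self.visited_name {
            out.push(self.name.to_string());
            self.visited_name = true;
            self.remaining -= 1;
        } else if !self.address_done {
            // Not one of our own columns: offer it to the flattened struct.
            match self.address.serialize_field(column, out) {
                FieldStatus::Done => {
                    self.address_done = true;
                    self.remaining -= 1;
                }
                FieldStatus::NotDone => {}
                FieldStatus::NotUsed => return FieldStatus::NotUsed,
            }
        } else {
            return FieldStatus::NotUsed;
        }
        if self.remaining == 0 {
            FieldStatus::Done
        } else {
            FieldStatus::NotDone
        }
    }

    fn check_missing(self) -> Result<(), String> {
        if !self.visited_name {
            return Err("User is missing column `name`".to_string());
        }
        if !self.address_done {
            // Let the flattened struct name its own missing column.
            return self.address.check_missing();
        }
        Ok(())
    }
}

/// Driver loop in the spirit of `ByName::serialize`.
pub fn serialize_by_name<P: Partial>(mut partial: P, columns: &[&str]) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    for column in columns {
        if partial.serialize_field(column, &mut out) == FieldStatus::NotUsed {
            return Err(format!("no field for column `{column}`"));
        }
    }
    partial.check_missing()?;
    Ok(out)
}
```

When only `name` is supplied, the error comes from the nested `AddressPartial`, which is the better error message the parent is trying to surface.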
scylla-macros/src/serialize/row.rs
Outdated
let partial_struct: syn::ItemStruct = parse_quote! {
    pub struct #partial_struct_name #partial_generics {
        #(#fields: &#partial_lt #tys,)*
        missing: ::std::collections::HashSet<&'static str>,
    }
};

let serialize_field_block: syn::Block = if self.ctx.fields.is_empty() {
    parse_quote! {{
        ::std::result::Result::Ok(#crate_path::ser::row::FieldStatus::NotUsed)
    }}
} else {
When I said not to worry about the hashset I was not aware that it would also be necessary for the case where `flatten` is not used. This introduces new overhead to the case that does not need it.
Before, we used local variables (one per field) to track completion. The obvious way to migrate that to the partial struct would be to have bool fields in the struct instead of local variables. Why do we need a set instead of that?
When I first wrote this, I got it into my head that I really needed the hashset because I couldn't keep track of the flattened fields at compile time, but I think that may have been the case for a previous implementation that ended up not going anywhere. I think the current implementation could probably just be done with one boolean per field in the struct; I'll give it a shot.
Done. I have it now the same way as it used to be: a counter that keeps track of how many fields are remaining to serialize, and a per-field boolean to check what fields have already been visited.
The function itself is technically public, but it lives in the doc-hidden internal macro module so that macros can use it to simplify code expansion without affecting the surface area of the public API.
otherwise it can't compile if the impl block already had a 'b named lifetime
This is a no-op refactor to the code generation by SerializeRow. New types and traits are introduced in the doc hidden module `_macro_internal` such that the API surface area is unchanged. The goal of this refactor is to make the addition of the flattened serialization feature easier
any helper types/methods have been created inside the internal macro module so as not to increase the public API surface area
This commit focuses on the `match_by_name` flavor.
If a struct's field is a reference, allow that reference to use the `flatten` attribute so that the field can be serialized using SerializeRow and the columns produced flattened into the parent struct. Prior to this commit this was not possible, so this commit makes two changes:
1. Adds a blanket implementation of the needed traits (SerializeRowByName and SerializeRowInOrder) for references to structs that already implement them.
2. Adds a bound to the lifetimes of the fields when making the partial structs during serialization, such that the references to the fields have to outlive the partial struct being created during serialization.
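The first of those changes follows a common Rust pattern. A minimal sketch, using a hypothetical `RowColumns` trait as a stand-in for the real ones:

```rust
/// Hypothetical stand-in for a trait such as `SerializeRowByName`.
pub trait RowColumns {
    fn column_names(&self) -> Vec<&'static str>;
}

/// Blanket implementation: a reference to anything that implements the
/// trait also implements it, by delegating to the referenced value.
impl<T: RowColumns + ?Sized> RowColumns for &T {
    fn column_names(&self) -> Vec<&'static str> {
        (**self).column_names()
    }
}

pub struct Inner {
    pub a: i32,
    pub b: i32,
}

impl RowColumns for Inner {
    fn column_names(&self) -> Vec<&'static str> {
        vec!["a", "b"]
    }
}

/// Parent that holds its nested struct by reference, as a flattened field could.
pub struct Outer<'a> {
    pub inner: &'a Inner,
    pub c: i32,
}

/// Generic helper bounded on the trait. Calling it with `&Inner` compiles
/// only because of the blanket implementation above.
pub fn names_of<T: RowColumns>(value: T) -> Vec<&'static str> {
    value.column_names()
}
```

Without the blanket impl, `names_of(outer.inner)` would be rejected because `&Inner` itself would not satisfy the bound.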
d4a5e4a to 8790dcd
This is similar to the `flatten` attribute in serde.
This PR adds support for both the `match_by_name` and the `enforce_order` flavors, but it does not allow these structs to be mixed and matched: structs of different flavors of serialization cannot be flattened into one another. This is a feasibility limitation, as these two methods of serialization are completely at odds with each other and hence cannot be combined. The error produced if these two flavors are mixed will be at compile time, but it may not be the clearest error since it would be about the struct not implementing some doc-hidden trait that users wouldn't be able to see in the docs.
I have only added this attribute to `SerializeRow` because it was easier than `DeserializeRow`, but I also want to add it to that macro in a future PR the next chance I get to dig into this code.
All the new traits/structs/enums needed for this change are inside the `_macro_internal` submodule such that no new public API is exposed. Maybe in the future those could be made public, but it felt too early to know whether all the signatures were exactly how we wanted to expose them.
For context, I am currently dealing with an issue where, if I have different insert queries where one sets N columns and another one sets the same N and one extra, then I have to make two structs with N repeated fields. With this PR I'd be able to instead flatten the struct with N fields inside the other struct to make my code more maintainable.
By name serialization
ser::row::ByName
A new internal-only struct `ser::row::ByName` is added that wraps a struct that implements a new trait: `SerializeRowByName`. This new type has a single function, `ser::row::ByName::serialize`, which attempts to serialize an entire `RowSerializationContext`, returning an error if any of the columns in the context were not serialized or do not belong to the struct. This is basically the implementation of `SerializeRow::serialize` for any struct that implements `SerializeRowByName`, but split into its own internal type so that the macro doesn't have to generate this shared code. This couldn't be added as a default implementation in one of our traits because we need to call some functions using `Self` as a generic parameter, which caused some compilation errors.
SerializeRowByName
When deriving `SerializeRow` using the `match_by_name` flavor the struct will also implement a new internal-only trait: `SerializeRowByName`. This trait has a single associated type, `Partial`, and a function `partial()` that creates it. The partial struct has 3 main parts: [...] `SerializeRowByName` such that `partial()` can be called on it.
The partial struct is required to implement a new trait, `PartialSerializeRowByName`.
PartialSerializeRowByName
`PartialSerializeRowByName` has two required functions:
- `serialize_field`: takes the spec of a single column and attempts to serialize the corresponding field to it. If this column does not belong to this partial struct, the caller is told that the column is unused so that it can instead try a different field for this same column (i.e., when testing to see if any nested structs can serialize to that column). If the column is used, a check is done to see if that column has completed the serialization of this field so that the field can be removed from the partial's `missing` set. The caller is informed whether that column has finished the serialization of this partial struct or not.
- `check_missing`: consumes the partial struct while checking if all the fields in this struct were serialized, returning an error if not. This is used inside `ser::row::ByName::serialize` to verify that a struct has been fully serialized. If a field has not finished serializing and the field is a nested struct (i.e., not just a column), then we should get the error from the nested struct instead for better error messaging.
To do this signaling, a new internal-only enum `ser::row::FieldStatus` was added that tells whether a column was not used for the field, was used and completed the field, or was used but the field is still missing more columns.
By order serialization
ser::row::InOrder
A new internal-only struct `ser::row::InOrder` is added that wraps a struct that implements a new trait: `SerializeRowInOrder`. This new type has a single function, `ser::row::InOrder::serialize`, that attempts to serialize an entire `RowSerializationContext`, returning an error if any of the columns in the context were not serialized or do not belong to the struct. It does this by:
- [...] `ser::row::ByColumn`.
- [...] `SerializeRowInOrder` implementation for the struct we are deriving `SerializeRow` for using the `ser::row::ByColumn` instance.
- [...] `ser::row::ByColumn` instance was fully consumed.
This is basically the implementation of `SerializeRow::serialize` for any struct that implements `SerializeRowInOrder`, but split into its own internal type so that the macro doesn't have to generate this shared code. This couldn't be added as a default implementation in one of our traits because we need to call some functions using `Self` as a generic parameter, which caused some compilation errors.
ser::row::ByColumn
`ser::row::ByColumn` wraps an iterator over column specs and provides the following methods:
- `next`: given a value to serialize, it type checks and name checks it against the next column spec in the iterator, serializing it if successful or returning an error otherwise.
- `next_skip_name`: given a value to serialize, it type checks it (but skips the name check) against the next column spec in the iterator, serializing it if successful or returning an error otherwise.
- `finish`: verifies that the iterator is fully consumed.
SerializeRowInOrder
When deriving `SerializeRow` using the `enforce_order` flavor the struct will also implement a new internal-only trait: `SerializeRowInOrder`. This trait has a single method, `serialize_in_order()`, whose generated implementation will:
- [...] `next` or `next_skip_name` on the given `ser::row::ByColumn` instance.
- [...] `serialize_in_order` on it (implying the nested struct must also implement `SerializeRowInOrder`) and pass along its `ser::row::ByColumn` instance.
Note that this method does not call `finish()` on `ser::row::ByColumn` because it does not need to verify that the iterator was fully consumed: it could have been called during flattening, and we only want to verify that the iterator is consumed on the root struct being serialized.
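The by-order pieces described above can be sketched with a simplified model. This is not the driver's code: a column spec is reduced to a `&str` name, type checking is left out entirely, the writer is a `Vec<String>`, errors are `String`, and `SerializeInOrder`, `serialize_root`, `User` and `Address` are hypothetical names:

```rust
/// Simplified model of `ByColumn`: an iterator over column names plus output.
pub struct ByColumn<'c> {
    columns: std::slice::Iter<'c, &'c str>,
    out: Vec<String>,
}

impl<'c> ByColumn<'c> {
    pub fn new(columns: &'c [&'c str]) -> Self {
        ByColumn { columns: columns.iter(), out: Vec::new() }
    }

    /// Serializes `value` into the next column after checking its name.
    pub fn next(&mut self, name: &str, value: &str) -> Result<(), String> {
        match self.columns.next() {
            Some(col) if *col == name => {
                self.out.push(value.to_string());
                Ok(())
            }
            Some(col) => Err(format!("expected column `{col}`, found field `{name}`")),
            None => Err(format!("no column left for field `{name}`")),
        }
    }

    /// Same as `next` but without the name check.
    pub fn next_skip_name(&mut self, value: &str) -> Result<(), String> {
        match self.columns.next() {
            Some(_) => {
                self.out.push(value.to_string());
                Ok(())
            }
            None => Err("no column left".to_string()),
        }
    }

    /// Verifies that every column was consumed and hands back the output.
    pub fn finish(mut self) -> Result<Vec<String>, String> {
        match self.columns.next() {
            None => Ok(self.out),
            Some(col) => Err(format!("column `{col}` was not serialized")),
        }
    }
}

pub trait SerializeInOrder {
    fn serialize_in_order(&self, columns: &mut ByColumn<'_>) -> Result<(), String>;
}

pub struct Address {
    pub city: String,
}

pub struct User {
    pub name: String,
    pub address: Address, // treated as a flattened field
}

impl SerializeInOrder for Address {
    fn serialize_in_order(&self, columns: &mut ByColumn<'_>) -> Result<(), String> {
        columns.next("city", &self.city)
    }
}

impl SerializeInOrder for User {
    fn serialize_in_order(&self, columns: &mut ByColumn<'_>) -> Result<(), String> {
        columns.next("name", &self.name)?;
        // Flattened field: pass the same `ByColumn` along and do not call
        // `finish()` here, since this may itself be a nested call.
        self.address.serialize_in_order(columns)
    }
}

/// Root entry point: only here is the iterator checked for leftovers.
pub fn serialize_root<T: SerializeInOrder>(value: &T, columns: &[&str]) -> Result<Vec<String>, String> {
    let mut by_column = ByColumn::new(columns);
    value.serialize_in_order(&mut by_column)?;
    by_column.finish()
}
```

Keeping `finish()` out of `serialize_in_order` is what lets the same method serve both as the root serializer and as a flattened sub-step.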