Add flatten attribute to derive SerializeRow #1144

Open: nrxus wants to merge 9 commits into main

nrxus
Contributor

@nrxus nrxus commented Dec 7, 2024

This is similar to the flatten attribute in serde.

This PR adds support for both the match_by_name and the enforce_order flavors, but it does not allow these structs to be mixed and matched: a struct using one serialization flavor cannot be flattened into a struct using the other. This is a feasibility limitation, as the two methods of serialization are completely at odds with each other and hence cannot be combined. Mixing the two flavors produces a compile-time error, but it may not be the clearest one, since it complains about the struct not implementing a doc-hidden trait that users won't be able to find in the docs.

I have only added this attribute to SerializeRow because it was easier than DeserializeRow, but I also want to add it to that macro in a future PR the next chance I get to dig into this code.

All the new traits/structs/enums needed for this change are inside the _macro_internal submodule, so no new public API is exposed. Maybe in the future those could be made public, but it felt too early to know whether all the signatures were exactly how we wanted to expose them.

For context, I am currently dealing with an issue: if I have different insert queries where one sets N columns and another sets the same N plus one extra, then I have to make two structs with N repeated fields. With this PR I'd be able to instead flatten the struct with N fields inside the other struct, making my code more maintainable.

By name serialization

ser::row::ByName

A new internal-only struct, ser::row::ByName, is added that wraps a struct implementing a new trait: SerializeRowByName. This new type has a single function, ser::row::ByName::serialize, that attempts to serialize an entire RowSerializationContext, returning an error if any of the columns in the context were not serialized or do not belong to the struct. This is basically the implementation of SerializeRow::serialize for any struct that implements SerializeRowByName, split into its own internal type so that the macro doesn't have to generate this shared code. It couldn't be added as a default implementation in one of our traits because we need to call some functions using Self as a generic parameter, which caused compilation errors.

SerializeRowByName

When deriving SerializeRow using the match_by_name flavor, the struct will also implement a new internal-only trait: SerializeRowByName. This trait has a single associated type, Partial, and a function, partial(), that creates it. The partial struct has 3 main parts:

  1. For every field that is a column (will not be flattened): A reference to the field.
  2. For every field that is a nested struct (will be flattened): the partial view of that nested struct. This means that the nested struct must also implement SerializeRowByName so that partial() can be called on it.
  3. A hashset to check that every field (not column) has been serialized: fields that are columns are tracked by their column name, and nested structs by the field name of the nested struct.

The partial struct is required to implement a new trait, PartialSerializeRowByName.

PartialSerializeRowByName

PartialSerializeRowByName has two required functions:

  • serialize_field: takes the spec of a single column and attempts to serialize the corresponding field to it. If the column does not belong to this partial struct, the caller is told that the column is unused, so that it can try a different field for the same column instead (e.g., when testing whether any nested struct can serialize to that column). If the column is used, a check is done to see whether that column completed the serialization of its field, so that the field can be removed from the missing set. The caller is informed whether or not that column finished the serialization of this partial struct.

  • check_missing: consumes the partial struct while checking whether all the fields in this struct were serialized, returning an error if not. This is used inside ser::row::ByName::serialize to verify that a struct has been fully serialized. If a field has not finished serializing and that field is a nested struct (i.e., not just a column), the error is taken from the nested struct instead, for better error messages.

To do this signaling, a new internal-only enum, ser::row::FieldStatus, was added; it reports whether a column was not used by the struct, was used and completed it, or was used but the struct is still missing more columns.
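To see how ByName::serialize, the partial struct and FieldStatus fit together, here is a minimal, std-only sketch of what the pieces could look like for a struct with one flattened field. It is not the macro's actual output: ColumnSpec, RowWriter and SerializationError are replaced by a column name, a Vec<String> and a String; the third FieldStatus variant is called NotDone here as an assumption; and Row, Audit, row_partial and serialize_by_name are hypothetical names. Like the description above, it tracks unfinished fields in a HashSet, keyed by column name for plain fields and by field name for the flattened one.

```rust
use std::collections::HashSet;

// Simplified FieldStatus; `NotDone` is an assumed name for "used, but columns still missing".
#[derive(Debug, PartialEq)]
enum FieldStatus { NotUsed, Done, NotDone }

trait PartialSerializeRowByName {
    fn serialize_field(&mut self, column: &str, out: &mut Vec<String>) -> Result<FieldStatus, String>;
    fn check_missing(self) -> Result<(), String>;
}

struct Audit { created_by: String, updated_by: String }
struct Row { id: i32, audit: Audit } // `audit` is the field marked as flattened

struct AuditPartial<'a> {
    created_by: &'a String,
    updated_by: &'a String,
    missing: HashSet<&'static str>, // keyed by column name
}

impl PartialSerializeRowByName for AuditPartial<'_> {
    fn serialize_field(&mut self, column: &str, out: &mut Vec<String>) -> Result<FieldStatus, String> {
        let value = match column {
            "created_by" => self.created_by,
            "updated_by" => self.updated_by,
            _ => return Ok(FieldStatus::NotUsed),
        };
        out.push(value.clone());
        self.missing.remove(column);
        Ok(if self.missing.is_empty() { FieldStatus::Done } else { FieldStatus::NotDone })
    }

    fn check_missing(self) -> Result<(), String> {
        match self.missing.iter().min() {
            None => Ok(()),
            Some(column) => Err(format!("missing column `{column}`")),
        }
    }
}

struct RowPartial<'a> {
    id: &'a i32,
    audit: AuditPartial<'a>,        // partial view of the flattened struct
    missing: HashSet<&'static str>, // "id" (a column) and "audit" (a field name)
}

impl PartialSerializeRowByName for RowPartial<'_> {
    fn serialize_field(&mut self, column: &str, out: &mut Vec<String>) -> Result<FieldStatus, String> {
        if column == "id" {
            out.push(self.id.to_string());
            self.missing.remove("id");
        } else {
            // Not one of our own columns: offer it to the flattened struct.
            match self.audit.serialize_field(column, out)? {
                FieldStatus::NotUsed => return Ok(FieldStatus::NotUsed),
                FieldStatus::NotDone => {}
                FieldStatus::Done => { self.missing.remove("audit"); }
            }
        }
        Ok(if self.missing.is_empty() { FieldStatus::Done } else { FieldStatus::NotDone })
    }

    fn check_missing(self) -> Result<(), String> {
        if self.missing.contains("id") {
            Err("missing column `id`".to_string())
        } else if self.missing.contains("audit") {
            self.audit.check_missing() // the nested struct names its own missing column
        } else {
            Ok(())
        }
    }
}

fn row_partial(row: &Row) -> RowPartial<'_> {
    let audit = AuditPartial {
        created_by: &row.audit.created_by,
        updated_by: &row.audit.updated_by,
        missing: HashSet::from(["created_by", "updated_by"]),
    };
    RowPartial { id: &row.id, audit, missing: HashSet::from(["id", "audit"]) }
}

/// Analogue of ser::row::ByName::serialize: every column must be claimed, then nothing may be missing.
fn serialize_by_name<P: PartialSerializeRowByName>(
    mut partial: P, columns: &[&str], out: &mut Vec<String>,
) -> Result<(), String> {
    for &column in columns {
        if partial.serialize_field(column, out)? == FieldStatus::NotUsed {
            return Err(format!("no field for column `{column}`"));
        }
    }
    partial.check_missing()
}

fn main() {
    let row = Row { id: 1, audit: Audit { created_by: "alice".into(), updated_by: "bob".into() } };
    let mut out = Vec::new();
    let result = serialize_by_name(row_partial(&row), &["id", "updated_by", "created_by"], &mut out);
    println!("{result:?} {out:?}");
}
```

Note how RowPartial::check_missing defers to AuditPartial::check_missing, so the reported error names the nested column that is actually missing rather than just the flattened field.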

By order serialization

ser::row::InOrder

A new internal-only struct, ser::row::InOrder, is added that wraps a struct implementing a new trait: SerializeRowInOrder. This new type has a single function, ser::row::InOrder::serialize, that attempts to serialize an entire RowSerializationContext, returning an error if any of the columns in the context were not serialized or do not belong to the struct. It does this by:

  1. Wrapping an iterator over the columns in the context in a new struct, ser::row::ByColumn.
  2. Calling the generated SerializeRowInOrder implementation of the struct we are deriving SerializeRow for, passing it the ser::row::ByColumn instance.
  3. Verifying that the ser::row::ByColumn instance was fully consumed.

This is basically the implementation of SerializeRow::serialize for any struct that implements SerializeRowInOrder, split into its own internal type so that the macro doesn't have to generate this shared code. As with ByName, it couldn't be a default trait implementation because we need to call some functions using Self as a generic parameter, which caused compilation errors.

ser::row::ByColumn

ser::row::ByColumn wraps an iterator over column specs and provides the following methods:

  • next: Given a value to serialize, type-checks and name-checks it against the next column spec in the iterator, serializing it on success or returning an error otherwise.

  • next_skip_name: Given a value to serialize, type-checks it (skipping the name check) against the next column spec in the iterator, serializing it on success or returning an error otherwise.

  • finish: verifies that the iterator is fully consumed.
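The three ByColumn methods can be sketched as follows. This is a rough, std-only model, not the crate's code: a column spec is reduced to a name plus a type tag (the hypothetical Column struct), values expose their tag through a hypothetical Value trait, and serialized values are collected into a Vec<String> instead of a RowWriter.

```rust
/// Simplified, hypothetical column spec: a name plus a type tag.
struct Column { name: &'static str, ty: &'static str }

/// Hypothetical trait: a value knows its type tag and how to render itself.
trait Value {
    const TY: &'static str;
    fn render(&self) -> String;
}

impl Value for i32 {
    const TY: &'static str = "int";
    fn render(&self) -> String { self.to_string() }
}

impl Value for String {
    const TY: &'static str = "text";
    fn render(&self) -> String { self.clone() }
}

/// Stand-in for ser::row::ByColumn: walks the columns strictly in order.
struct ByColumn<'c> {
    columns: std::slice::Iter<'c, Column>,
    out: Vec<String>,
}

impl<'c> ByColumn<'c> {
    fn new(columns: &'c [Column]) -> Self {
        ByColumn { columns: columns.iter(), out: Vec::new() }
    }

    /// Name- and type-checks `value` against the next column, then writes it.
    fn next<V: Value>(&mut self, name: &str, value: &V) -> Result<(), String> {
        let col = self.columns.next().ok_or_else(|| format!("no column left for `{name}`"))?;
        if col.name != name {
            return Err(format!("expected column `{}`, got field `{name}`", col.name));
        }
        self.write(col, value)
    }

    /// Same as `next`, but skips the name check.
    fn next_skip_name<V: Value>(&mut self, value: &V) -> Result<(), String> {
        let col = self.columns.next().ok_or_else(|| "no column left".to_string())?;
        self.write(col, value)
    }

    fn write<V: Value>(&mut self, col: &Column, value: &V) -> Result<(), String> {
        if col.ty != V::TY {
            return Err(format!("column `{}` is {}, value is {}", col.name, col.ty, V::TY));
        }
        self.out.push(value.render());
        Ok(())
    }

    /// Succeeds only if every column was consumed.
    fn finish(mut self) -> Result<Vec<String>, String> {
        match self.columns.next() {
            None => Ok(self.out),
            Some(col) => Err(format!("column `{}` was not serialized", col.name)),
        }
    }
}

fn main() {
    let cols = [Column { name: "id", ty: "int" }, Column { name: "name", ty: "text" }];
    let mut ser = ByColumn::new(&cols);
    ser.next("id", &7i32).unwrap();
    ser.next_skip_name(&"ann".to_string()).unwrap();
    println!("{:?}", ser.finish());
}
```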

SerializeRowInOrder

When deriving SerializeRow using the enforce_order flavor, the struct will also implement a new internal-only trait: SerializeRowInOrder. This trait has a single method, serialize_in_order(), whose generated implementation will:

  1. For every field that is a column (will not be flattened): call next or next_skip_name on the given ser::row::ByColumn instance.
  2. For every field that is a nested struct (will be flattened): call serialize_in_order on it, passing along the ser::row::ByColumn instance (which implies the nested struct must also implement SerializeRowInOrder).

Note that this method does not call finish() on ser::row::ByColumn: it may have been called during flattening, and we only want to verify that the iterator is fully consumed for the root struct being serialized.
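The following std-only sketch shows why only the root calls finish(). It uses a simplified, name-only stand-in for ByColumn, renders values to Strings, and the Common/WithEmail types and serialize_root function are hypothetical. The nested struct consumes its columns from the shared ByColumn, and the leftover check happens once, after the whole tree has been walked.

```rust
/// Simplified, hypothetical stand-in for ser::row::ByColumn (name checks only).
struct ByColumn<'c> {
    columns: std::slice::Iter<'c, &'static str>,
    out: Vec<String>,
}

impl ByColumn<'_> {
    fn next(&mut self, name: &str, value: &dyn ToString) -> Result<(), String> {
        match self.columns.next() {
            Some(&col) if col == name => {
                self.out.push(value.to_string());
                Ok(())
            }
            Some(&col) => Err(format!("expected column `{col}`, got field `{name}`")),
            None => Err(format!("no column left for `{name}`")),
        }
    }

    fn finish(mut self) -> Result<Vec<String>, String> {
        match self.columns.next() {
            None => Ok(self.out),
            Some(col) => Err(format!("column `{col}` was not serialized")),
        }
    }
}

trait SerializeRowInOrder {
    fn serialize_in_order(&self, ser: &mut ByColumn<'_>) -> Result<(), String>;
}

struct Common { id: i32, name: String }

impl SerializeRowInOrder for Common {
    fn serialize_in_order(&self, ser: &mut ByColumn<'_>) -> Result<(), String> {
        ser.next("id", &self.id)?;
        ser.next("name", &self.name)
    }
}

struct WithEmail {
    common: Common, // the field that would carry the flatten attribute
    email: String,
}

impl SerializeRowInOrder for WithEmail {
    fn serialize_in_order(&self, ser: &mut ByColumn<'_>) -> Result<(), String> {
        // Flattened field: hand the same ByColumn to the nested struct, no finish() here.
        self.common.serialize_in_order(ser)?;
        ser.next("email", &self.email)
    }
}

/// Analogue of ser::row::InOrder::serialize: only the root checks for leftover columns.
fn serialize_root<T: SerializeRowInOrder>(row: &T, columns: &[&'static str]) -> Result<Vec<String>, String> {
    let mut ser = ByColumn { columns: columns.iter(), out: Vec::new() };
    row.serialize_in_order(&mut ser)?;
    ser.finish()
}

fn main() {
    let row = WithEmail { common: Common { id: 1, name: "ann".to_string() }, email: "a@x".to_string() };
    println!("{:?}", serialize_root(&row, &["id", "name", "email"]));
}
```

If Common::serialize_in_order called finish() itself, the flattened case would fail as soon as Common's columns were consumed, because "email" would still be left in the iterator.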

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.

github-actions bot commented Dec 7, 2024

cargo semver-checks found no API-breaking changes in this PR.
Checked commit: 8790dcd

@nrxus nrxus force-pushed the flatten-serialize-name branch 8 times, most recently from 0de5bb0 to dc2a19e December 9, 2024 04:25
@nrxus
Contributor Author

nrxus commented Dec 9, 2024

I apologize for the multiple force pushes yesterday. I was chewing on it a bit more since it hadn't been reviewed yet, and decided to move some things around to make it clearer that all the new structs are internal to the macro implementation only (by moving them to that submodule). That should all be done by now.

I have also started work on adding the flatten attribute for the enforce_order flavor as well, but in a separate branch. Let me know if you have a preference on whether to put that in its own PR or push it here so that the entire support for #[flatten] in SerializeRow is all in one PR.

@nrxus
Contributor Author

nrxus commented Dec 10, 2024

I have finished the work to also support flattening when serializing with flavor = "enforce_order". It's on a separate branch, though, since I wasn't sure whether this PR was already considered too big. Let me know if you'd like me to merge it in here so you can review it all as one piece, or if I should wait until this is reviewed and merged, as the changes are already large.

@nrxus
Contributor Author

nrxus commented Dec 12, 2024

@wprzytula would you mind taking a look at this PR and telling me whether you would like me to keep it as-is or also add the flatten support for the enforce_order flavor in this PR?

@nrxus nrxus force-pushed the flatten-serialize-name branch from c03bf5b to 4184a7b December 13, 2024 17:18
@nrxus
Contributor Author

nrxus commented Dec 17, 2024

@Lorak-mmk, could you take a look and let me know if there are any concerns holding up this PR?

@wprzytula
Collaborator

@nrxus We're sorry for the poor responsiveness on our side. We're busy with next year's planning; we'll be able to look at your PR later.

@Lorak-mmk
Collaborator

This is a significant but breaking change, so we most likely won't be able to attend to it before releasing 1.0. We are quite busy with other work :(

@wprzytula
Collaborator

> This is a significant but breaking change, so we most likely won't be able to attend to it before releasing 1.0. We are quite busy with other work :(

Are you sure? semver-checks disagrees with you:

> cargo semver-checks found no API-breaking changes in this PR! 🎉🥳 Checked commit: 4184a7b

@nrxus
Contributor Author

nrxus commented Dec 18, 2024

I made especially sure not to change any existing API and to hide all of the new types/traits in the existing internal module, so as not to increase the public API surface area other than the new attribute.

@wprzytula
Collaborator

> I made especially sure not to change any existing API and to hide all of the new types/traits in the existing internal module, so as not to increase the public API surface area other than the new attribute.

In such case, we will technically be able to release it in, say, 1.1, when we find time to review and accept this after we release 1.0. Does it sound OK to you, @nrxus ?

@nrxus
Copy link
Contributor Author

nrxus commented Dec 18, 2024

> I made especially sure not to change any existing API and to hide all of the new types/traits in the existing internal module, so as not to increase the public API surface area other than the new attribute.

> In such case, we will technically be able to release it in, say, 1.1, when we find time to review and accept this after we release 1.0. Does it sound OK to you, @nrxus ?

Yep, sounds good! I'll just keep pointing to my branch for now. I also have a branch that adds the same support when serializing with enforced order. Should I just merge it here so you all only have to review it as one complete feature? It'd make the overall PR bigger, which is why I had kept it separate.

@wprzytula
Collaborator

> I made especially sure not to change any existing API and to hide all of the new types/traits in the existing internal module, so as not to increase the public API surface area other than the new attribute.

> In such case, we will technically be able to release it in, say, 1.1, when we find time to review and accept this after we release 1.0. Does it sound OK to you, @nrxus ?

> Yep, sounds good! I'll just keep pointing to my branch for now. I also have a branch that adds the same support when serializing with enforced order. Should I just merge it here so you all only have to review it as one complete feature? It'd make the overall PR bigger, which is why I had kept it separate.

IMO let's have it in a single PR, separate commits.

@Lorak-mmk
Collaborator

> This is a significant but breaking change, so we most likely won't be able to attend to it before releasing 1.0. We are quite busy with other work :(

> Are you sure? semver-checks disagrees with you:

> cargo semver-checks found no API-breaking changes in this PR! 🎉🥳 Checked commit: 4184a7b

I made a typo. I definitely meant that this is NOT a breaking change, and thus we will not prioritise the review before releasing 1.0.

@nrxus nrxus force-pushed the flatten-serialize-name branch 5 times, most recently from 236ab1a to 031f226 December 24, 2024 19:53
@nrxus nrxus force-pushed the flatten-serialize-name branch 5 times, most recently from 670fa33 to b773680 March 17, 2025 23:52
@nrxus nrxus requested a review from wprzytula March 17, 2025 23:54
@nrxus
Contributor Author

nrxus commented Mar 25, 2025

@wprzytula are there any other changes necessary for this PR?

@wprzytula
Collaborator

> @wprzytula are there any other changes necessary for this PR?

Sorry for the delay, I've had a large number of reviews recently. I'm going to review it today or tomorrow.

@nrxus nrxus force-pushed the flatten-serialize-name branch from b773680 to 7401bb1 March 27, 2025 22:59
@nrxus nrxus requested a review from wprzytula March 27, 2025 23:00
@nrxus nrxus force-pushed the flatten-serialize-name branch from 7401bb1 to d4a5e4a March 31, 2025 21:17
@nrxus nrxus requested a review from wprzytula March 31, 2025 21:18
wprzytula
wprzytula previously approved these changes Apr 2, 2025
Collaborator

@wprzytula wprzytula left a comment

Once again, thanks a lot! ✨

@nrxus
Contributor Author

nrxus commented Apr 4, 2025

@wprzytula is there anything needed for merging?

@Lorak-mmk
Collaborator

> @wprzytula is there anything needed for merging?

I'd like to also review this. I'll try to get to it shortly.

Collaborator

@Lorak-mmk Lorak-mmk left a comment

Looks quite good! Thank you for this contribution.

The only bigger change I'd like to see is getting rid of the HashSet.
Before we used local vars for flags and counter. Those should be easily convertible to partial struct fields.
Unless there is some good reason for this change we should stick to flags and counters instead of hashset.
If there is a good reason, it should be included in the commit message.

> The error produced if these two flavors are matched will be at compile time but it may not be the clearest error since it would be about the struct not implementing some doc hidden trait they wouldn't be able to see in the docs.

You can use https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-diagnosticon_unimplemented-attribute to make the error message better. It requires Rust 1.78 I think, but we can bump our MSRV to this version. In fact in the other unmerged PR I bump it to 1.80, you can cherry pick the relevant commits from there: #1296
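For reference, a minimal example of the attribute linked above, which is stable since Rust 1.78. The trait name mirrors the PR's, but the trait body, the message text and the flatten_columns helper are made up for illustration; the attribute only customizes the compiler's "trait not implemented" (E0277) diagnostic when a type fails the bound.

```rust
// Hypothetical stand-in for the doc-hidden trait that flattened fields must implement.
#[diagnostic::on_unimplemented(
    message = "`{Self}` cannot be flattened here: it does not use the `match_by_name` flavor",
    label = "this flattened field's type must derive SerializeRow with the same flavor",
    note = "structs combined with `flatten` must all use the same serialization flavor"
)]
trait SerializeRowByName {
    fn column_names(&self) -> Vec<&'static str>;
}

struct Common;

impl SerializeRowByName for Common {
    fn column_names(&self) -> Vec<&'static str> {
        vec!["id", "name"]
    }
}

// Hypothetical helper standing in for the bound generated code would put on a flattened field.
fn flatten_columns<T: SerializeRowByName>(field: &T) -> Vec<&'static str> {
    field.column_names()
}

fn main() {
    println!("{:?}", flatten_columns(&Common));
    // `flatten_columns(&42_u8);` would fail to compile, and rustc would show the
    // custom message, label and note above in its E0277 error.
}
```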

> I have only added this attribute to SerializeRow because it was easier than DeserializeRow, but I also want to add it to that macro in a future PR the next chance I get to dig into this code.

Great! We should add it to value ser/deser too in the future, not only row.

> Maybe in the future those could be made public, but it felt too early to know whether all the signatures were exactly how we wanted to expose them.

I don't see a good reason to make them public, now or in the future.

> For context, I am currently dealing with an issue: if I have different insert queries where one sets N columns and another sets the same N plus one extra, then I have to make two structs with N repeated fields. With this PR I'd be able to instead flatten the struct with N fields inside the other struct, making my code more maintainable.

That's a very reasonable use case for that feature. I did not yet read the changes, so it may already be done, but this is something that should definitely be put as an example in the docs (and maybe in the examples/ folder too).

Comment on lines +51 to +80
```rust
/// How to serialize a row column-by-column
///
/// For now this trait is an implementation detail of `#[derive(SerializeRow)]` when
/// serializing by name
pub trait PartialSerializeRowByName {
    /// Tries to serialize a single column in the row according to the information in the given
    /// context.
    ///
    /// It returns whether the column finished the serialization of the struct, did it partially,
    /// was not used at all, or errored.
    fn serialize_field(
        &mut self,
        spec: &ColumnSpec,
        writer: &mut RowWriter<'_>,
    ) -> Result<self::ser::row::FieldStatus, SerializationError>;

    /// Checks if there are any missing columns to finish the serialization
    fn check_missing(self) -> Result<(), SerializationError>;
}
```
Collaborator

❓ I'd just like to check if I understand this well:
check_missing will return Ok if either:

  • some previous call to serialize_field returned Ok(Done)
  • struct is empty

Otherwise it will return an error.

If that is not correct, please describe the relation between the methods a bit more so that the reader can build some intuition.
If it is correct, the reasons for check_missing are:

  • Empty structs
  • Generating error messages

because those cases are the only things not covered by remembering if Ok(Done) was returned before.
Am I right, or do I misunderstand the purpose of this?

Contributor Author

That is correct: it is in charge of checking that there are no unfinished fields in the serialized struct after every column in RowSerializationContext has been serialized. If there are some, it generates the error message. It is also called by a "parent" struct on its flattened struct if the parent struct detected that the flattened struct was missing columns.

It is possible that we could call it only if we didn't get an Ok(Done) as the last response from the partial struct, if we are worried about the performance of that call, but it should be pretty negligible in the happy case.

Comment on lines +148 to +218
```rust
/// Wrapper around a struct that can be serialized by name for a whole row
///
/// Implementation detail of `#[derive(SerializeRow)]` when serializing by name
pub struct ByName<'t, T: SerializeRowByName>(pub &'t T);

impl<T: SerializeRowByName> ByName<'_, T> {
    #[inline]
    /// Serializes all the fields/columns by name
    pub fn serialize(
        self,
        ctx: &RowSerializationContext,
        writer: &mut RowWriter<'_>,
    ) -> Result<(), SerializationError> {
        // 1. create the partial view of the row we are serializing. A partial contains
        // references to each serializable field and tracks which fields have already been
        // serialized and which ones are missing
        let mut partial = self.0.partial();

        for spec in ctx.columns() {
            // 2. For each column attempt to serialize it using the partial view
            let serialized = partial.serialize_field(spec, writer)?;

            // 3. If the field was not used that means the column doesn't belong to this
            // struct and thus cannot be serialized. Return error.
            if matches!(serialized, FieldStatus::NotUsed) {
                return Err(mk_typck_err::<Self>(
                    BuiltinTypeCheckErrorKind::NoColumnWithName {
                        name: spec.name().to_owned(),
                    },
                ));
            }
        }

        // 4. After all the fields are serialized, check that the partial doesn't have any
        // fields left to serialize - return an error otherwise as we are missing columns
        partial.check_missing()
    }
}
```
Collaborator

Ok, here is where my question above starts to matter. Are there other reasons besides empty structs why we call check_missing instead of remembering if Done was returned after the last column?

Contributor Author

Yes but no.

The first call to check_missing should only happen if we didn't detect a Done, but that call may in turn call the flattened struct's check_missing if the field it detected as incomplete was a flattened struct.

Comment on lines 245 to 285
```rust
let partial_struct: syn::ItemStruct = parse_quote! {
    pub struct #partial_struct_name #partial_generics {
        #(#fields: &#partial_lt #tys,)*
        missing: ::std::collections::HashSet<&'static str>,
    }
};

let serialize_field_block: syn::Block = if self.ctx.fields.is_empty() {
    parse_quote! {{
        ::std::result::Result::Ok(#crate_path::ser::row::FieldStatus::NotUsed)
    }}
} else {
```
Collaborator

When I said to not worry about the hashset I was not aware that it would also be necessary for the case where flatten is not used. This introduces new overhead to the case that does not need it.

Before we used local variables (one per field) to track completion. The obvious way to migrate that to partial struct would be to have bool fields in the struct, instead of local variables. Why do we need a set instead of that?

Contributor Author

When I first wrote this, it got into my head that I really needed the hashset because I couldn't keep track of the flattened fields at compile time, but I think that may have been the case for a previous implementation that ended up not going anywhere. I think the current implementation could probably be done with one boolean per field in the struct; I'll give it a shot.

Contributor Author

Done. It now works the same way it used to: a counter that keeps track of how many fields remain to be serialized, and a per-field boolean to track which fields have already been visited.
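A minimal sketch of the counter-plus-flags bookkeeping described here, for a flat struct. UserPartial is hypothetical, a Vec<String> stands in for the row writer, and the NotDone variant name is an assumption. The counter lets serialize_field report Done without scanning the flags, while in this sketch the flags keep a repeated column from being counted twice and let check_missing name the first absent column on the error path.

```rust
#[derive(Debug, PartialEq)]
enum FieldStatus {
    NotUsed,
    Done,
    NotDone, // assumed name for "used, but fields are still missing"
}

/// Hypothetical partial for `struct User { id: i32, name: String, email: String }`.
struct UserPartial<'a> {
    id: &'a i32,
    name: &'a String,
    email: &'a String,
    visited_id: bool,
    visited_name: bool,
    visited_email: bool,
    remaining: usize, // number of fields not yet serialized
}

impl<'a> UserPartial<'a> {
    fn new(id: &'a i32, name: &'a String, email: &'a String) -> Self {
        UserPartial {
            id, name, email,
            visited_id: false, visited_name: false, visited_email: false,
            remaining: 3,
        }
    }

    fn serialize_field(&mut self, column: &str, out: &mut Vec<String>) -> FieldStatus {
        let (visited, value) = match column {
            "id" => (&mut self.visited_id, self.id.to_string()),
            "name" => (&mut self.visited_name, self.name.clone()),
            "email" => (&mut self.visited_email, self.email.clone()),
            _ => return FieldStatus::NotUsed,
        };
        out.push(value);
        // Only the first visit of a field counts towards completion.
        if !*visited {
            *visited = true;
            self.remaining -= 1;
        }
        if self.remaining == 0 { FieldStatus::Done } else { FieldStatus::NotDone }
    }

    fn check_missing(self) -> Result<(), String> {
        if self.remaining == 0 {
            return Ok(()); // happy path: no per-field scan needed
        }
        // Error path only: report the first field that was never visited.
        let missing = if !self.visited_id { "id" } else if !self.visited_name { "name" } else { "email" };
        Err(format!("missing column `{missing}`"))
    }
}

fn main() {
    let (id, name, email) = (1, "ann".to_string(), "a@x".to_string());
    let mut out = Vec::new();
    let mut partial = UserPartial::new(&id, &name, &email);
    for column in ["email", "id", "name"] {
        println!("{column}: {:?}", partial.serialize_field(column, &mut out));
    }
    println!("{:?} {:?}", partial.check_missing(), out);
}
```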

nrxus added 9 commits April 7, 2025 11:14
the function itself is technically public but in the doc hidden
internal macro module so that it can be used for macros to simplify
code expansion but doesn't affect the surface area of the public API
otherwise it can't compile if the impl block already had a 'b named lifetime
This is a no-op refactor to the code generation by SerializeRow. New
types and traits are introduced in the doc hidden module
`_macro_internal` such that the API surface area is unchanged. The
goal of this refactor is to make the addition of the flattened
serialization feature easier
any helper types/methods have been created inside the internal macro
module so as not to increase the public API surface area
This commit focuses on the `match_by_name` flavor.
If a struct's field is a reference, allow that reference to use the
`flatten` attribute so that the field can be serialized using
SerializeRow and the columns produced flattened into the parent
struct.

Prior to this commit this was not possible so this commit does two
changes:

1. Adds a blanket implementation of the needed traits
(SerializeRowByName and SerializeRowInOrder) for references to
structs that already implement them.

2. Adds a bound to the lifetimes of the fields when making the partial
structs during serialization, such that the references to the fields
have to outlive the partial struct being created during
serialization.
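The blanket implementation from point 1 can be sketched with a simplified stand-in trait, SerializeInOrder (rendering values to Strings), and hypothetical Common/WithEmail types. The flattened call names the field's declared type, &'a Common, explicitly; that is an assumption about how generated code might dispatch, and such a call only compiles because of the blanket impl for &T.

```rust
/// Simplified stand-in for SerializeRowInOrder: values are rendered as Strings.
trait SerializeInOrder {
    fn serialize_in_order(&self, out: &mut Vec<String>);
}

// Blanket impl: a reference to a serializable struct is serializable too.
impl<T: SerializeInOrder + ?Sized> SerializeInOrder for &T {
    fn serialize_in_order(&self, out: &mut Vec<String>) {
        (**self).serialize_in_order(out);
    }
}

struct Common {
    id: i32,
    name: String,
}

impl SerializeInOrder for Common {
    fn serialize_in_order(&self, out: &mut Vec<String>) {
        out.push(self.id.to_string());
        out.push(self.name.clone());
    }
}

/// Hypothetical row that borrows its shared columns instead of owning them.
struct WithEmail<'a> {
    common: &'a Common, // the field that would carry the flatten attribute
    email: String,
}

impl<'a> SerializeInOrder for WithEmail<'a> {
    fn serialize_in_order(&self, out: &mut Vec<String>) {
        // Dispatch through the field's declared type, `&'a Common`; without the
        // blanket impl above, this line would not compile.
        <&'a Common as SerializeInOrder>::serialize_in_order(&self.common, out);
        out.push(self.email.clone());
    }
}

fn main() {
    let common = Common { id: 1, name: "ann".to_string() };
    let row = WithEmail { common: &common, email: "a@x".to_string() };
    let mut out = Vec::new();
    row.serialize_in_order(&mut out);
    println!("{out:?}");
}
```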
@nrxus nrxus force-pushed the flatten-serialize-name branch from d4a5e4a to 8790dcd April 7, 2025 21:32
@nrxus nrxus requested a review from Lorak-mmk April 7, 2025 21:34
Labels
area/proc-macros Related to procedural macros
5 participants