Skip to content

Conversation

@gotocoding-DB
Copy link
Contributor

@gotocoding-DB gotocoding-DB commented Oct 13, 2025

What changes are proposed in this pull request?

Align the schema of AddAction with the schema in Scala implementation of Delta: https://github.com/delta-io/delta/blob/11873fe40f6fe7660b63b2c7611df40330865620/kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/AddFile.java#L59-L61

To achieve this, we need to support Option<HashMap<K, V>> in allow_null_container_values macros.

How was this change tested?

We had a test where we check the expected types. I changed the expected value for tags' values from "required" to "nullable".

@gotocoding-DB gotocoding-DB changed the title feat: Accept **nullable** values in "tags" HashMap in AddAction feat: Accept *nullable* values in "tags" HashMap in AddAction Oct 13, 2025
@gotocoding-DB gotocoding-DB marked this pull request as ready for review October 13, 2025 11:05
@gotocoding-DB gotocoding-DB changed the title feat: Accept *nullable* values in "tags" HashMap in AddAction feat: Accept nullable values in "tags" HashMap in AddAction Oct 13, 2025
@gotocoding-DB gotocoding-DB changed the title feat: Accept nullable values in "tags" HashMap in AddAction feat: Accept nullable values in "tags" HashMap in Add action Oct 13, 2025
@gotocoding-DB
Copy link
Contributor Author

Please take a look at this PR: @nicklan, @zachschuermann, and @scovich (I don't have the rights to add you as reviewers).

@scovich
Copy link
Collaborator

scovich commented Oct 13, 2025

It seems like we need some unit tests for this change? Negative tests can be tricky (maybe best avoided), but we should at least have positive tests that the newly allowed types and schemas are actually allowed and behaving as expected?

@gotocoding-DB gotocoding-DB force-pushed the feat/nullable-values-in-tags branch from eac45aa to 0e80a85 Compare October 14, 2025 13:59
@gotocoding-DB gotocoding-DB force-pushed the feat/nullable-values-in-tags branch from 96639ad to f3d0287 Compare October 14, 2025 15:46
@gotocoding-DB
Copy link
Contributor Author

@scovich I uploaded a completely new version that doesn't modify derive-macros at all because I realized the initial approach was incorrect. We need to read NULL values and keep them in "tags," but with #[allow_null_container_values], we lose them after parsing the JSON. Please review the test I've added; I tried to be very detailed and include some kind of contract there.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new version of the change looks reasonable... but we will also need to update the row visitors because they don't support nullable map keys today.

@nicklan nicklan requested a review from DrakeLin October 14, 2025 23:16
@gotocoding-DB
Copy link
Contributor Author

@scovich So few more findings after your comment.

but we will also need to update the row visitors because they don't support nullable map keys today

That's right, but we don't read tags at all: https://github.com/delta-io/delta-kernel-rs/blob/main/kernel/src/actions/visitors.rs#L120.
So, changing the schema of AddAction shouldn't break anything in RowVisitor?

Second thing. I also realized that other actions also have "tags":

}

#[test]
fn test_add_tags_deserialization() {
Copy link
Collaborator

@DrakeLin DrakeLin Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would separate these cases into 3 unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll do it

@gotocoding-DB
Copy link
Contributor Author

@scovich Hey Ryan, can you please take a look at my response above, please: #1395 (comment)?

If I'm missing something, or if changing the tags' schema doesn't make sense without supporting them in RowVisitor, I can also try adding parsing tags into RowVisitor. I just don't want to merge everything into a single PR if it can be two separate parts, but if you think it's better to combine both parts into one PR, I will do that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants