-
Couldn't load subscription status.
- Fork 118
feat: schema diffing #1346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: schema diffing #1346
Conversation
654aa32 to
3af7c20
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1346 +/- ##
==========================================
+ Coverage 84.96% 85.80% +0.84%
==========================================
Files 114 115 +1
Lines 29445 31441 +1996
Branches 29445 31441 +1996
==========================================
+ Hits 25017 26977 +1960
- Misses 3220 3243 +23
- Partials 1208 1221 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
108283f to
ed148a5
Compare
| } | ||
|
|
||
| #[test] | ||
| fn test_cursed_deeply_nested_complex_types() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cursed
LMAO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add cursed cases with double nesting. Ex: array<array<>> map<array<>, array<>>, and map<, map<,>>
with structs ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some small cleanup. Looks correct to me. Thank you!
Replace string-based field paths with ColumnName type throughout the schema diffing framework for better consistency with the codebase. Changes: - Updated FieldChange, FieldUpdate, and FieldWithPath to use ColumnName - Updated SchemaDiffError to use ColumnName in error messages - Refactored path construction to use ColumnName::new() and join() - Replaced string prefix matching with proper ColumnName comparison - Updated all test assertions to use ColumnName All tests passing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Changed unwrap() to expect() with a descriptive message to satisfy clippy linter requirements while maintaining safety guarantees. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Add #![allow(dead_code)] to the schema diff module since this API is not yet consumed by other parts of the codebase. The functions are fully tested and ready for use in schema evolution features. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Combine duplicate test_ancestor_filtering tests into single test with two cases (add and remove) - Update test_ancestor_filtering_with_mixed_changes to verify both ancestor and descendant changes are included - Expand test_array_element_struct_field_changes to verify add, remove, and update operations - Expand test_map_value_struct_field_changes to verify add, remove, and update operations - Add test_cursed_deeply_nested_complex_types for frightening nested container scenarios 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Verify that array<array<int>> -> array<array<double>> is detected as TypeChanged on the parent field. This demonstrates the recursion limitation where we cannot descend into non-struct container types since there are no field IDs at those intermediate levels. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Renamed the function to better reflect that it classifies changes between DataType instances, making the purpose more explicit. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…maps This change improves the schema diff algorithm to correctly handle nested type changes in arrays and maps by using recursion in classify_data_type_change. Key improvements: - Recursively check element types in arrays (e.g., array<array<int>> -> array<array<double>>) - Recursively check key/value types in maps - Support combined changes (type + nullability) using FieldChangeType::Multiple - Only recurse when types actually differ to avoid false positives This fixes the issue where array<array<struct<a int>>> -> array<array<struct<a int, b int>>> would be incorrectly blocked. Now nested struct changes are properly detected via field IDs while also allowing detection of primitive type changes at any nesting level. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Created check_container_nullability_change() helper to reduce code duplication - Applied DRY principle for nullability checking logic used in arrays and maps - Added recursive type change detection for nested containers - Support combined changes (type + nullability) via FieldChangeType::Multiple - All 28 tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Clarified comments for map key/value type change detection - Existing code already handles recursion for arrays/maps containing structs - Example: map<array<struct<a int>>, int> -> map<array<struct<a int, b int>>, int> is correctly detected via recursive calls to classify_data_type_change - Map changes already combined via FieldChangeType::Multiple for: * Key type changes * Value type changes * Container nullability changes - All 28 tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Previously, field collection only handled one level of container nesting (e.g., array<struct<...>>). This refactoring extracts the container recursion logic into a new helper function that supports arbitrary nesting depth. The new `collect_fields_from_datatype` function recursively descends through container layers (arrays and maps) to find all nested struct fields, enabling support for deeply nested schemas like: - array<array<struct<...>>> - map<array<struct<...>>, array<struct<...>>> - map<array<array<struct<a int>>>, array<struct<b int>>> This uses mutual recursion between DataType processing and StructType processing to ensure all nested structs are discovered regardless of container nesting depth. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Metadata changes can be unsafe and breaking. For example, changes to row tracking metadata are not backward compatible. To remain on the safe side, we now classify all metadata changes as breaking changes. This updates the `is_breaking_change_type` function to include `FieldChangeType::MetadataChanged` in the set of breaking change types, and updates the test expectations accordingly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
When a diff is empty, it should never have breaking changes. This adds an assertion to verify this invariant in the test_identical_schemas test. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Enhanced test robustness by adding assertions for all three field categories (added, removed, updated) instead of just checking one. This ensures tests catch unexpected changes in any category. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…ported Removed redundant assert_eq!(diff.added_fields.len(), 1) that was checking the same condition twice in the test. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Added assert!(diff.has_breaking_changes()) after verifying the nested field path to ensure the breaking changes flag remains correctly set throughout the test. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Verify that rename operations (which are non-breaking) correctly report no breaking changes in the array struct element test. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Verify that adding a non-nullable struct correctly reports as a breaking change in the ancestor filtering test with mixed changes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Add comprehensive test cases for nested array nullability changes: - test_nested_array_nullability_loosened: outer array element nullability loosened - test_nested_array_nullability_tightened: outer array element nullability tightened - test_nested_array_inner_nullability_loosened: inner array element nullability loosened - test_nested_array_inner_nullability_tightened: inner array element nullability tightened These tests verify that nested array nullability changes are correctly detected and classified as breaking or non-breaking changes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Added three complex test cases to test_cursed_deeply_nested_complex_types: - Case 3: array<array<struct>> - doubly nested array with struct elements - Case 4: map<array<struct>, array<struct>> - map with arrays of structs - Case 5: map<struct, map<struct, struct>> - map with nested map and structs These tests verify field changes are correctly detected through extremely complex nested structures with proper path construction and breaking change detection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Updated schema diffing logic to classify field removals as safe operations rather than breaking changes. This aligns with the principle that dropping columns is safe - existing data files remain valid and queries referencing removed fields will fail at query time, but data integrity is maintained. Changes: - Modified compute_has_breaking_changes() to not flag field removals - Updated 5 test assertions that previously expected removals to be breaking - All 32 schema diff tests pass Rationale: Field removals should not break existing readers since they can safely ignore unknown fields in data files. Query-time failures for removed fields are acceptable as they reflect the intentional schema change. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Co-authored-by: OussamaSaoudi <[email protected]>
Co-authored-by: OussamaSaoudi <[email protected]>
Co-authored-by: OussamaSaoudi <[email protected]>
Performance optimizations: - Add ColumnName::parent() method for O(1) parent path access - Remove ColumnName::is_descendant_of() (no longer needed) - Optimize has_added_ancestor() from O(N*M) to O(M) using parent chain - Add early return in classify_data_type_change() for identical types Code quality improvements: - Add check_field_nullability_change() helper for better code reuse - Refactor physical name validation into validate_physical_name() - Simplify nested match statements with helper functions Test coverage: - Add Case 6: Field nullability tightening in deeply nested structures - Add Case 7: Container nullability tightening in deeply nested structures All 32 tests passing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
2a065a3 to
f6226f5
Compare
| /// The path to this field (e.g., ColumnName::new(["user", "address", "street"])) | ||
| pub path: ColumnName, | ||
| /// The type of change that occurred | ||
| pub change_type: FieldChangeType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we make this changes: Vec<FieldChangeType> and avoid the extra indirection through the FieldChangeType::Multiple variant?
What changes are proposed in this pull request?
Implements a schema diffing framework for Delta Kernel Rust to enable schema evolution. The diff algorithm uses field IDs (from column mapping) to track fields across schema versions, enabling detection of renames, additions, removals, and modifications at all nesting levels.
Java version here.
Summary of the allowed and blocked changes is TODO.
How was this change tested?
Unit tests.