-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Spark: Add some tests for variant fixup #12497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
api/src/main/java/org/apache/iceberg/variants/VariantMetadata.java
Outdated
Show resolved
Hide resolved
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestSparkFixupTypes.java
Outdated
Show resolved
Hide resolved
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestSparkFixupTypes.java
Outdated
Show resolved
Hide resolved
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestSparkFixupTypes.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase. Otherwise, LGTM.
0bc41e3
to
74c5118
Compare
api/src/test/java/org/apache/iceberg/types/TestReadabilityChecks.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkFixupTypes.java
Outdated
Show resolved
Hide resolved
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/data/RandomData.java
Outdated
Show resolved
Hide resolved
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/PruneColumnsWithoutReordering.java
Show resolved
Hide resolved
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkFixupTypes.java
Outdated
Show resolved
Hide resolved
…thod and enhance variant fixup tests
} | ||
|
||
@Test | ||
void fixupShouldCorrectlyFixVariant() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this behavior is correct. SparkFixupTypes
should only fix the type when there is an explicit fix. But SparkFixupTypes
doesn't have one so all types are converted to variant. I would expect this to only allow binary, which is probably what we will use to pass this value into Spark 3.5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please further explain this? Do you mean that we need to restrict VariantType
to only convert with BinaryType
in SparkFixupTypes
? Or are you just suggesting that the name of my test should not be "Correct"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cases that are fixed up are in SparkFixupTypes
:
@Override
protected boolean fixupPrimitive(Type.PrimitiveType type, Type source) {
switch (type.typeId()) {
case STRING:
if (source.typeId() == Type.TypeID.UUID) {
return true;
}
break;
case BINARY:
if (source.typeId() == Type.TypeID.FIXED) {
return true;
}
break;
case TIMESTAMP:
if (source.typeId() == Type.TypeID.TIMESTAMP) {
return true;
}
break;
default:
}
return false;
}
Currently, the behavior is to always use the type from schema and never override with the reference schema's type. The test here validates that the type is not fixed, but what we care about are cases where a type does get fixed.
I think the case where a type does get fixed is when we are exposing variant as binary. So I would expect an update to SparkFixupTypes
and the base FixupTypes
to override binary with variant when variant is in the reference schema. Otherwise, the variant type is never "fixed" and we don't really need a new test suite.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
close #12473
The implementation of variant in various visitors has been basically completed in #11831. Since Spark3 does not support the variant type, this PR only adds preliminary work and unit tests.