
Spark: Add some tests for variant fixup #12497

Open · wants to merge 7 commits into base: main
Conversation

@XBaith (Contributor) commented Mar 11, 2025

Closes #12473.

The implementation of the variant type in the various visitors was largely completed in #11831. Since Spark 3 does not support the variant type, this PR only adds preliminary groundwork and unit tests.

@XBaith XBaith requested a review from sfc-gh-aixu March 14, 2025 02:46
@aihuaxu (Contributor) left a comment

Please rebase. Otherwise, LGTM.

@XBaith XBaith force-pushed the spark-variant-visitors branch from 0bc41e3 to 74c5118 on March 18, 2025 02:29
@github-actions github-actions bot removed the API label Mar 19, 2025
@XBaith XBaith requested a review from rdblue March 19, 2025 03:01
}

@Test
void fixupShouldCorrectlyFixVariant() {
Contributor:
I don't think this behavior is correct. SparkFixupTypes should only fix a type when there is an explicit fix for it, but it doesn't have one for variant, so all types are converted to variant. I would expect this to only allow binary, which is probably what we will use to pass this value into Spark 3.5.

Contributor Author:

Could you please explain this further? Do you mean that we need to restrict VariantType to only convert to and from BinaryType in SparkFixupTypes? Or are you just suggesting that the name of my test should not say "Correct"?

Contributor:

The cases that are fixed up are in SparkFixupTypes:

  @Override
  protected boolean fixupPrimitive(Type.PrimitiveType type, Type source) {
    switch (type.typeId()) {
      case STRING:
        if (source.typeId() == Type.TypeID.UUID) {
          return true;
        }
        break;
      case BINARY:
        if (source.typeId() == Type.TypeID.FIXED) {
          return true;
        }
        break;
      case TIMESTAMP:
        if (source.typeId() == Type.TypeID.TIMESTAMP) {
          return true;
        }
        break;
      default:
    }
    return false;
  }

Currently, the behavior is to always use the type from the schema and never override it with the reference schema's type. The test here validates that the type is not fixed, but what we care about are the cases where a type does get fixed.

I think the case where a type does get fixed is when we are exposing variant as binary. So I would expect an update to SparkFixupTypes and the base FixupTypes to override binary with variant when variant is in the reference schema. Otherwise, the variant type is never "fixed" and we don't really need a new test suite.
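To make the suggested change concrete, here is a minimal, self-contained sketch of the kind of case the reviewer is describing. It mirrors the switch pattern from the quoted fixupPrimitive method, but uses a simplified local TypeID enum rather than Iceberg's real Type classes; the VARIANT branch under BINARY is an assumed addition illustrating the proposal, not the actual Iceberg implementation.

```java
// Hypothetical sketch of a variant-aware fixup rule. TypeID and the
// method shape mimic the SparkFixupTypes snippet above; the VARIANT
// handling is an assumption based on the review discussion.
public class VariantFixupSketch {
  enum TypeID { STRING, BINARY, TIMESTAMP, UUID, FIXED, VARIANT }

  // Returns true when the reference schema's type (source) should
  // replace the converted type, following the quoted switch pattern.
  static boolean fixupPrimitive(TypeID type, TypeID source) {
    switch (type) {
      case STRING:
        if (source == TypeID.UUID) {
          return true;
        }
        break;
      case BINARY:
        // FIXED mirrors the existing rule; VARIANT is the assumed
        // addition: a binary column is fixed up to variant when the
        // reference schema says variant.
        if (source == TypeID.FIXED || source == TypeID.VARIANT) {
          return true;
        }
        break;
      case TIMESTAMP:
        if (source == TypeID.TIMESTAMP) {
          return true;
        }
        break;
      default:
        break;
    }
    return false;
  }

  public static void main(String[] args) {
    // Binary exposed for a variant reference type gets fixed up.
    System.out.println(fixupPrimitive(TypeID.BINARY, TypeID.VARIANT));  // true
    // Unrelated types are left alone, matching the existing behavior.
    System.out.println(fixupPrimitive(TypeID.STRING, TypeID.VARIANT));  // false
  }
}
```

With a rule like this in place, the test suite would have a real fixup case to assert on (binary fixed up to variant), rather than only verifying that nothing changes.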


Successfully merging this pull request may close these issues.

Flink/Spark: add visitor support for variant
4 participants