@uros7251brick

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

If a table has an integer column named file_name and we run a query like:

select *
from file_metadata_test
where _metadata.file_name = 'part-00000-fab85aae-5302-49c9-8e4d-747da36e5ae9.c000.zstd.parquet'

we get an error:

[[CAST_INVALID_INPUT](https://docs.databricks.com/error-messages/error-classes.html#cast_invalid_input)] The value 'part-00000-fab85aae-5302-49c9-8e4d-747da36e5ae9.c000.zstd.parquet' of the type "STRING" cannot be cast to "BIGINT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. SQLSTATE: 22018

This happens because FileSourceStrategy#rebindFileSourceMetadataAttributesInFilters flattens the _metadata struct, leaving just file_name. DataSkippingReader then resolves that name to the actual file_name column, tries to use the column's min-max stats to prune files, and fails when it casts the right-hand side of the equality predicate.

What changes were proposed in this pull request?

Fix the bug by not using dataFilters involving _metadata fields for data skipping.
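
As a rough illustration of the idea, here is a self-contained toy sketch of splitting filters so that anything touching a _metadata field is kept away from data skipping. Every type and name in it (Expr, DataColumn, MetadataField, SkippingSketch, splitForSkipping) is hypothetical and invented for this example; it does not use Spark or Delta classes and is not the code in this PR.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark or Delta classes.
sealed trait Expr
// A reference to a regular data column, e.g. the user's integer column `file_name`.
final case class DataColumn(name: String) extends Expr
// A reference to a field of the `_metadata` struct, e.g. `_metadata.file_name`.
final case class MetadataField(name: String) extends Expr
final case class StringLiteral(value: String) extends Expr
final case class LongLiteral(value: Long) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

object SkippingSketch {
  // True if any node in the expression tree refers to a `_metadata` field.
  def referencesMetadata(e: Expr): Boolean = e match {
    case MetadataField(_) => true
    case EqualTo(l, r)    => referencesMetadata(l) || referencesMetadata(r)
    case _                => false
  }

  // Split filters into (ineligible, eligible) for data skipping.
  // Filters on `_metadata` fields are ineligible because column min-max stats
  // describe the data column of the same name, not the metadata field.
  def splitForSkipping(filters: Seq[Expr]): (Seq[Expr], Seq[Expr]) =
    filters.partition(referencesMetadata)
}
```

In this sketch, a predicate on the metadata field file_name lands in the ineligible group, while a predicate on the data column file_name stays eligible, so the two can no longer be confused once the struct is flattened.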

How was this patch tested?

I added a new test case to DataSkippingDeltaTests.scala.

Does this PR introduce any user-facing changes?

No.

@uros7251brick uros7251brick changed the title Skip _metadata struct fields on data skipping Skip _metadata struct fields on data skipping Oct 15, 2025
Contributor

@chirag-s-db chirag-s-db left a comment

Nice fix. Please also get a review from @tomvanbussel to make sure there aren't any potential DML implications.

// 3. involve file metadata struct fields
val (ineligibleFilters, eligibleFilters) = filters.partition {
case f => containsSubquery(f) || !f.deterministic || f.exists {
case MetadataAttributeWithLogicalName(_, _) => true
Collaborator

Nit: Why not just MetadataAttribute(_) if you're not intending to do anything with the name?

Author

you're right, i'll change it.

@uros7251brick uros7251brick force-pushed the metadata-field-data-skipping-bug branch from 07762ef to 6c7b027 Compare October 18, 2025 14:57
3 participants