Bugfix: s3_file_to_parquet function doesn't override content encoding if provided with the override kwarg #505

Merged

pfaraone
Collaborator

@pfaraone pfaraone commented Mar 12, 2025

Summary

With this change, s3_file_to_parquet no longer raises ContentTypeValidationError when the content encoding is overridden via the override_content_encoding_for_parquet kwarg.

Rationale

Although the "Allow overriding content encoding" bugfix was merged into DeltaCAT and released in 1.1.30, the override is not applied during the compaction session. As a result, ContentTypeValidationError is raised when the underlying file is gzip encoded, even if override_content_encoding_for_parquet is provided.

Changes

  • Added a check for override_content_encoding_for_parquet in s3_file_to_parquet
  • Added unit test coverage for s3_file_to_parquet (previously uncovered), including the default case
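
The control flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual DeltaCAT code: the helper name `resolve_effective_encoding`, the constant values, and the local `ContentTypeValidationError` class are stand-ins assumed for the example. Only the kwarg name `override_content_encoding_for_parquet` and the error name come from this PR.

```python
from typing import Any, Dict

# Assumed values for illustration; the real constants live in DeltaCAT.
PARQUET = "application/parquet"
IDENTITY = "identity"
OVERRIDE_KWARG = "override_content_encoding_for_parquet"


class ContentTypeValidationError(Exception):
    """Local stand-in for the error named in this PR."""


def resolve_effective_encoding(
    content_type: str,
    content_encoding: str,
    reader_kwargs: Dict[str, Any],
) -> str:
    """Hypothetical helper: return the encoding to validate against.

    Raises ContentTypeValidationError unless the file is Parquet and its
    (possibly overridden) encoding is identity.
    """
    # Pop the override so it is not forwarded to the underlying reader.
    override = reader_kwargs.pop(OVERRIDE_KWARG, None)

    # The override only applies to Parquet content.
    if override is not None and content_type == PARQUET:
        content_encoding = override

    if content_type != PARQUET or content_encoding != IDENTITY:
        raise ContentTypeValidationError(
            f"Cannot read as Parquet: content_type={content_type!r}, "
            f"content_encoding={content_encoding!r}"
        )
    return content_encoding
```

In this sketch, a gzip-labelled Parquet file passes validation only when the caller supplies the override. Without the kwarg, the default path is unchanged and still raises.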

Impact

  • Small blast radius: this should allow additional tables to be onboarded if they are gzip encoded but overridden

Testing

  • Added unit test coverage

Regression Risk

  • Low, as the control flow only changes if the override_content_encoding_for_parquet kwarg is provided

Checklist

  • Unit tests covering the changes have been added

    • If this is a bugfix, regression tests have been added
  • E2E testing has been performed

Additional Notes

Any additional information or context relevant to this PR.

@pfaraone pfaraone added the bug Something isn't working label Mar 12, 2025
@pfaraone pfaraone changed the base branch from 2.0 to main March 12, 2025 02:27
Collaborator

@raghumdani raghumdani left a comment

LGTM

@pfaraone pfaraone merged commit 11bbbed into main Mar 12, 2025
4 of 18 checks passed
@pfaraone pfaraone deleted the dev/s3_file_to_parquet-does-not-override-content-encoding branch March 12, 2025 17:44