Add Parquet decryption support for Hive tables #24517
Conversation
This is loading classes dynamically based on class names in config. We should use the standard Trino pattern of having explicitly enumerated providers, each with their own strongly typed config classes.
Yes. For now I have only rebased and resolved conflicts.
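To make the reviewer's suggestion concrete, here is a minimal sketch of explicitly enumerated providers with one typed config per provider, in place of loading a class by name. Every name in it (`KeyProviderType`, `KeyProvider`, `EnvironmentKeyConfig`, `StaticKeyConfig`, `KeyProviders`, and the property names) is hypothetical and not taken from this PR, and it uses plain Java rather than Trino's actual configuration machinery.

```java
import java.util.Base64;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; none of these types come from the pull request.
// The set of providers is a closed enum instead of an arbitrary class name in config.
enum KeyProviderType { ENVIRONMENT, STATIC }

interface KeyProvider {
    Optional<byte[]> footerKey();
}

// One strongly typed config per provider.
record EnvironmentKeyConfig(String variableName) {}

record StaticKeyConfig(String base64Key) {}

final class EnvironmentKeyProvider implements KeyProvider {
    private final EnvironmentKeyConfig config;
    private final Map<String, String> environment;

    EnvironmentKeyProvider(EnvironmentKeyConfig config, Map<String, String> environment) {
        this.config = config;
        this.environment = environment;
    }

    @Override
    public Optional<byte[]> footerKey() {
        // Empty when the variable is not set; otherwise the Base64-decoded key.
        return Optional.ofNullable(environment.get(config.variableName()))
                .map(value -> Base64.getDecoder().decode(value));
    }
}

final class StaticKeyProvider implements KeyProvider {
    private final byte[] key;

    StaticKeyProvider(StaticKeyConfig config) {
        this.key = Base64.getDecoder().decode(config.base64Key());
    }

    @Override
    public Optional<byte[]> footerKey() {
        return Optional.of(key.clone());
    }
}

final class KeyProviders {
    private KeyProviders() {}

    // The switch is exhaustive over the enum, so adding a provider forces a code change here.
    static KeyProvider create(KeyProviderType type, Map<String, String> properties, Map<String, String> environment) {
        return switch (type) {
            case ENVIRONMENT -> new EnvironmentKeyProvider(
                    new EnvironmentKeyConfig(required(properties, "environment.variable")), environment);
            case STATIC -> new StaticKeyProvider(
                    new StaticKeyConfig(required(properties, "static.key")));
        };
    }

    private static String required(Map<String, String> properties, String name) {
        String value = properties.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + name);
        }
        return value;
    }
}
```

In this sketch a caller would pass `System.getenv()` as the `environment` map. An unknown provider name fails when the enum value is parsed, and no class is loaded reflectively.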
c504528 to 37d57b7
This pull request has gone a while without any activity. Tagging for triage help: @mosabua
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
990f6f2 to 07025cc
EnvironmentDecryptionKeyRetriever is added as a key provider initially.
ptal @ggershinsky
ptal @dfangs @shangxinli
Pull Request Overview
This pull request adds support for reading Hive tables with encrypted Parquet files by introducing new cryptography utilities and updating the data page structure to include a page index. Key changes include:
- New crypto classes and methods for AES GCM/CTR encryption and decryption (e.g. AesGcmEncryptor/Decryptor, AesCtrEncryptor/Decryptor).
- Introduction of a FileDecryptionContext to manage per-column decryption state and key retrieval.
- Updates to DataPage, DataPageV1, and DataPageV2 to include a new pageIndex field for tracking page order.
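For readers unfamiliar with the GCM side of these utilities, the following is a rough sketch of decrypting one GCM-encrypted module with the JDK's `javax.crypto` API. It assumes the module layout described by the Parquet modular encryption specification as I understand it: a 4-byte little-endian length, a 12-byte nonce, the ciphertext, and a 16-byte tag. The class `GcmModuleSketch` is hypothetical and is not the PR's `AesGcmDecryptor`. The `encrypt` helper exists only so the round trip can be exercised.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; not the implementation from this pull request.
final class GcmModuleSketch {
    static final int LENGTH_BYTES = 4;
    static final int NONCE_BYTES = 12;
    static final int TAG_BYTES = 16;

    private GcmModuleSketch() {}

    // Produces: length (4 bytes, little endian) | nonce (12 bytes) | ciphertext | tag (16 bytes).
    static byte[] encrypt(byte[] key, byte[] plaintext, byte[] aad) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            new SecureRandom().nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TAG_BYTES * 8, nonce));
            cipher.updateAAD(aad);
            byte[] ciphertextAndTag = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(LENGTH_BYTES + NONCE_BYTES + ciphertextAndTag.length)
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .putInt(NONCE_BYTES + ciphertextAndTag.length)
                    .put(nonce)
                    .put(ciphertextAndTag)
                    .array();
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Verifies the tag (and the AAD) and returns the plaintext; throws if either does not match.
    static byte[] decrypt(byte[] key, byte[] module, byte[] aad) {
        ByteBuffer buffer = ByteBuffer.wrap(module).order(ByteOrder.LITTLE_ENDIAN);
        int length = buffer.getInt();
        if (length != buffer.remaining() || length < NONCE_BYTES + TAG_BYTES) {
            throw new IllegalArgumentException("Unexpected module length: " + length);
        }
        byte[] nonce = new byte[NONCE_BYTES];
        buffer.get(nonce);
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TAG_BYTES * 8, nonce));
            cipher.updateAAD(aad);
            return cipher.doFinal(module, buffer.position(), buffer.remaining());
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

Because GCM is authenticated, flipping any byte of the ciphertext, tag, or AAD makes `decrypt` fail instead of returning corrupted data.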
Reviewed Changes
Copilot reviewed 72 out of 73 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| lib/trino-parquet/src/main/java/io/trino/parquet/crypto/* | Implementation of new encryption/decryption utilities and context management for Parquet file decryption |
| lib/trino-parquet/src/main/java/io/trino/parquet/* | Updated page classes now include pageIndex to support the new decryption features |
Files not reviewed (1)
- lib/trino-parquet/pom.xml: Language not supported
Comments suppressed due to low confidence (1)
lib/trino-parquet/src/main/java/io/trino/parquet/crypto/FileDecryptionContext.java:118
- Ensure comprehensive unit tests cover both scenarios—when a column is decrypted with the footer key and when a column-specific key is used—including edge cases where keys are missing or invalid.
public Optional<ColumnDecryptionContext> initializeColumnCryptoMetadata(ColumnPath path, boolean encryptedWithFooterKey, Optional<byte[]> columnKeyMetadata, int columnOrdinal)
// AES_GCM_CTR_V1
if (columnKey.isEmpty()) {
    // Decryptor with footer key
    if (aesCtrDecryptorWithFooterKey == null) {
Cached decryptor instances (aesCtrDecryptorWithFooterKey and similarly aesGcmDecryptorWithFooterKey) are stored and reused. If these decryptors are used concurrently, they may exhibit thread-safety issues; consider ensuring thread confinement or creating a new instance per decryption operation.
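One way to address this concern is to share only the immutable key and create a fresh `Cipher` for every decryption, since `javax.crypto.Cipher` instances are stateful and should not be used by several threads at once. The sketch below illustrates that option; `PerCallGcmDecryptor` is a hypothetical class, not code from this PR, and whether the per-call allocation cost is acceptable here is left open.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; not the decryptor from this pull request.
final class PerCallGcmDecryptor {
    private static final int TAG_BITS = 128;

    // SecretKeySpec is immutable, so this field is safe to share across threads.
    private final SecretKeySpec key;

    PerCallGcmDecryptor(byte[] keyBytes) {
        this.key = new SecretKeySpec(keyBytes, "AES");
    }

    // Creates a new Cipher per call, so no mutable cipher state is shared between callers.
    byte[] decrypt(byte[] nonce, byte[] ciphertextAndTag) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            return cipher.doFinal(ciphertextAndTag);
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    // Helper used only to produce input for the sketch.
    static byte[] encrypt(byte[] keyBytes, byte[] nonce, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"), new GCMParameterSpec(TAG_BITS, nonce));
            return cipher.doFinal(plaintext);
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }
}
```

A single `PerCallGcmDecryptor` can then be cached and called from many threads. A `ThreadLocal<Cipher>` would be an alternative that confines each cipher to one thread while avoiding repeated `Cipher.getInstance` calls.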
Description
Adds support to read Hive tables with encrypted Parquet files.
Note: this PR is a work in progress; tests are still being added.
Additional context and related issues
Parquet added support for encryption: https://parquet.apache.org/docs/file-format/data-pages/encryption/. Spark also added support for reading and writing tables with encrypted Parquet files. This PR adds support in Trino for reading Hive tables with encrypted Parquet files.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: