In Parquet, the logical type DECIMAL can be used to annotate the following physical types:
- int32: for 1 <= precision <= 9
- int64: for 1 <= precision <= 18; precision < 10 will produce a warning
- fixed_len_byte_array: precision is limited by the array size. Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits
- byte_array: precision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.
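The fixed_len_byte_array bound above can be checked with exact integer arithmetic. The Java sketch below (class and method names are illustrative only, not part of Presto or parquet-mr) computes the maximum precision of an n-byte signed value and the minimum byte length for a given precision. It reproduces the int32 (9) and int64 (18) limits and shows that a DECIMAL(25,0) value needs at least 11 bytes.

```java
import java.math.BigInteger;

// Illustrative helper class (hypothetical name, not a Presto or parquet-mr API).
public class DecimalPrecision {
    // Maximum base-10 digits a signed two's-complement value of numBytes bytes can hold:
    // floor(log10(2^(8 * numBytes - 1) - 1)), computed exactly with BigInteger.
    static int maxPrecision(int numBytes) {
        BigInteger maxValue = BigInteger.ONE.shiftLeft(8 * numBytes - 1).subtract(BigInteger.ONE);
        // For x >= 1, floor(log10(x)) equals (number of decimal digits of x) - 1.
        return maxValue.toString().length() - 1;
    }

    // Smallest fixed_len_byte_array length that can store the given precision.
    static int minBytes(int precision) {
        int numBytes = 1;
        while (maxPrecision(numBytes) < precision) {
            numBytes++;
        }
        return numBytes;
    }

    public static void main(String[] args) {
        System.out.println(maxPrecision(4));  // 9, the int32 limit
        System.out.println(maxPrecision(8));  // 18, the int64 limit
        System.out.println(minBytes(25));     // 11, e.g. for DECIMAL(25,0)
    }
}
```

Using BigInteger instead of Math.log10 avoids floating-point rounding at the boundaries.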
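This is handled correctly in Presto's Decoders class when the data uses PLAIN encoding.
However, decimals are not taken into account when the encoding is RLE_DICTIONARY or PLAIN_DICTIONARY (encoding == RLE_DICTIONARY || encoding == PLAIN_DICTIONARY): for a BINARY column, a BinaryRLEDictionaryValuesDecoder is returned even when the column is annotated as DECIMAL.
As a result, reading such data fails with the following error: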
SQL Error [65536]: Query failed (#20260623_150757_00489_8szn4): class com.facebook.presto.parquet.batchreader.decoders.rle.BinaryRLEDictionaryValuesDecoder cannot be cast to class com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$LongDecimalValuesDecoder (com.facebook.presto.parquet.batchreader.decoders.rle.BinaryRLEDictionaryValuesDecoder and com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$LongDecimalValuesDecoder are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @4d682552)
Your Environment
- Presto version used: 297
- Storage (HDFS/S3/GCS..): S3
- Data source and connector used: hive
- Deployment (Cloud or On-prem): Cloud
Expected Behavior
The query should succeed and return the decimal values stored in the file.
Current Behavior
The query fails with a ClassCastException
Possible Solution
Handle the decimal type in the encoding == RLE_DICTIONARY || encoding == PLAIN_DICTIONARY branch of Decoders, for example:
case BINARY: {
    // BINARY columns annotated as DECIMAL need a decimal-aware dictionary decoder
    if (isDecimalType(columnDescriptor)) {
        if (isShortDecimalType(columnDescriptor)) {
            return new ShortDecimalRLEDictionaryValuesDecoder(bitWidth, inputStream, (BinaryBatchDictionary) dictionary);
        }
        return new LongDecimalRLEDictionaryValuesDecoder(bitWidth, inputStream, (BinaryBatchDictionary) dictionary);
    }
    // Plain (non-decimal) BINARY columns keep the existing decoder
    return new BinaryRLEDictionaryValuesDecoder(bitWidth, inputStream, (BinaryBatchDictionary) dictionary);
}
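A decimal-aware dictionary decoder has to turn each BINARY dictionary entry into a decimal value. Per the Parquet format, that entry is the unscaled value encoded as a big-endian two's-complement integer. The Java sketch below illustrates only that conversion. It uses hypothetical names and java.math types rather than Presto's internal representations. It decodes every dictionary entry once, then resolves the ids (assumed to be already RLE-decoded) against it. It also shows the sign extension needed when a short decimal (precision <= 18) is held in a long.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical illustration; not Presto's decoder API.
public class BinaryDecimalDecoding {
    // Parquet stores the unscaled value as a big-endian two's-complement integer.
    static BigDecimal toDecimal(byte[] unscaledBytes, int scale) {
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }

    // Short decimals (precision <= 18) fit in a long: sign-extend the first byte,
    // then append the remaining bytes as unsigned values.
    static long toShortUnscaled(byte[] bytes) {
        if (bytes.length == 0 || bytes.length > 8) {
            throw new IllegalArgumentException("expected 1..8 bytes, got " + bytes.length);
        }
        long value = bytes[0]; // widening conversion keeps the sign
        for (int i = 1; i < bytes.length; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return value;
    }

    // Decode each dictionary entry once, then map dictionary ids to values.
    static BigDecimal[] decodeDictionary(byte[][] dictionary, int[] ids, int scale) {
        BigDecimal[] decodedEntries = new BigDecimal[dictionary.length];
        for (int i = 0; i < dictionary.length; i++) {
            decodedEntries[i] = toDecimal(dictionary[i], scale);
        }
        BigDecimal[] values = new BigDecimal[ids.length];
        for (int i = 0; i < ids.length; i++) {
            values[i] = decodedEntries[ids[i]];
        }
        return values;
    }

    public static void main(String[] args) {
        byte[][] dictionary = {
            new BigInteger("1234567890123456789012345").toByteArray(), // 25 digits
            BigInteger.valueOf(-7).toByteArray(),
        };
        int[] ids = {1, 0, 1};
        // prints [-7, 1234567890123456789012345, -7]
        System.out.println(Arrays.toString(decodeDictionary(dictionary, ids, 0)));
        System.out.println(toShortUnscaled(new byte[] {(byte) 0xFF, 0x00})); // -256
    }
}
```

Converting the dictionary once and then indexing into it keeps the per-row cost to a single array lookup, which is the usual reason for dictionary encoding in the first place.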
Steps to Reproduce
- A Parquet file containing the following column, with physical type BINARY, logical type DECIMAL, and dictionary encoding enabled:
id_number: OPTIONAL BINARY L:DECIMAL(25,0) R:0 D:1
- Create an external table over the file's location:
CREATE TABLE s3.test_schema.testbinary18309386 (
    "id_number" DECIMAL(25,0)
)
WITH (
    external_location = 's3a://acme/testbinary',
    format = 'PARQUET'
)
- Query the table:
SELECT * FROM s3.test_schema.testbinary18309386
Error:
SQL Error [65536]: Query failed (#20260623_150757_00489_8szn4): class com.facebook.presto.parquet.batchreader.decoders.rle.BinaryRLEDictionaryValuesDecoder cannot be cast to class com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$LongDecimalValuesDecoder (com.facebook.presto.parquet.batchreader.decoders.rle.BinaryRLEDictionaryValuesDecoder and com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$LongDecimalValuesDecoder are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @4d682552)
Context
Data of this kind is generated by an Informatica Data Engineering (DEI) dynamic mapping, so it currently cannot be read from Presto.