
Error reading a column of type Decimal if the data is in a Parquet file using binary type and dictionary encoding #28070

Description

@psantos-denodo

In Parquet, the logical type DECIMAL can be used to annotate the following types:

int32: for 1 <= precision <= 9
int64: for 1 <= precision <= 18; precision < 10 will produce a warning
fixed_len_byte_array: precision is limited by the array size. Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits
byte_array: precision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.
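
As a rough illustration of the fixed_len_byte_array rule above, the following sketch evaluates floor(log_10(2^(8*n - 1) - 1)) for a given byte length. The class and method names (DecimalPrecision, maxPrecision, minBytesForPrecision) are made up for this example and are not part of Parquet or Presto.

```java
import java.math.BigInteger;

final class DecimalPrecision {
    // Largest number of base-10 digits that always fits in a signed,
    // two's-complement value of numBytes bytes:
    // floor(log10(2^(8 * numBytes - 1) - 1)).
    static int maxPrecision(int numBytes) {
        if (numBytes < 1) {
            throw new IllegalArgumentException("numBytes must be >= 1");
        }
        BigInteger maxValue = BigInteger.ONE
                .shiftLeft(8 * numBytes - 1)
                .subtract(BigInteger.ONE);
        // For x >= 1, floor(log10(x)) is the number of decimal digits minus one.
        return maxValue.toString().length() - 1;
    }

    // Smallest byte length whose maxPrecision covers the requested precision.
    static int minBytesForPrecision(int precision) {
        int numBytes = 1;
        while (maxPrecision(numBytes) < precision) {
            numBytes++;
        }
        return numBytes;
    }

    public static void main(String[] args) {
        System.out.println(maxPrecision(4));          // 9, the int32 limit
        System.out.println(maxPrecision(8));          // 18, the int64 limit
        System.out.println(minBytesForPrecision(25)); // 11
    }
}
```

Under this rule, a DECIMAL(25,0) column like the one in this report does not fit in int32 or int64, which is one reason a writer may choose a byte-array representation for it.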

This is handled correctly in Presto's Decoders class when the data uses PLAIN encoding.
However, it is not handled when the encoding is dictionary-based (encoding == RLE_DICTIONARY || encoding == PLAIN_DICTIONARY).

Therefore, in this scenario, reading the data causes an error:
SQL Error [65536]: Query failed (#20260623_150757_00489_8szn4): class com.facebook.presto.parquet.batchreader.decoders.rle.BinaryRLEDictionaryValuesDecoder cannot be cast to class com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$LongDecimalValuesDecoder (com.facebook.presto.parquet.batchreader.decoders.rle.BinaryRLEDictionaryValuesDecoder and com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$LongDecimalValuesDecoder are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @4d682552)
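
The failure mode can be pictured with a minimal, self-contained sketch. All type names below are hypothetical stand-ins, not Presto's actual classes: a factory that ignores the decimal annotation returns a plain binary decoder, and the reader's cast to a decimal decoder interface then fails.

```java
final class DecoderCastSketch {
    // Hypothetical stand-ins for the decoder hierarchy named in the error message.
    interface ValuesDecoder {}
    interface LongDecimalValuesDecoder extends ValuesDecoder {}

    static final class BinaryDictionaryDecoder implements ValuesDecoder {}
    static final class LongDecimalDictionaryDecoder implements LongDecimalValuesDecoder {}

    // decimalAware = false models the behavior reported here: the BINARY
    // dictionary branch never checks whether the column is a decimal.
    static ValuesDecoder createBinaryDictionaryDecoder(boolean isDecimalColumn, boolean decimalAware) {
        if (decimalAware && isDecimalColumn) {
            return new LongDecimalDictionaryDecoder();
        }
        return new BinaryDictionaryDecoder();
    }

    // Models a column reader that expects a decimal decoder for a DECIMAL column.
    static boolean castSucceeds(ValuesDecoder decoder) {
        try {
            LongDecimalValuesDecoder decimalDecoder = (LongDecimalValuesDecoder) decoder;
            return decimalDecoder != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(castSucceeds(createBinaryDictionaryDecoder(true, false))); // false
        System.out.println(castSucceeds(createBinaryDictionaryDecoder(true, true)));  // true
    }
}
```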

Your Environment

  • Presto version used: 297
  • Storage (HDFS/S3/GCS..): S3
  • Data source and connector used: hive
  • Deployment (Cloud or On-prem): Cloud

Expected Behavior

The query should succeed and return the decimal values.

Current Behavior

The query fails with a ClassCastException

Possible Solution

Extend the encoding == RLE_DICTIONARY || encoding == PLAIN_DICTIONARY case in Decoders so that the BINARY branch takes the decimal type into account:

case BINARY: {
    if (isDecimalType(columnDescriptor)) {
        if (isShortDecimalType(columnDescriptor)) {
            return new ShortDecimalRLEDictionaryValuesDecoder(bitWidth, inputStream, (BinaryBatchDictionary) dictionary);
        }
        return new LongDecimalRLEDictionaryValuesDecoder(bitWidth, inputStream, (BinaryBatchDictionary) dictionary);
    }
    return new BinaryRLEDictionaryValuesDecoder(bitWidth, inputStream, (BinaryBatchDictionary) dictionary);
}
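
For context on what such a decimal-aware dictionary decoder has to do, here is a minimal sketch that is independent of Presto's classes. It assumes the Parquet convention that a binary DECIMAL stores the unscaled value as big-endian two's-complement bytes; the class and method names are invented for illustration, and the dictionary is modeled as a plain byte[][] rather than a BinaryBatchDictionary.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class BinaryDecimalDictionarySketch {
    // Interprets the bytes as a big-endian two's-complement unscaled value
    // and applies the column's scale.
    static BigDecimal decode(byte[] unscaledBytes, int scale) {
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }

    // Decodes each dictionary entry once, then resolves the dictionary ids
    // (the values that RLE/bit-packing would yield) against the decoded entries.
    static BigDecimal[] decodePage(byte[][] dictionary, int[] ids, int scale) {
        BigDecimal[] decodedDictionary = new BigDecimal[dictionary.length];
        for (int i = 0; i < dictionary.length; i++) {
            decodedDictionary[i] = decode(dictionary[i], scale);
        }
        BigDecimal[] values = new BigDecimal[ids.length];
        for (int i = 0; i < ids.length; i++) {
            values[i] = decodedDictionary[ids[i]];
        }
        return values;
    }

    public static void main(String[] args) {
        byte[][] dictionary = {
            {0x01, 0x00},   // unscaled value 256
            {(byte) 0xFF},  // unscaled value -1
        };
        int[] ids = {0, 1, 0};
        for (BigDecimal value : decodePage(dictionary, ids, 0)) {
            System.out.println(value);
        }
    }
}
```

Note that dictionary entries may have different lengths, since byte_array decimals use the minimum number of bytes per value.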

Steps to Reproduce

  • A Parquet file containing the following column, with physical type BINARY, logical type DECIMAL, and dictionary encoding enabled:

id_number: OPTIONAL BINARY L:DECIMAL(25,0) R:0 D:1

  • Table in Presto:

CREATE TABLE s3.test_schema.testbinary18309386 (
    "id_number" DECIMAL(25,0)
)
WITH (
    external_location = 's3a://acme/testbinary',
    format = 'PARQUET'
)

  • Query:

SELECT * FROM s3.test_schema.testbinary18309386

Error:
SQL Error [65536]: Query failed (#20260623_150757_00489_8szn4): class com.facebook.presto.parquet.batchreader.decoders.rle.BinaryRLEDictionaryValuesDecoder cannot be cast to class com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$LongDecimalValuesDecoder (com.facebook.presto.parquet.batchreader.decoders.rle.BinaryRLEDictionaryValuesDecoder and com.facebook.presto.parquet.batchreader.decoders.ValuesDecoder$LongDecimalValuesDecoder are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @4d682552)

Screenshots (if appropriate)

Context

This kind of data is generated by an Informatica Data Engineering (DEI) dynamic mapping, so such data cannot currently be read from Presto.
