Description
We're using parquet-cli (brew install parquet-cli) to read files that were created with this lib, but we're running into either errors or empty values for fields with repeated: true and/or type: 'LIST'. Reading the same files with ParquetReader.openFile from this lib works fine, though.
Steps to reproduce
Example 1 - repeated: true
Using the following schema and code, based on this README example
const schema = new ParquetSchema({
  id: { type: 'UTF8' },
  stock: {
    repeated: true,
    fields: {
      price: { type: 'DOUBLE' },
      quantity: { type: 'INT64' },
    },
  },
});
const writer = await ParquetWriter.openFile(
  schema,
  'repeated-example.parquet'
);
await writer.appendRow({
  id: 'Row1',
  stock: [
    { price: 100, quantity: 10 },
    { price: 200, quantity: 20 },
  ],
});
// Close the writer so the file is finalized before reading it
await writer.close();
Example 2 - type: 'LIST'
Using the following schema and code, based on the tests for array list
const schema = new ParquetSchema({
  id: { type: 'UTF8' },
  test: {
    type: 'LIST',
    fields: {
      list: {
        repeated: true,
        fields: {
          element: {
            type: 'UTF8',
          },
        },
      },
    },
  },
});
const writer = await ParquetWriter.openFile(schema, 'list-example.parquet');
await writer.appendRow({
  id: 'Row1',
  test: { list: [{ element: 'abcdef' }, { element: 'fedcba' }] },
});
// Close the writer so the file is finalized before reading it
await writer.close();
- Generate files using the examples above.
- Read these files with parquet-cli using parquet cat <path-to-file>.
Expected behaviour
Example 1
The file can be read without errors.
Example 2
The result has { list: [ { element: 'abcdef' }, { element: 'fedcba' } ] } in the test field, as it does when reading the file with ParquetReader.openFile.
Actual behaviour
Example 1
An error is thrown; see Error logs below.
Example 2
Getting the result {"id": "Row1", "test": null}
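To spot which fields come back empty across many records, the output of parquet cat can be checked programmatically. This is a small sketch, assuming each record is printed as one JSON object per line, as in the result above. The helper name nullFields is hypothetical and not part of this lib or parquet-cli.

```javascript
// Hypothetical helper: for each record, list the top-level fields that are null.
// Assumes one JSON object per line, as in the output shown above.
function nullFields(catOutput) {
  return catOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const record = JSON.parse(line);
      return Object.keys(record).filter((key) => record[key] === null);
    });
}

// The actual result reported for Example 2
const output = '{"id": "Row1", "test": null}';
console.log(JSON.stringify(nullFields(output))); // [["test"]]
```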
Error logs
From Example 1
Unknown error
java.lang.RuntimeException: Failed on record 0 in <omitted>/output-basic.parquet
at org.apache.parquet.cli.commands.ScanCommand.run(ScanCommand.java:75)
at org.apache.parquet.cli.Main.run(Main.java:163)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.parquet.cli.Main.main(Main.java:191)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:<omitted>/output-basic.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:280)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:140)
at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:356)
at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:337)
at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:335)
at org.apache.parquet.cli.commands.ScanCommand.run(ScanCommand.java:70)
... 3 more
Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: required group stock (LIST) {
repeated group array {
required double price;
required int64 quantity;
}
} != repeated group stock {
required double price;
required int64 quantity;
}
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:104)
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:81)
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:57)
at org.apache.parquet.schema.MessageType.accept(MessageType.java:52)
at org.apache.parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:167)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:155)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:245)
... 9 more
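The last Caused by above comes down to a structural difference between two schema shapes: the reader requested a LIST-annotated group that wraps a repeated group named array, while the file contains a bare repeated group. Below is a minimal sketch of that comparison, using plain JavaScript objects as a stand-in for Parquet schema nodes. The node, render, and sameShape helpers are hypothetical and only illustrate the mismatch; they are not part of this lib or parquet-cli.

```javascript
// Hypothetical plain-object model of a Parquet schema node (illustration only).
// A string as third argument means a primitive type; an array means a group.
const node = (repetition, name, typeOrChildren, annotation = null) => ({
  repetition,
  name,
  annotation,
  type: Array.isArray(typeOrChildren) ? 'group' : typeOrChildren,
  children: Array.isArray(typeOrChildren) ? typeOrChildren : [],
});

const leaves = () => [
  node('required', 'price', 'double'),
  node('required', 'quantity', 'int64'),
];

// Shape the reader requested, per the error message
const requested = node('required', 'stock', [node('repeated', 'array', leaves())], 'LIST');

// Shape found in the file, per the error message
const inFile = node('repeated', 'stock', leaves());

// Render a node in the same textual form the error message uses
function render(n, indent = '') {
  const head = `${indent}${n.repetition} ${n.type} ${n.name}`;
  if (n.type !== 'group') return `${head};`;
  const annotation = n.annotation ? ` (${n.annotation})` : '';
  const body = n.children.map((c) => render(c, indent + '  ')).join('\n');
  return `${head}${annotation} {\n${body}\n${indent}}`;
}

// Structural equality: repetition, type, annotation, and children must all match
function sameShape(a, b) {
  return (
    a.repetition === b.repetition &&
    a.type === b.type &&
    a.annotation === b.annotation &&
    a.children.length === b.children.length &&
    a.children.every((c, i) => c.name === b.children[i].name && sameShape(c, b.children[i]))
  );
}

console.log(render(requested));
console.log(render(inFile));
console.log(sameShape(requested, inFile)); // false
```

The two shapes differ at the top level already: one is a required group annotated with LIST, the other is a repeated group with no annotation, which matches the "incompatible types" message in the log.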