
Impossible to create a new parquet file? #179

@codegain

Hi, after hours of trying I decided to open an issue here.

I'm trying to use this library to convert data from JSON to Parquet. This is my minimal reproduction:

import {ParquetSchema, ParquetWriter} from "@dsnp/parquetjs";

const schema = new ParquetSchema({
    some: {type: 'UTF8'},
    test: {type: 'UTF8'},
});

async function run() {
    const writer = await ParquetWriter.openFile(schema, 'test.parquet');
    await writer.appendRow({
        some: 'data',
        test: 'this'
    });

    await writer.close();
}

run();

I saved the file as test.ts and ran it via npx tsx .\test.ts on my Windows 11 machine with Node v22.
It only gives me this output:

> npx tsx .\test.ts 

node_modules\thrift\lib\nodejs\lib\thrift\compact_protocol.js:553
    throw new Thrift.TProtocolException(Thrift.TProtocolExceptionType.INVALID_DATA, "Expected Int64 or Number, found: " + l);
          ^
TProtocolException: Expected Int64 or Number, found: 0
    at TCompactProtocol.i64ToZigzag (node_modules\thrift\lib\nodejs\lib\thrift\compact_protocol.js:553:11)
    at TCompactProtocol.writeI64 (node_modules\thrift\lib\nodejs\lib\thrift\compact_protocol.js:365:27)
    at Statistics.write (node_modules\@dsnp\parquetjs\dist\gen-nodejs\parquet_types.js:192:16)
    at DataPageHeaderV2.write (node_modules\@dsnp\parquetjs\dist\gen-nodejs\parquet_types.js:1730:25)
    at PageHeader.write (node_modules\@dsnp\parquetjs\dist\gen-nodejs\parquet_types.js:2239:34)
    at Object.serializeThrift (node_modules\@dsnp\parquetjs\dist\lib\util.js:85:9)
    at encodeDataPageV2 (node_modules\@dsnp\parquetjs\dist\lib\writer.js:520:40)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async encodePages (node_modules\@dsnp\parquetjs\dist\lib\writer.js:415:20)
    at async ParquetWriter.close (node_modules\@dsnp\parquetjs\dist\lib\writer.js:151:17) {
  type: 1
}

The file test.parquet is created, but it contains only the string PAR1 and nothing else.
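For context, the Parquet format places the 4-byte magic PAR1 at both the start and the end of a file. The serialized footer metadata and its 4-byte length sit just before the trailing magic. A file that holds only PAR1 therefore has a header but no footer, which matches the stack trace above: the error is thrown inside ParquetWriter.close. The minimal sketch below detects such truncated output by checking for both markers. The helper names inspectParquetMagic and inspectParquetFile are hypothetical and are not part of @dsnp/parquetjs.

```typescript
import { readFileSync } from "node:fs";

// Parquet files begin and end with the 4-byte ASCII magic "PAR1".
const MAGIC = [0x50, 0x41, 0x52, 0x31]; // 'P', 'A', 'R', '1'

type MagicStatus = "empty" | "invalid" | "header-only" | "complete";

// True if bytes[offset .. offset + 4) equals the Parquet magic.
function magicAt(bytes: Uint8Array, offset: number): boolean {
  return MAGIC.every((b, i) => bytes[offset + i] === b);
}

// Hypothetical helper: classifies a buffer by its Parquet magic markers.
// A complete file needs header magic + footer metadata + 4-byte footer length
// + trailing magic, so anything shorter than 12 bytes cannot be complete.
function inspectParquetMagic(bytes: Uint8Array): MagicStatus {
  if (bytes.length === 0) return "empty";
  if (bytes.length < 4 || !magicAt(bytes, 0)) return "invalid";
  if (bytes.length >= 12 && magicAt(bytes, bytes.length - 4)) return "complete";
  return "header-only";
}

// Hypothetical helper: reads a file from disk and classifies it.
// readFileSync returns a Buffer, which is a Uint8Array subclass.
function inspectParquetFile(path: string): MagicStatus {
  return inspectParquetMagic(readFileSync(path));
}
```

Given the file described above, which contains only PAR1, inspectParquetFile("test.parquet") would return "header-only". This shows that the writer never reached the point of writing its footer.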

Am I doing something wrong? Is the library only meant to append to existing Parquet files rather than create new ones? Or is this a bug?
