High memory consumption when reading big files #173

@rsarili

Description

Version used: 1.8.6

I have a snappy-compressed Parquet file that is ~60 MB and has ~1,000,000 rows and ~300 columns. I use the following code snippet to read the data.

const reader = await parquet.ParquetReader.openFile('file.snappy.parquet')
const cursor = reader.getCursor()
// Drain every record; the row bodies are discarded.
while (await cursor.next()) {}

With a 10 GB memory limit I get a heap allocation error; with the limit raised to ~23 GB it completes, but takes ~4 minutes to process.
I also tried setting the maxSpan and maxLength options to fix the issue, or at least to get a batch-processing effect, but it didn't help.
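For reference, heap limits like the ones above are the kind typically set with Node's --max-old-space-size flag (value in MB); the exact invocation and script name below are only an assumed example, not taken from my actual setup:

# assumed invocation: allow ~23 GB of heap; read.js stands in for the snippet above
node --max-old-space-size=23000 read.js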

I can read the same file in ~30 seconds and with ~170 MB of memory in Python (pyarrow/pandas) using the following code snippet:

import pyarrow as pa
import pyarrow.parquet as pq

pf = pq.ParquetFile("file.snappy.parquet")

# Read in batches of 1,000 rows and iterate over every row.
for batch in pf.iter_batches(batch_size=1000):
    df = pa.Table.from_batches([batch]).to_pandas()
    for row in df.itertuples(index=False):
        pass

I am not sure whether this is a bug or whether I am using the library incorrectly.
Could you please share your comments?
