Problem
`ParquetFile.fromUrl()` uses the global `fetch()` internally, which makes it impossible to read Parquet files from authenticated endpoints (e.g., AWS S3 private buckets).
Full file loading works because I control the fetch:
```typescript
import { AwsClient } from 'aws4fetch';
import { readParquet } from 'parquet-wasm';

const aws = new AwsClient({
  accessKeyId,
  secretAccessKey,
  sessionToken
});

// This works - I can sign the request
const response = await aws.fetch(url);
const buffer = await response.arrayBuffer();
const table = readParquet(new Uint8Array(buffer));
```
Row group streaming fails because I can't sign the internal Range requests:
```typescript
import { ParquetFile } from 'parquet-wasm';

// This fails with 403 - internal fetch() calls are unsigned
const pf = await ParquetFile.fromUrl(url);
const table = await pf.read({ rowGroups: [0, 1, 2] });
```
The main benefit of `ParquetFile` is efficient partial reads via HTTP Range requests, but that benefit is unavailable when the endpoint requires authentication.
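To illustrate that the partial-read requests themselves are easy to sign when the caller owns the fetch, here is a minimal sketch of the kind of request a partial reader typically starts with: a suffix Range request for the last 8 bytes of the file, which in the Parquet format hold the 4-byte little-endian footer length followed by the `PAR1` magic. `FetchLike`, `parseFooterLength` and `readFooterLength` are hypothetical helpers for this example, not parquet-wasm APIs, and this is not a claim about the exact requests `ParquetFile` issues internally.

```typescript
// Hypothetical minimal fetch shape so the sketch compiles without DOM typings.
// A signing client such as aws4fetch's `aws.fetch` can be wrapped to fit it.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// A Parquet file ends with: <footer metadata> <4-byte LE footer length> "PAR1".
const MAGIC = "PAR1";

// Parse the last 8 bytes of a Parquet file and return the footer length in bytes.
function parseFooterLength(tail: Uint8Array): number {
  if (tail.length !== 8) throw new Error(`expected 8 bytes, got ${tail.length}`);
  const magic = String.fromCharCode(tail[4], tail[5], tail[6], tail[7]);
  if (magic !== MAGIC) throw new Error(`bad magic: ${magic}`);
  // First 4 bytes of the tail are the footer length, little-endian.
  return new DataView(tail.buffer, tail.byteOffset, 4).getUint32(0, true);
}

// Fetch only the last 8 bytes with a suffix Range request through the caller's fetch,
// so any signing or auth headers the caller adds apply to this request too.
async function readFooterLength(fetchFn: FetchLike, url: string): Promise<number> {
  const res = await fetchFn(url, { headers: { Range: "bytes=-8" } });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return parseFooterLength(new Uint8Array(await res.arrayBuffer()));
}
```

With aws4fetch this would be called as `await readFooterLength((u, init) => aws.fetch(u, init), url)`, so the Range request is signed like the full-file request in the first example.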
Proposed Solution
Allow passing a custom fetch function:
```typescript
const pf = await ParquetFile.fromUrl(url, {
  fetch: (url, init) => aws.fetch(url, init)
});
```
This would enable any authentication scheme (AWS SigV4, Bearer tokens, API keys, etc.) without parquet-wasm needing to implement them directly.
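As a rough sketch of how the proposed option could be threaded through on the library side, the hypothetical `RangeReader` below routes every Range request through a single injected fetch function and falls back to the global `fetch` when none is given, matching the current behaviour described above. `FromUrlOptions`, `RangeReader` and `FetchLike` are illustrative names, not parquet-wasm's actual internals or API.

```typescript
// Hypothetical minimal fetch shape (no DOM typings required).
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Hypothetical option shape mirroring the proposal; not part of parquet-wasm 0.7.1.
interface FromUrlOptions {
  fetch?: FetchLike;
}

// Default: defer to the global fetch, i.e. today's unsigned behaviour.
const globalFetch: FetchLike = (u, init) => (globalThis as any).fetch(u, init);

// Hypothetical reader that sends every byte-range read through one fetch function.
class RangeReader {
  private readonly url: string;
  private readonly fetchFn: FetchLike;

  constructor(url: string, options: FromUrlOptions = {}) {
    this.url = url;
    this.fetchFn = options.fetch ?? globalFetch;
  }

  // Read bytes [start, end). The HTTP Range header's end offset is inclusive.
  async read(start: number, end: number): Promise<Uint8Array> {
    const range = `bytes=${start}-${end - 1}`;
    const res = await this.fetchFn(this.url, { headers: { Range: range } });
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${range}`);
    return new Uint8Array(await res.arrayBuffer());
  }
}
```

Because the library only ever calls `this.fetchFn`, it never needs to know about SigV4, bearer tokens or API keys; a caller would pass `{ fetch: (u, init) => aws.fetch(u, init) }` and every ranged read would be signed.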
Environment
- parquet-wasm: 0.7.1
- Use case: Reading row groups from AWS S3 private bucket
- Auth library: aws4fetch