Support custom fetch function in ParquetFile.fromUrl() #846

Description

@cornhundred

Problem

ParquetFile.fromUrl() uses the global fetch() internally, which makes it impossible to read Parquet files from authenticated endpoints (e.g., AWS S3 private buckets).

Full file loading works because I control the fetch:

import { AwsClient } from 'aws4fetch';
import { readParquet } from 'parquet-wasm';

const aws = new AwsClient({
  accessKeyId,
  secretAccessKey,
  sessionToken
});

// This works - I can sign the request
const response = await aws.fetch(url);
const buffer = await response.arrayBuffer();
const table = readParquet(new Uint8Array(buffer));

Row group streaming fails because I can't sign the internal Range requests:

import { ParquetFile } from 'parquet-wasm';

// This fails with 403 - internal fetch() calls are unsigned
const pf = await ParquetFile.fromUrl(url);
const table = await pf.read({ rowGroups: [0, 1, 2] });

The main benefit of ParquetFile is efficient partial reads via Range requests, but with no way to sign those requests it is unusable with authenticated endpoints.
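To make the kind of request involved concrete, here is a rough sketch of one partial read done by hand: fetching only the 8-byte tail of a Parquet file, which for an unencrypted file holds a 4-byte little-endian footer length followed by the magic bytes "PAR1". The helper names `suffixRange` and `parseParquetTail` are made up for this illustration, and this is not a claim about which requests parquet-wasm issues internally.

```typescript
// Hypothetical helper: headers for a suffix Range request covering the last `n` bytes.
function suffixRange(n: number): Record<string, string> {
  return { Range: `bytes=-${n}` };
}

// Hypothetical helper: parse the 8-byte Parquet tail and return the footer
// (metadata) length in bytes. Assumes an unencrypted file ending in "PAR1".
function parseParquetTail(tail: Uint8Array): number {
  if (tail.length !== 8) {
    throw new Error("expected exactly 8 bytes");
  }
  // Bytes 4..7 must spell "PAR1" (0x50 0x41 0x52 0x31).
  const isParquet =
    tail[4] === 0x50 && tail[5] === 0x41 && tail[6] === 0x52 && tail[7] === 0x31;
  if (!isParquet) {
    throw new Error("missing PAR1 magic bytes");
  }
  // Bytes 0..3 are the footer length, little-endian.
  return new DataView(tail.buffer, tail.byteOffset, 8).getUint32(0, true);
}
```

With the `aws` client from the snippet above, the signed request would look something like `aws.fetch(url, { headers: suffixRange(8) })`, followed by `parseParquetTail(new Uint8Array(await response.arrayBuffer()))`. Every further Range request needs the same signing, which is exactly what the internal fetch() calls cannot do today.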

Proposed Solution

Allow passing a custom fetch function:

const pf = await ParquetFile.fromUrl(url, {
  fetch: (url, init) => aws.fetch(url, init)
});

This would enable any authentication scheme (AWS SigV4, Bearer tokens, API keys, etc.) without parquet-wasm needing to implement them directly.
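As a sketch of what a caller might pass for a simpler scheme than SigV4, the wrapper below adds a Bearer token while keeping any headers (such as Range) that the library sets on each request. The names `FetchLike` and `withBearerToken` are hypothetical, and the `fetch` option on `ParquetFile.fromUrl()` is the API proposed in this issue, not something parquet-wasm 0.7.1 provides.

```typescript
// Hypothetical type: anything with the same call shape as the global fetch().
type FetchLike = (
  input: string | URL | Request,
  init?: RequestInit
) => Promise<Response>;

// Hypothetical helper: wrap a fetch-compatible function so every request
// carries an Authorization header, without dropping headers already present.
function withBearerToken(token: string, baseFetch: FetchLike = fetch): FetchLike {
  return (input, init = {}) => {
    // Start from the Request's own headers, if a Request object was passed.
    const headers = new Headers(input instanceof Request ? input.headers : undefined);
    // Layer on headers from init (e.g. a Range header set by the caller).
    new Headers(init.headers).forEach((value, key) => headers.set(key, value));
    headers.set("Authorization", `Bearer ${token}`);
    return baseFetch(input, { ...init, headers });
  };
}
```

Under the proposal, this would be passed as `ParquetFile.fromUrl(url, { fetch: withBearerToken(token) })`. Taking the whole function rather than a static headers option matters for schemes like SigV4, where the signature depends on each request's URL and headers and so has to be computed per call.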

Environment

  • parquet-wasm: 0.7.1
  • Use case: Reading row groups from AWS S3 private bucket
  • Auth library: aws4fetch
