
Add intensive testing of nested-pandas cloud IO #104

@hombit

Feature request

It would be really nice to test (and maybe benchmark) edge cases for nested_pandas.read_parquet, for example reading recursively from an S3 "directory tree". I've found that IPAC's Euclid Q1 catalog works well for this because it has leaf directories and many small files. See, for example, this directory: s3://nasa-irsa-euclid-q1/contributed/q1/merged_objects/hats/euclid_q1_merged_objects-hats/dataset/Norder=7/

Related to and blocked by lincc-frameworks/nested-pandas#393
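
As a starting point, here is a minimal sketch of what such a test or benchmark harness could look like. It is not a proposed implementation: the helper names (make_fake_tree, find_leaf_files, time_reads, read_with_nested_pandas, demo) are hypothetical, the Dir=/Npix=/part layout is only an assumption loosely modeled on the Norder=7/ directory above, and the files it creates are placeholders rather than real Parquet. The sketch runs locally with the standard library only; the nested_pandas.read_parquet call is wrapped in a reader that is imported lazily and is not exercised by the demo.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable


def make_fake_tree(root: Path, n_leaf_dirs: int = 3, files_per_leaf: int = 2) -> None:
    """Create a nested tree of small placeholder files (not real Parquet)."""
    for d in range(n_leaf_dirs):
        # Hypothetical layout, loosely modeled on the Norder=7/ directory above.
        leaf = root / "Norder=7" / "Dir=0" / f"Npix={d}"
        leaf.mkdir(parents=True)
        for f in range(files_per_leaf):
            (leaf / f"part{f}.parquet").write_bytes(b"placeholder")


def find_leaf_files(root: Path, suffix: str = ".parquet") -> list[Path]:
    """Recursively collect files with the given suffix, in a stable order."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def time_reads(paths: list[Path], reader: Callable[[str], object]) -> dict[str, float]:
    """Call reader on each path and record wall-clock seconds per file."""
    timings = {}
    for path in paths:
        start = time.perf_counter()
        reader(str(path))
        timings[str(path)] = time.perf_counter() - start
    return timings


def read_with_nested_pandas(path: str):
    """Reader to plug in for real Parquet files; needs nested-pandas installed."""
    import nested_pandas as npd  # imported lazily so the sketch runs without it

    return npd.read_parquet(path)


def demo() -> dict[str, float]:
    """Build a throwaway tree, walk it recursively, and time a stand-in reader."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_fake_tree(root)
        files = find_leaf_files(root)
        # Stand-in reader: raw bytes, because the placeholder files are not Parquet.
        return time_reads(files, lambda p: Path(p).read_bytes())
```

With real Parquet files, read_with_nested_pandas could be passed as the reader in place of the byte-reading stand-in. Pointing the same harness at the S3 path above would need a remote listing step instead of Path.rglob, and whether read_parquet handles the directory itself is the open question this issue asks to test.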

Before submitting
Please check the following:

  • I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
  • I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
  • If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.

Labels

enhancement (New feature or request)
