Possible optimizations for margin generation pipeline

**Feature request**

There is room to reduce the number of reads and the memory usage in both the mapping and reducing stages of the margin generation pipeline.

For the mapping stage, we can reuse the tricks we already apply in LSDB for spatial searches: (1) filter by MOC (some partitions would simply have no matches, especially in low-density catalogs), (2) pass `filters` to the parquet reader to select on `_healpix_29` (even filtering at HEALPix order 8-10 would help).
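A rough sketch of what (2) could look like, assuming `_healpix_29` holds the NESTED HEALPix index at order 29, so that every coarser pixel maps to one contiguous range of values. The helper names `healpix29_range` and `build_range_filters` are hypothetical, not existing LSDB or hats-import functions; the output uses the DNF list-of-tuples form that pyarrow accepts for `filters`.

```python
HEALPIX_29_ORDER = 29  # assumption: `_healpix_29` is the NESTED index at order 29


def healpix29_range(order, pixel):
    """Half-open [lo, hi) range of order-29 NESTED indices inside one coarser pixel."""
    if not 0 <= order <= HEALPIX_29_ORDER:
        raise ValueError(f"order must be in [0, {HEALPIX_29_ORDER}], got {order}")
    # Each step down in order splits a pixel in 4, i.e. adds 2 bits to the index.
    shift = 2 * (HEALPIX_29_ORDER - order)
    return pixel << shift, (pixel + 1) << shift


def build_range_filters(order, pixels):
    """Build DNF parquet filters: an OR over pixel ranges, with adjacent ranges merged."""
    merged = []
    for lo, hi in sorted(healpix29_range(order, p) for p in set(pixels)):
        if merged and merged[-1][1] == lo:
            merged[-1][1] = hi  # extend the previous range instead of adding a new one
        else:
            merged.append([lo, hi])
    return [
        [("_healpix_29", ">=", lo), ("_healpix_29", "<", hi)]
        for lo, hi in merged
    ]


# Hypothetical candidate pixels at order 8 that overlap the margin region.
filters = build_range_filters(order=8, pixels=[100, 101, 250])
```

The result could then be passed as `pyarrow.parquet.read_table(path, filters=filters)`. An empty pixel list yields no ranges at all, which is the case where the partition can be skipped without reading it, in the spirit of the MOC check in (1).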

For the reducing stage we can apply the same tricks, and we can also query for exact `_healpix_29` values, because we already know which objects we need from the mapping stage. If there are too many objects to look up, this could turn out to be a bottleneck, but we can set a threshold on the number of objects to filter for.
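The thresholding idea could be sketched as below. The function name `exact_or_bounded_filter`, the default threshold, and the fallback to a simple min/max bound are all assumptions made for illustration; the issue only proposes that some threshold exists. The returned value is a single conjunction in the tuple form pyarrow accepts for `filters`, where the `in` operator matches exact values.

```python
def exact_or_bounded_filter(healpix_values, max_exact=10_000):
    """Pick a parquet filter for the `_healpix_29` values found in the mapping stage.

    Below the threshold, match the exact values; above it, fall back to a
    cheaper (and looser) min/max bound so the filter itself stays small.
    """
    unique = sorted(set(healpix_values))
    if not unique:
        raise ValueError("no values to filter for; the partition can be skipped")
    if len(unique) <= max_exact:
        return [("_healpix_29", "in", unique)]
    # Assumed fallback: a bounding range still prunes row groups via statistics.
    return [("_healpix_29", ">=", unique[0]), ("_healpix_29", "<=", unique[-1])]


# Few objects: exact match. Many objects: bounding range.
small = exact_or_bounded_filter([42, 7, 42, 19])
large = exact_or_bounded_filter(range(1_000, 50_000), max_exact=10_000)
```

The right threshold would need to be measured: a long `in` list has its own cost, and the looser bound may read rows that are later discarded.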

**Before submitting**
Please check the following:

- [x] I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
- [ ] I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
- [ ] If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.


Possible optimizations for margin generation pipeline #688
