Handle border cases where labels are absent in presence-absence datasets

Many datasets contain multiple clusters/grids of field labels. Inside each cluster/grid the labels are presence/absence (fields are fully labeled). On the borders of a cluster/grid, however, there may still be fields that are not labeled, so a chip placed along such a border can end up with a partially labeled mask.

The `--drop-border-chips` flag addresses this somewhat, but it only removes chips on the border of the convex hull _computed over all of the fields_ (not per cluster/grid). The borders between clusters therefore fall inside the dataset-level convex hull, and the chips on those interior borders don't get dropped.
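To make the geometry concrete, here is a minimal, standard-library-only sketch. It does not reproduce how `--drop-border-chips` is implemented: the two grids, the chip location, and the helper names (`convex_hull`, `inside`) are made up for illustration, fields are reduced to centroid points, and "strictly inside the hull" stands in for "not a border chip".

```python
def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (positive means a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside(hull, p):
    """True if p lies strictly inside the counter-clockwise convex polygon `hull`."""
    n = len(hull)
    return n >= 3 and all(
        cross(hull[i], hull[(i + 1) % n], p) > 0 for i in range(n)
    )

# Two hypothetical 3x3 grids of field centroids with an unlabeled gap between them.
grid_a = [(x, y) for x in range(0, 3) for y in range(0, 3)]
grid_b = [(x, y) for x in range(10, 13) for y in range(0, 3)]

# A chip centered on the right edge of grid A, i.e. on an interior border.
chip = (2, 1)

global_hull = convex_hull(grid_a + grid_b)
hull_a = convex_hull(grid_a)

chip_inside_global = inside(global_hull, chip)  # True: looks like an interior chip
chip_inside_a = inside(hull_a, chip)            # False: on the border of grid A
```

With the dataset-level hull the chip looks safely interior, while the per-grid hull correctly places it on a border.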

See, for example, Estonia, which is representative of the European countries in FTW where two grids were sampled:
<img width="1614" height="803" alt="Image" src="https://github.com/user-attachments/assets/fe34ba0d-0fb7-4ef2-8386-74912886573d" />

This is even worse for regions that have many clusters, like Cambodia:
<img width="1271" height="671" alt="Image" src="https://github.com/user-attachments/assets/bef90e06-261d-4f28-ac99-021fce2fb0aa" />

We need a better solution for `--drop-border-chips` that accounts for the interior boundaries too. This is hard because nothing in the parquet files indicates which cluster/grid a field belongs to, and we don't always know how many clusters/grids there are.

One solution might be to run DBSCAN on the tile IDs or the field lat/lons, as long as it isn't too slow.
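If DBSCAN turns out to be too slow, a cheaper stand-in (explicitly not DBSCAN) is to bin field centroids into coarse grid cells and take connected components of the occupied cells. The sketch below uses only the standard library and runs in roughly linear time. The function name `cluster_by_grid`, the `cell_size` parameter, and the sample coordinates are all hypothetical; `cell_size` would need to be larger than the typical spacing between fields and smaller than the gap between clusters.

```python
from collections import defaultdict

def cluster_by_grid(points, cell_size):
    """Label points by connected components of occupied grid cells.

    Two occupied cells are connected if they touch (8-neighborhood).
    Returns one integer label per input point.
    """
    # Bucket point indices by the coarse cell their coordinates fall into.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    labels = [-1] * len(points)
    seen = set()
    next_label = 0
    for start in cells:
        if start in seen:
            continue
        # Flood-fill across neighboring occupied cells.
        seen.add(start)
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for idx in cells[(cx, cy)]:
                labels[idx] = next_label
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        next_label += 1
    return labels

# Made-up (lon, lat) centroids: three fields in one grid, two in another.
centroids = [(24.0, 58.0), (24.1, 58.0), (24.05, 58.1), (26.0, 58.5), (26.1, 58.5)]
labels = cluster_by_grid(centroids, cell_size=0.25)
print(labels)  # [0, 0, 0, 1, 1]
```

The resulting labels could then be used to compute one convex hull per cluster and drop the chips on each hull's border, rather than only on the dataset-level hull.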
