Run inter-download deduplication for every download cycle #46

@lawalter

Summary outputs from each download cycle are inaccurate whenever a state submits duplicate registrations. In this issue, "duplicate registrations" are HIP registrations that were already submitted to FWS in a previous download. For example, if we receive 50,000 registrations from state XX in mid-October, but 20,000 of them were already submitted and processed in September, we have really received only 30,000 new registrations from that state.

As the process currently stands, the migbirdHIP package cannot detect or remove inter-download duplicates. Detecting them is possible during pre-processing for our biweekly download cycle if all files are read in. With that step, only new, non-duplicate registrations would be kept and reported in output summaries. Deduplicating before data load will improve the accuracy of summary reports and the HIP registration dashboard.
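
To illustrate the idea, here is a minimal sketch of an inter-download check, written in Python for illustration only (it is not part of migbirdHIP). The field names in `KEY_FIELDS` and the function names are hypothetical; which columns actually identify a repeated HIP registration is an open question for this issue.

```python
# Hypothetical identifying fields; the real HIP columns may differ.
KEY_FIELDS = ("first_name", "last_name", "birth_date", "state")


def registration_key(record, key_fields=KEY_FIELDS):
    """Build a normalized comparison key from the identifying fields."""
    return tuple(str(record.get(f, "")).strip().lower() for f in key_fields)


def split_new_and_duplicate(current, previous, key_fields=KEY_FIELDS):
    """Split the current download into new registrations and
    registrations already present in earlier downloads."""
    seen = {registration_key(r, key_fields) for r in previous}
    new, duplicates = [], []
    for record in current:
        if registration_key(record, key_fields) in seen:
            duplicates.append(record)
        else:
            # Repeats within the current download are not handled here.
            new.append(record)
    return new, duplicates


# Toy data: one registration from an earlier download is resubmitted.
previous = [
    {"first_name": "Ana", "last_name": "Diaz",
     "birth_date": "1980-01-02", "state": "XX"},
]
current = [
    {"first_name": "ana ", "last_name": "Diaz",
     "birth_date": "1980-01-02", "state": "XX"},
    {"first_name": "Ben", "last_name": "Cole",
     "birth_date": "1975-06-30", "state": "XX"},
]

new, duplicates = split_new_and_duplicate(current, previous)
print(len(new), len(duplicates))  # 1 new registration, 1 already submitted
```

Only the `new` list would feed the output summaries. Matching on exact (normalized) field values is an assumption; a stricter or looser definition of "duplicate" would change the key.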

Labels

enhancement: New feature or request
workflow: Improvement to processing speed or methodology