Summary outputs from each download cycle overstate registration counts whenever a state submits duplicate registrations. In this issue, "duplicate registrations" are HIP registrations that were already submitted to FWS in a previous download. For example, if in mid-October we receive 50,000 registrations from state XX, but 20,000 of those registrations were already submitted and processed in September, we have really received only 30,000 new registrations from that state.
As the process currently stands, inter-download duplicates cannot be detected or removed by the migbirdHIP package. They can be detected during pre-processing for our biweekly download cycle, provided all files are read in rather than only the files from the current download. In this process, only new AND non-duplicate registrations would be kept and reported in output summaries. Deduplicating before the data load will improve the accuracy of summary reports and the HIP registration dashboard.
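The filtering step described above could look roughly like the following. This is a minimal sketch in Python, not part of migbirdHIP; the function names and the fields in `KEY_FIELDS` are hypothetical, and which columns actually identify a registration is an assumption that would need to be settled against the real HIP file layout.

```python
# Hypothetical identifying fields; the real HIP columns may differ.
KEY_FIELDS = ("state", "firstname", "lastname", "birth_date", "issue_date")


def registration_key(record, key_fields=KEY_FIELDS):
    """Build a comparison key, ignoring case and surrounding whitespace."""
    return tuple(str(record.get(field, "")).strip().lower() for field in key_fields)


def keep_new_registrations(previous, current, key_fields=KEY_FIELDS):
    """Return records from `current` that were not seen in `previous`.

    Repeats within `current` itself are also dropped, so the result holds
    only new AND non-duplicate registrations.
    """
    seen = {registration_key(record, key_fields) for record in previous}
    new_records = []
    for record in current:
        key = registration_key(record, key_fields)
        if key in seen:
            continue  # already submitted earlier, or repeated in this download
        seen.add(key)
        new_records.append(record)
    return new_records


def make_record(first, last):
    """Helper that builds a made-up record for the example below."""
    return {
        "state": "XX",
        "firstname": first,
        "lastname": last,
        "birth_date": "1980-01-01",
        "issue_date": "2023-09-01",
    }


# Made-up data: Ann was already submitted in September and reappears in October.
september = [make_record("Ann", "Lee"), make_record("Bob", "Ray")]
october = [make_record("ANN ", "Lee"), make_record("Cal", "Fox"), make_record("Dee", "Poe")]

new_in_october = keep_new_registrations(september, october)
print(f"{len(october)} received, {len(new_in_october)} new")
```

Here the summary for October would report 2 new registrations rather than 3. Matching on an exact normalized key is the simplest choice; it will miss duplicates whose identifying fields were edited between submissions.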