Add deduplication step to the targets pipeline

After the conversion step in the targets pipeline, we could add a step that reads in the Parquet registers and removes duplicate rows. Rows should count as duplicates when they match on every column except `source_file`.
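
A minimal sketch of what the deduplication could look like, using base R only. The function name `deduplicate_register()` and its `ignore` argument are hypothetical, and keeping the first occurrence of each duplicate is an assumption, not a decision made in this issue.

```r
# Hypothetical helper: drop rows that are identical in every column
# except those listed in `ignore`, keeping the first occurrence.
deduplicate_register <- function(data, ignore = "source_file") {
  # Columns that decide whether two rows are duplicates
  compare_cols <- setdiff(names(data), ignore)
  # duplicated() flags every repeat after the first matching row
  data[!duplicated(data[compare_cols]), , drop = FALSE]
}

# Small made-up example: rows 1 and 2 differ only in `source_file`
example <- data.frame(
  id = c(1, 1, 2),
  value = c("a", "a", "b"),
  source_file = c("x.csv", "y.csv", "x.csv")
)
deduplicate_register(example)
```

In the pipeline, something like this could probably be applied to the result of `arrow::read_parquet()` inside its own `tar_target()`, but how it fits with the existing targets is still open.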

The result should also be recorded in the conversion log, for example the number of rows before and after deduplication and how many rows were removed.
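
As a rough illustration of the logging part, the counts could be collected in a one-row summary like the one below. The function name `summarise_deduplication()` and the column names are hypothetical, since the format of the existing conversion log is not described here.

```r
# Hypothetical helper: one-row summary of a deduplication run,
# meant to be appended to (or joined onto) a log table.
summarise_deduplication <- function(register_name, n_before, n_after) {
  data.frame(
    register = register_name,
    n_rows_before = n_before,
    n_rows_after = n_after,
    n_rows_removed = n_before - n_after
  )
}

# Made-up numbers for illustration only
summarise_deduplication("example_register", n_before = 10L, n_after = 7L)
```

Taking the row counts as arguments keeps the summary independent of how the deduplication itself ends up being implemented.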