Add deduplication step to the targets pipeline #335

@signekb

Description

After the conversion step in the targets pipeline, we could add a step that reads in the Parquet registers and removes duplicate rows, ignoring the source_file column when comparing rows.
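To make the intended behaviour concrete, here is a minimal, language-agnostic sketch of the comparison rule: two rows count as duplicates when every column except source_file matches, and the first occurrence is kept. It uses plain Python dicts as a stand-in for rows read from a Parquet register; the function name `deduplicate_rows` and the columns `id` and `value` are hypothetical, and the real step would live in the targets pipeline rather than in this form.

```python
def deduplicate_rows(rows, ignore=("source_file",)):
    """Drop duplicate rows, comparing all columns except those in `ignore`.

    `rows` is a list of dicts (a stand-in for rows of a Parquet register).
    The first occurrence of each distinct row is kept, in original order.
    """
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from every column that is not ignored.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ignore))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical example data: the first two rows differ only in source_file.
rows = [
    {"id": 1, "value": "a", "source_file": "file_2020.csv"},
    {"id": 1, "value": "a", "source_file": "file_2021.csv"},
    {"id": 2, "value": "b", "source_file": "file_2021.csv"},
]
deduplicated = deduplicate_rows(rows)
print(len(rows), len(deduplicated))
```

Keeping the first occurrence is only one possible choice; which source_file value should survive for a duplicated row is still open.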

This could/should also be recorded in the conversion log somehow: the number of rows before and after deduplication, and how many rows were removed.
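As a rough idea of what such a log entry might contain, the sketch below builds one summary record per register from the row counts before and after deduplication. The function name `dedup_summary`, the field names, the register name, and the counts are all hypothetical; the actual format of the conversion log is not specified here.

```python
import json


def dedup_summary(register, n_before, n_after):
    """Build a log record describing one deduplication run (hypothetical fields)."""
    if n_after > n_before:
        raise ValueError("deduplication cannot increase the number of rows")
    return {
        "register": register,
        "n_rows_before": n_before,
        "n_rows_after": n_after,
        "n_rows_removed": n_before - n_after,
    }


# Made-up counts, purely for illustration.
entry = dedup_summary("example_register", 1000, 990)
print(json.dumps(entry))
```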
