Separation of BOLD genus and species in SBDI export #923

@pragermh

Description of the bug

Hi,

Related to #921, but much simpler. In the SBDI export from annotation against COIDB, the specificEpithet, infraspecificEpithet, and (for non-BIN matches) scientificName fields become misaligned when the genus ends with an underscore followed by one or more 'X's.

Example:
genus: Malacostraca_XXX → specificEpithet: Malacostraca, infraspecificEpithet: XXXX

I can fix this with the code below, but I assume it should be quite straightforward to handle robustly in ampliseq parsing as well.

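# Assumes `annotation` is a data.table holding the SBDI export
# Flag BOLD BINs
annotation[, isBIN := grepl("^BOLD:[A-Z0-9]+$", scientificName)]
# Fix mis-split names
annotation[grepl("_X+$", genus), `:=`(
    specificEpithet = "X",
    infraspecificEpithet = "",
    # Use the literal "X" here: within a single `:=` call, specificEpithet
    # still holds its old (mis-split) value
    scientificName = ifelse(!isBIN, paste0(genus, "X"), scientificName)
  )
]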

Regards,
Maria

Command used and terminal output

Relevant files

No response

System information

No response

Labels: bug