Simulated data and real data workflow have diverged too far and that affects testing #114

@famulare


@tinghf alerted me that this block of code breaks on the simulated data because sample isn't a valid column there:

# filter out nested PCR targets to retain high-level target only
# Flu A
keepTargetList <- unique(db$sample[db$pathogen %in% c("Flu_A_H1","Flu_A_H3")])
dropTargetList <- unique(db$sample[db$pathogen %in% c("Flu_A_pan")])
dropSampleList <- intersect(dropTargetList,keepTargetList)
db <- db %>% filter( !(sample %in% dropSampleList & db$pathogen %in% c("Flu_A_pan")))
# enterovirus
keepTargetList <- unique(db$sample[db$pathogen %in% c("EV_D68")])
dropTargetList <- unique(db$sample[db$pathogen %in% c("EV_pan")])
dropSampleList <- intersect(dropTargetList,keepTargetList)
db <- db %>% filter( !(sample %in% dropSampleList & db$pathogen %in% c("EV_pan")))

The short-term fix is to wrap this block in if(source == 'production'), as in:

if(source == 'production'){
  # filter out nested PCR targets to retain high-level target only
  # Flu A
  keepTargetList <- unique(db$sample[db$pathogen %in% c("Flu_A_H1","Flu_A_H3")])
  dropTargetList <- unique(db$sample[db$pathogen %in% c("Flu_A_pan")])
  dropSampleList <- intersect(dropTargetList,keepTargetList)
  db <- db %>% filter( !(sample %in% dropSampleList & db$pathogen %in% c("Flu_A_pan")))

  # enterovirus
  keepTargetList <- unique(db$sample[db$pathogen %in% c("EV_D68")])
  dropTargetList <- unique(db$sample[db$pathogen %in% c("EV_pan")])
  dropSampleList <- intersect(dropTargetList,keepTargetList)
  db <- db %>% filter( !(sample %in% dropSampleList & db$pathogen %in% c("EV_pan")))
}
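
As a possible variation on this guard, the check could test for the column itself rather than for the data source, so the same code path runs on any input that has a sample column. The following is only a sketch: drop_nested_targets and the toy data frame are hypothetical names invented for illustration, and it uses base R subsetting instead of dplyr's filter() so that it runs without extra packages.

```r
# Hypothetical helper (not part of the existing code base): drop the pan-target
# row for a sample when that sample also has a more specific subtype result.
drop_nested_targets <- function(db, pan_target, specific_targets) {
  # Leave the data untouched when there is no `sample` column,
  # which is the situation reported for the simulated data.
  if (!("sample" %in% names(db))) {
    return(db)
  }
  keep_samples <- unique(db$sample[db$pathogen %in% specific_targets])
  pan_samples  <- unique(db$sample[db$pathogen %in% pan_target])
  drop_samples <- intersect(pan_samples, keep_samples)
  # Base R equivalent of filter(!(sample %in% ... & pathogen %in% ...))
  db[!(db$sample %in% drop_samples & db$pathogen %in% pan_target), , drop = FALSE]
}

# Toy data for illustration only; real column contents may differ.
db <- data.frame(
  sample   = c("s1", "s1", "s2", "s3", "s3"),
  pathogen = c("Flu_A_pan", "Flu_A_H1", "Flu_A_pan", "EV_pan", "EV_D68"),
  stringsAsFactors = FALSE
)

db <- drop_nested_targets(db, "Flu_A_pan", c("Flu_A_H1", "Flu_A_H3"))
db <- drop_nested_targets(db, "EV_pan", "EV_D68")
```

On the toy data, the Flu_A_pan row for s1 and the EV_pan row for s3 are dropped, while the Flu_A_pan row for s2 stays because s2 has no subtype result. Like the if(source == 'production') wrapper, this only hides the mismatch between simulated and production data; it does not resolve it.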

Long term, we should keep the simulated data synchronized with the necessary test cases. The commits to the simulated-data repo show the workflow pattern for doing that: https://github.com/seattleflu/simulated-data/commits/master.

  • introduce a script that makes a specific format change to the data without breaking other columns (unless this is on purpose!)
  • change the data
  • commit both together, explaining the change.
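
The first step of that pattern might look something like the sketch below. Everything in it is an assumption made for illustration: the function name add_sample_column, the identifier format, and the choice of one sample ID per row are all hypothetical, and the actual simulated-data scripts may be organized quite differently. The point it tries to show is the built-in check that pre-existing columns are left unchanged.

```r
# Hypothetical format-change script: add a `sample` column to a simulated
# data frame while verifying that every existing column is untouched.
add_sample_column <- function(sim) {
  # Refuse to overwrite a column that is already there.
  stopifnot(!("sample" %in% names(sim)))

  out <- sim
  # Assumption: one made-up sample ID per row; production data may instead
  # have several rows (pathogen targets) per sample.
  out$sample <- sprintf("sim_sample_%04d", seq_len(nrow(sim)))

  # Guard against breaking other columns by accident.
  for (col in names(sim)) {
    stopifnot(identical(out[[col]], sim[[col]]))
  }
  out
}

# Toy stand-in for the simulated data, for illustration only.
sim <- data.frame(
  pathogen = c("Flu_A_pan", "Flu_A_H1", "EV_pan"),
  stringsAsFactors = FALSE
)
sim_updated <- add_sample_column(sim)
```

The script and the regenerated data file would then go into a single commit whose message explains the format change.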
