
improvements to the Data module #19

@aryarm

Description


input files

It would be nice if we could support the following inputs:

  • Path objects representing paths to the files
    • including gzip-compressed files ending in .gz
  • sys.stdout and sys.stdin
  • TextIO objects
    • This definitely won't be possible for the Genotypes class, but could we do it for the Phenotypes and Covariates classes?

One strategy might be to create a function in the abstract Data class that detects each of these cases and handles it appropriately.
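A minimal sketch of such a helper, assuming it is written as a context manager next to the Data class. The name open_data and the "-" shorthand for stdin/stdout are hypothetical, not part of the existing module. Strings and Path objects are opened (through gzip when the suffix is .gz); anything else is assumed to be an already-open text stream such as sys.stdin, sys.stdout, or an io.StringIO, which is passed through and left open.

```python
import gzip
import sys
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator, Union

# hypothetical alias for every kind of input the Data classes might accept
DataInput = Union[str, Path, IO[str]]


@contextmanager
def open_data(source: DataInput, mode: str = "r") -> Iterator[IO[str]]:
    """Yield a text stream for source; close it afterward only if we opened it."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if not isinstance(source, (str, Path)):
        # assume an already-open text stream (sys.stdin, sys.stdout, io.StringIO)
        yield source
        return
    if str(source) == "-":
        # hypothetical convention: "-" means stdin when reading, stdout when writing
        yield sys.stdin if mode == "r" else sys.stdout
        return
    path = Path(source)
    if path.suffix == ".gz":
        # text mode ("rt"/"wt") makes gzip compression transparent to the caller
        handle = gzip.open(path, mode + "t")
    else:
        handle = open(path, mode)
    try:
        yield handle
    finally:
        handle.close()
```

Leaving caller-provided streams open matters most for sys.stdout: closing it inside a write method would break any output that comes afterward.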

  • we should also ensure that most of the classes work on streams of data
    • and rewrite Genotypes.read so that it can read data line by line
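To work on streams, a reader can iterate over the lines exactly once, never seeking or re-reading. A sketch for a simple tab-delimited samples x values table (like a phenotype or covariate file); the function name read_table and the file layout (a header row, sample IDs in the first column) are assumptions, and Genotypes.read would need format-specific parsing instead:

```python
from typing import Iterable, List, Tuple

import numpy as np


def read_table(lines: Iterable[str]) -> Tuple[Tuple[str, ...], Tuple[str, ...], np.ndarray]:
    """
    Parse a tab-delimited samples x values table in a single pass.

    Only forward iteration over lines is used (no seek, no second read),
    so files, gzip text streams, and sys.stdin all work the same way.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        raise ValueError("input is empty; expected a header line")
    # first column holds sample IDs; the remaining columns are value names
    names = tuple(header.rstrip("\n").split("\t")[1:])
    samples: List[str] = []
    rows: List[List[float]] = []
    for line in it:
        if not line.strip():
            continue  # tolerate blank lines
        sample, *values = line.rstrip("\n").split("\t")
        samples.append(sample)
        rows.append([float(v) for v in values])
    # reshape keeps a (0, n) shape even when there are no data rows
    data = np.array(rows, dtype=np.float64).reshape(len(rows), len(names))
    return tuple(samples), names, data
```

Because it only needs an iterable of strings, the same function accepts an open file, a gzip text stream, or sys.stdin.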

informative warnings

  • it would also be nice to warn users when the regions or samples they provided encompass zero variants
    • and tell them to check that the "chr" prefix on their chromosome names matches the genotype file, or attempt to fix it ourselves
  • for all warnings and errors, should we use the Logger module instead of assert statements?
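One way the zero-variant warning could look, sketched with the standard library's logging module rather than the Logger module (whose API isn't shown here). The helper check_regions is hypothetical and, for brevity, compares only chromosome names, not positions; if nothing matches, it retries with the "chr" prefix added or removed before warning:

```python
import logging
from typing import List, Sequence

# the standard library logger stands in for the Logger module here
logger = logging.getLogger("data")


def toggle_chr_prefix(chrom: str) -> str:
    """Turn "chr1" into "1" and "1" into "chr1"."""
    return chrom[3:] if chrom.startswith("chr") else "chr" + chrom


def check_regions(region_chroms: Sequence[str], variant_chroms: Sequence[str]) -> List[str]:
    """
    Return the region chromosomes, renamed to match the variant file if needed.
    Logs a warning whenever the regions as given match zero variants.
    """
    available = set(variant_chroms)
    if any(chrom in available for chrom in region_chroms):
        return list(region_chroms)
    fixed = [toggle_chr_prefix(chrom) for chrom in region_chroms]
    if any(chrom in available for chrom in fixed):
        logger.warning(
            "The requested regions matched zero variants until the 'chr' prefix "
            "was toggled; please check that your chromosome names match."
        )
        return fixed
    logger.warning(
        "The requested regions encompass zero variants. Check that the 'chr' "
        "prefix of your chromosome names matches the genotype file."
    )
    return list(region_chroms)
```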

additional classes

  • a class for covariates (as a table of samples x covariates)
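A minimal sketch of what such a class might hold. The attribute names (samples, names, data) and the subset() method are chosen for illustration, not taken from the existing Phenotypes class:

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class Covariates:
    """Hypothetical container for a samples x covariates table."""

    samples: Tuple[str, ...]
    names: Tuple[str, ...]
    data: np.ndarray  # shape: (len(samples), len(names))

    def __post_init__(self) -> None:
        expected = (len(self.samples), len(self.names))
        if self.data.shape != expected:
            raise ValueError(f"data has shape {self.data.shape}, expected {expected}")

    def subset(self, samples: Tuple[str, ...]) -> "Covariates":
        """Return a new Covariates restricted to (and ordered by) samples."""
        index = {s: i for i, s in enumerate(self.samples)}
        rows = [index[s] for s in samples]  # raises KeyError for unknown samples
        return Covariates(tuple(samples), self.names, self.data[rows])
```

Validating the shape up front means later operations such as subset() can index rows without rechecking it.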

filtering of variants

  • by whether they're multi-allelic
  • automatically, by restricting to the samples in the intersection of the genotype and phenotype files
    • note that this might be something we should do only within the code that uses the Data module (e.g. happler)
  • by MAF
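These filters can be sketched with numpy, assuming genotypes are stored as an integer array of shape (samples, variants, ploidy) holding allele indices (0 for REF, 1 and up for ALT alleles) and that the number of alleles per variant is known. All function names are hypothetical. MAF is computed only for biallelic variants, since the simple mean below assumes a single ALT allele:

```python
from typing import Sequence, Tuple

import numpy as np


def maf(gts: np.ndarray) -> np.ndarray:
    """
    Minor allele frequency per variant.

    Assumes biallelic data: gts has shape (samples, variants, ploidy) and
    holds only 0 (REF) or 1 (ALT), so the mean is the ALT allele frequency.
    """
    alt_freq = gts.mean(axis=(0, 2))
    return np.minimum(alt_freq, 1 - alt_freq)


def variant_mask(gts: np.ndarray, num_alleles: np.ndarray, min_maf: float) -> np.ndarray:
    """Boolean mask of variants that are biallelic and have MAF >= min_maf."""
    keep = num_alleles == 2
    # compute MAF only for the biallelic variants, then update those entries
    keep[keep] = maf(gts[:, keep]) >= min_maf
    return keep


def shared_samples(geno_samples: Sequence[str], pheno_samples: Sequence[str]) -> Tuple[str, ...]:
    """Samples present in both files, in genotype-file order."""
    in_pheno = set(pheno_samples)
    return tuple(s for s in geno_samples if s in in_pheno)
```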

subclasses for different kinds of genotyping data

or just some way to type-hint the specific kind that you need

  • phased vs no restriction on phasing
  • biallelic vs no restriction on allele number
    • filterable for above a certain MAF (only applies to biallelic)
  • contains TRs (potentially handled by trtools; see "support for a TR-based GenotypesPLINK class" #73)
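If we go with subclasses, here is a sketch of how they could double as type hints: each subclass validates its guarantee when constructed, so a function annotated with PhasedGenotypes can rely on phased input. The class names, the phased attribute, and the (samples, variants, 2) layout are all assumptions for illustration:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GenotypesBase:
    """Hypothetical base: data holds allele indices, shape (samples, variants, 2)."""

    data: np.ndarray


@dataclass
class PhasedGenotypes(GenotypesBase):
    """Hypothetical subclass whose constructor guarantees phased genotypes."""

    phased: np.ndarray  # bool, shape (samples, variants)

    def __post_init__(self) -> None:
        if not self.phased.all():
            raise ValueError("PhasedGenotypes requires every genotype to be phased")


@dataclass
class BiallelicGenotypes(GenotypesBase):
    """Hypothetical subclass whose constructor guarantees biallelic variants."""

    def __post_init__(self) -> None:
        if self.data.max(initial=0) > 1:
            raise ValueError("BiallelicGenotypes requires allele indices of 0 or 1")


def first_haplotypes(gts: PhasedGenotypes) -> np.ndarray:
    """Needs phase: returns each sample's first haplotype, shape (samples, variants)."""
    return gts.data[:, :, 0]
```

The annotations are only enforced by a static checker such as mypy; at runtime, the guarantee comes from the __post_init__ checks.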

new functions

  • iterate(): a generator function that reads the file line by line and yields named tuples, where each entry corresponds to a property of the class but holds the values for just a single row
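A sketch of iterate() for a tab-delimited samples x values table, assuming the class exposes samples and data attributes so that each yielded named tuple has the same fields, holding one sample's ID and its row of values. Row, the standalone function form, and the file layout are hypothetical:

```python
from collections import namedtuple
from typing import Iterable, Iterator

import numpy as np

# hypothetical row type: same field names as the class attributes,
# but holding the values for a single sample
Row = namedtuple("Row", ["samples", "data"])


def iterate(lines: Iterable[str]) -> Iterator[Row]:
    """Lazily yield one Row per data line of a tab-delimited table."""
    it = iter(lines)
    if next(it, None) is None:
        return  # empty input: nothing to yield
    # the header line is skipped here; a fuller version would record the names
    for line in it:
        if not line.strip():
            continue
        sample, *values = line.rstrip("\n").split("\t")
        yield Row(samples=sample, data=np.array([float(v) for v in values]))
```

Only one row is held in memory at a time, so this pairs naturally with stream inputs.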

Metadata


Labels

enhancement (New feature or request)
