Open
Labels
enhancement (New feature or request)
Description
input files
would be nice if we could support the following inputs
- Path objects representing paths to the files
- and files ending in gz
- sys.stdout and sys.stdin - this might be harder for the Genotypes class b/c I think cyvcf2 only accepts strings to paths (see "Read VCF From StringIO or Buffer?", brentp/cyvcf2#47)
- TextIO objects
- This definitely won't be possible for the Genotypes class but we could do it for the Phenotypes and Covariates classes?
one strategy would be to create a function in the Data abstract class that could detect each of these cases and handle them appropriately?
- we should also ensure that most of the classes can work appropriately on streams of data
- and rewrite Genotypes.read to allow it to read data line by line
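As a rough sketch of the "detect each case" strategy above: the helper below accepts a path string, a Path object, a file ending in .gz, or an already-open text stream such as sys.stdin or a TextIO object. The name open_text is hypothetical and not part of the existing code, and this only covers plain-text inputs like the phenotype and covariate files, not VCFs read through cyvcf2.

```python
import gzip
from contextlib import contextmanager
from io import StringIO
from pathlib import Path


@contextmanager
def open_text(source):
    """Hypothetical helper: yield a readable text handle for several input kinds."""
    if hasattr(source, "read"):
        # Already an open stream (sys.stdin, StringIO, an open file, ...).
        # The caller owns it, so we do not close it here.
        yield source
        return
    path = Path(source)  # accepts both str and Path
    # Choose the opener by file extension (an assumption; no magic-byte check)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        yield handle


# Usage: the same loop works for a stream or a (possibly gzipped) path
with open_text(StringIO("sample\tpheno\nHG001\t1.5\n")) as handle:
    for line in handle:
        fields = line.rstrip("\n").split("\t")
```

Because the handle is consumed line by line, the same helper would also work on streams of data that never fit in memory at once.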
informative warnings
- would also be nice if we could warn users when the regions or samples that they provided encompass zero variants
- and tell them to check that the chroms prefix matches up or attempt to fix it ourselves
- for all warnings and errors, use the Logger module instead of raising assertions?
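A minimal sketch of what such a warning could look like with the standard logging module. The function names, the logger name, and the idea of passing in the set of contig names found in the file are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("data")  # hypothetical logger name


def toggle_chr_prefix(chrom):
    """Add or strip a leading 'chr' from a contig name."""
    return chrom[3:] if chrom.startswith("chr") else "chr" + chrom


def check_region(num_variants, region_chrom, file_chroms, log=logger):
    """Warn when a region matched zero variants.

    Returns an alternative contig name to retry with, or None.
    """
    if num_variants > 0:
        return None
    log.warning("The region on contig %s contains zero variants.", region_chrom)
    alternative = toggle_chr_prefix(region_chrom)
    if alternative in file_chroms:
        # The likely culprit is a mismatched 'chr' prefix
        log.warning(
            "The file uses contig name %s instead of %s; check the chrom prefix.",
            alternative,
            region_chrom,
        )
        return alternative
    return None


# Usage: the caller decides whether to retry with the suggested name
suggestion = check_region(0, "chr1", {"1", "2"})
```

Returning the suggestion rather than silently retrying leaves the "fix it ourselves" decision to the calling code.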
additional classes
- for covariates (as a table of samples x covariates)
filtering of variants
- by whether they're multi-allelic
- automatically by the subset of samples contained in the intersection of the genotype and phenotype files
- note that this might be something we should only do within the code that utilizes the data module (for ex: happler)
- by MAF
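The three filters above could be sketched in plain Python as follows. This assumes a toy layout in which each variant is a list of per-sample allele tuples, with 0 for the reference allele and integers above 1 marking additional alternate alleles. That layout and all function names are illustrative assumptions, not the module's actual data structure.

```python
def alleles_of(variant):
    """Flatten per-sample allele tuples into one list of allele codes."""
    return [allele for sample in variant for allele in sample]


def is_multiallelic(variant):
    """True if any allele code exceeds 1 (more than one ALT allele)."""
    return max(alleles_of(variant)) > 1


def minor_allele_freq(variant):
    """MAF of a biallelic variant coded as 0/1."""
    alleles = alleles_of(variant)
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1 - alt_freq)


def keep_variants(variants, min_maf=0.0):
    """Indices of biallelic variants with MAF >= min_maf."""
    return [
        idx
        for idx, variant in enumerate(variants)
        if not is_multiallelic(variant) and minor_allele_freq(variant) >= min_maf
    ]


def shared_samples(genotype_samples, phenotype_samples):
    """Samples present in both files, in genotype-file order."""
    in_phenotypes = set(phenotype_samples)
    return [s for s in genotype_samples if s in in_phenotypes]


# Usage: three variants across four diploid samples
variants = [
    [(1, 0), (0, 0), (0, 0), (0, 0)],  # biallelic, MAF 0.125
    [(1, 1), (1, 1), (0, 0), (0, 0)],  # biallelic, MAF 0.5
    [(0, 2), (0, 0), (1, 0), (0, 0)],  # multi-allelic
]
kept = keep_variants(variants, min_maf=0.2)
```

The multi-allelic check runs first because the MAF computation here is only meaningful for 0/1 allele codes.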
subclasses for different kinds of genotyping data
or just some way to type-hint the specific kind that you need
- phased vs no restriction on phasing
- biallelic vs no restriction on allele number
- filterable for above a certain MAF (only applies to biallelic)
- contains TRs (potentially handled by trtools - see "support for a TR-based GenotypesPLINK class", #73)
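One way the subclass idea could look, so that downstream code can type-hint the kind of data it needs. Every class and function name here is hypothetical, and the toy data layout (per-variant lists of per-sample allele tuples) is an assumption for illustration only.

```python
class GenotypeData:
    """Hypothetical stand-in for a genotypes container."""

    def __init__(self, calls, phased):
        self.calls = calls    # per variant: list of per-sample allele tuples
        self.phased = phased  # whether every call is phased
        self.validate()

    def validate(self):
        """The base class imposes no restrictions."""


class PhasedGenotypeData(GenotypeData):
    def validate(self):
        super().validate()
        if not self.phased:
            raise ValueError("expected phased genotypes")


class BiallelicGenotypeData(GenotypeData):
    def validate(self):
        super().validate()
        codes = (a for variant in self.calls for sample in variant for a in sample)
        if any(code > 1 for code in codes):
            raise ValueError("expected only biallelic variants")


def count_variants(gts: PhasedGenotypeData) -> int:
    """Example consumer: the annotation documents that phasing is required."""
    return len(gts.calls)


# Usage
num_variants = count_variants(PhasedGenotypeData([[(0, 1), (1, 0)]], phased=True))
```

Validating at construction time means a function that accepts the narrower type can rely on the restriction without re-checking it.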
new functions
- iterate() - a generator function that reads the file line by line and yields named tuples, where each entry corresponds to a property of the class but holds the values for just a single row
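A minimal sketch of such a generator for a two-column, tab-separated phenotype-like file. The Row fields, the "#" header convention, and the free-function form (rather than a method on one of the classes) are assumptions for illustration.

```python
import csv
from collections import namedtuple
from io import StringIO

# Hypothetical row type: one field per property, holding a single row's value
Row = namedtuple("Row", ["sample", "value"])


def iterate(handle):
    """Lazily yield one Row per data line of a tab-separated text stream."""
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        yield Row(sample=fields[0], value=float(fields[1]))


# Usage: rows are produced one at a time, so the file is never fully loaded
stream = StringIO("#sample\tpheno\nHG001\t1.5\nHG002\t-0.25\n")
rows = list(iterate(stream))
```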