
Bacterial Wrapper #222

@joshfactorial

Description


We will create a wrapper designed to run NEAT on a small bacterial genome over multiple generations of mutations. We will need to find some example data. I'm envisioning the following steps:

  1. Gather some example bacterial data (NCBI, or look around elsewhere). Any data we can find will help, but we especially need FASTA reference files. FASTQ and BAM data would be even better.
  2. Write a script that takes a bacterial reference FASTA file. For each chromosome in the file, add a new chromosome made by splitting the original sequence halfway through and swapping the two halves, so that the old ends now meet in the middle: ABBCBBC -> ABBC BBC -> BBC ABBC -> BBCABBC
    • Be careful with memory usage.
    • Ideally, this should use only standard Python modules (check the environment file to see what is already available). The main thing is to avoid Biopython; we can discuss any other modules.
    • Try calling NEAT with the subprocess module (check Keshav's code for an example of calling NEAT this way).
  3. Further improvements, as time allows:
    • Add an option that lets a user run multiple loops, writing output only on the last loop.
    • If the process runs slowly, we might try splitting the file and running multiple instances of NEAT (concurrently or sequentially).
    • Add a script to stitch the output files back together (Keshav may have already done this, in which case we can reuse that code).
    • Cross-reference the synthetic variants with a gene map of the bacterium to see whether any mutations fall in coding regions.
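For step 1, one possible way to pull a reference from NCBI using only the standard library is the E-utilities efetch endpoint. This is a rough sketch: efetch_fasta_url and download_fasta are hypothetical helper names, the accession NC_000913.3 (E. coli K-12 MG1655) is only an example, and it skips NCBI's API-key and rate-limit handling.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build an efetch URL that returns a nucleotide record as plain-text FASTA."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


def download_fasta(accession: str, dest: Path) -> Path:
    """Stream the response to disk in chunks instead of reading it all into memory."""
    with urllib.request.urlopen(efetch_fasta_url(accession)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

For example, download_fasta("NC_000913.3", Path("ecoli.fa")) would save that genome locally.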
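For step 2, here is a minimal sketch of the rotation using only the standard library. It streams the FASTA one record at a time, so at most one chromosome (plus its rotated copy) is held in memory. The midpoint rounds up so that the example above, ABBCBBC, becomes BBCABBC. The "_rotated" name suffix and 60-column line wrapping are assumptions, not decisions from this issue.

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) one record at a time; only one chromosome is kept in memory."""
    header = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rotate_half(seq: str) -> str:
    """Split at the midpoint (rounding up) and swap the halves: ABBCBBC -> BBCABBC."""
    mid = (len(seq) + 1) // 2
    return seq[mid:] + seq[:mid]


def write_record(out: TextIO, header: str, seq: str, width: int = 60) -> None:
    """Write one FASTA record, wrapping the sequence at `width` columns."""
    out.write(f">{header}\n")
    for i in range(0, len(seq), width):
        out.write(seq[i:i + width] + "\n")


def add_rotated_chromosomes(in_handle: TextIO, out_handle: TextIO, suffix: str = "_rotated") -> None:
    """Copy every chromosome and append a rotated copy right after it."""
    for header, seq in read_fasta(in_handle):
        write_record(out_handle, header, seq)
        # Hypothetical naming: first word of the original header plus a suffix.
        write_record(out_handle, header.split()[0] + suffix, rotate_half(seq))
```

To use it, open the input reference for reading and a new FASTA for writing, then call add_rotated_chromosomes(fin, fout).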
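For calling NEAT in step 2, a sketch using subprocess. The command line is an assumption based on NEAT 4.x's read-simulator subcommand (neat read-simulator -c CONFIG -o PREFIX); Keshav's code should be treated as the reference for the real invocation. Passing a list instead of using shell=True avoids quoting problems with file paths.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_neat_command(config: Path, output_prefix: Path, neat_exe: str = "neat") -> List[str]:
    # ASSUMED CLI: `neat read-simulator -c CONFIG -o PREFIX`.
    # Check against Keshav's scripts or `neat --help` before relying on it.
    return [neat_exe, "read-simulator", "-c", str(config), "-o", str(output_prefix)]


def run_command(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command without a shell, capture its output, and raise if it fails."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed ({result.returncode}): {' '.join(cmd)}\n{result.stderr}"
        )
    return result
```

A call would then look like run_command(build_neat_command(Path("neat_config.yml"), Path("out/sample"))), with hypothetical file names.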
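For the multiple-loops idea in step 3, one way to structure it is to run every intermediate generation inside a temporary directory and copy out only the final result. The step callable is hypothetical: it stands in for whatever runs one NEAT generation and returns the path of the mutated reference that feeds the next loop. Whether NEAT can emit such a file directly still needs checking.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical: takes the current reference and a working directory,
# returns the path of the mutated reference it produced.
StepFn = Callable[[Path, Path], Path]


def run_generations(reference: Path, n_loops: int, step: StepFn, final_dir: Path) -> Path:
    """Chain n_loops generations; only the last generation's output is kept."""
    if n_loops < 1:
        raise ValueError("n_loops must be >= 1")
    final_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        current = reference
        for i in range(n_loops):
            work = Path(tmp) / f"gen_{i + 1}"
            work.mkdir()
            current = step(current, work)  # output of this loop feeds the next
        final = final_dir / current.name
        shutil.copy2(current, final)
    # Intermediate generations are deleted along with the temporary directory.
    return final
```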
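For splitting the work, a sketch that writes each FASTA record to its own file (streaming line by line, so memory stays flat) and then runs one subprocess per part. make_cmd is a hypothetical callback that builds the command for a given part, for example a NEAT call with a per-part config. Threads are enough here because the heavy lifting happens in the child processes, and workers=1 gives the sequential variant.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Sequence


def split_fasta(fasta: Path, out_dir: Path) -> List[Path]:
    """Write each FASTA record to its own file without loading the whole file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    out = None
    try:
        with open(fasta) as fin:
            for line in fin:
                if line.startswith(">"):
                    if out is not None:
                        out.close()
                    part = out_dir / f"part_{len(parts):04d}.fa"
                    parts.append(part)
                    out = open(part, "w")
                if out is not None:  # lines before the first header are dropped
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return parts


def run_parts(parts: Sequence[Path], make_cmd: Callable[[Path], List[str]], workers: int = 2) -> List[int]:
    """Run one subprocess per part and return exit codes in the same order as parts."""
    def run_one(part: Path) -> int:
        return subprocess.run(make_cmd(part)).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, parts))
```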
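If Keshav's stitching code doesn't fit, a fallback sketch for plain-text outputs: concatenate the parts in order and, for VCF-style files, keep the "#" header lines from the first part only. This assumes uncompressed text outputs; it doesn't merge ##contig lines from later parts or re-sort records, and BAM files would need a dedicated tool such as samtools merge instead.

```python
from pathlib import Path
from typing import Sequence


def stitch_text_outputs(parts: Sequence[Path], merged: Path, header_prefix: str = "") -> None:
    """Concatenate part files in order.

    If header_prefix is set (e.g. "#" for VCF), header lines are kept from the
    first part only; later parts contribute just their data lines.
    """
    with open(merged, "w") as out:
        for i, part in enumerate(parts):
            with open(part) as fin:
                for line in fin:
                    if header_prefix and i > 0 and line.startswith(header_prefix):
                        continue
                    out.write(line)
```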
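For the cross-referencing idea, a sketch that assumes the gene map is a GFF3 file and the synthetic variants are in a VCF. It collects CDS features, merges overlapping ones per sequence, and uses binary search to test whether each variant's POS lands in a coding region. Both formats use 1-based coordinates, so no conversion is needed. Only POS is checked, so a deletion that starts just before a CDS would be missed.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

# seqid -> (sorted starts, matching ends) of merged, non-overlapping CDS spans
Intervals = Dict[str, Tuple[List[int], List[int]]]


def load_cds_intervals(gff: TextIO) -> Intervals:
    """Collect CDS features from a GFF3 file and merge overlapping spans per sequence."""
    raw = defaultdict(list)
    for line in gff:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5 and cols[2] == "CDS":
            raw[cols[0]].append((int(cols[3]), int(cols[4])))  # 1-based, inclusive
    merged: Intervals = {}
    for seqid, spans in raw.items():
        starts: List[int] = []
        ends: List[int] = []
        for start, end in sorted(spans):
            if ends and start <= ends[-1] + 1:
                ends[-1] = max(ends[-1], end)  # overlaps or touches the previous span
            else:
                starts.append(start)
                ends.append(end)
        merged[seqid] = (starts, ends)
    return merged


def variants_in_cds(vcf: TextIO, cds: Intervals) -> List[Tuple[str, int]]:
    """Return (CHROM, POS) for each VCF record whose POS falls inside a CDS."""
    hits = []
    for line in vcf:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos_str = line.split("\t", 2)[:2]
        pos = int(pos_str)
        starts, ends = cds.get(chrom, ([], []))
        i = bisect_right(starts, pos) - 1  # last span starting at or before pos
        if i >= 0 and pos <= ends[i]:
            hits.append((chrom, pos))
    return hits
```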

Keshav's scripts:

Metadata


Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
