Bacterial Wrapper

We will create a wrapper designed to run NEAT on a small bacterial genome across multiple generations of mutations. We will need to find some example data. I'm envisioning the following steps:
1. Gather some example bacterial data (NCBI, check around). We will need any data we can find, but especially FASTA reference files. If we can find FASTQ and BAM data, even better.
2. Write a script that takes a bacterial reference FASTA file. For each chromosome in the file, add a new chromosome in which the original sequence is split at the halfway point and the two halves are stitched back together in swapped order, so that the old ends now sit in the middle: ABBCBBC -> ABBC BBC -> BBC ABBC -> BBCABBC
     - Be careful with memory usage. 
     - Ideally, we would want this to use standard Python modules (check the environment file to see what is already available). The main thing is to avoid biopython. We can discuss any other modules to use.
     - Try calling NEAT with subprocess (check Keshav's code for an example of calling NEAT this way).
3. Further improvements, as time allows:
     - Add a feature that lets a user run multiple loops (generations), writing output only on the last loop
     - If the process is running slowly, we might try splitting the file and running multiple instances of NEAT (concurrently or sequentially)
     - Add a script to stitch the output files back together (Keshav may have done this part, and we just need to use that code).
     - Cross reference the synthetic variants with a gene map of the bacteria to see if there were any mutations in coding regions.
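
As a starting point for step 2, here is a minimal sketch of the split-and-restitch script using only the standard library. The function names, the `_rotated` suffix for the new chromosome, and the 60-column line width are all placeholders for illustration, not anything NEAT defines. It holds one chromosome in memory at a time, which should be fine for bacterial genomes, and it gives the first half the extra base on odd-length sequences so that it reproduces the ABBCBBC example above.

```python
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) one record at a time, so only a single
    chromosome is held in memory at once."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rotate_halfway(seq: str) -> str:
    """Split at the midpoint and swap the halves: ABBC|BBC -> BBC|ABBC.
    For odd lengths the first half keeps the extra base (an assumption
    chosen to match the example in the issue)."""
    mid = (len(seq) + 1) // 2
    return seq[mid:] + seq[:mid]


def write_record(out: TextIO, header: str, seq: str, width: int = 60) -> None:
    """Write one FASTA record, wrapping the sequence at `width` columns."""
    out.write(f">{header}\n")
    for i in range(0, len(seq), width):
        out.write(seq[i:i + width] + "\n")


def add_rotated_chromosomes(in_path: str, out_path: str,
                            suffix: str = "_rotated") -> None:
    """Copy every chromosome to out_path, followed by its rotated twin.
    The suffix is a hypothetical naming convention."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            write_record(dst, header, seq)
            name, _, desc = header.partition(" ")
            new_header = name + suffix + (f" {desc}" if desc else "")
            write_record(dst, new_header, rotate_halfway(seq))
```

Calling `add_rotated_chromosomes("reference.fa", "reference_rotated.fa")` (both file names are made up) would write each original chromosome followed by its rotated copy. If memory ever becomes a problem for larger inputs, the rotation could be done with file offsets instead of in-memory strings, but that is probably unnecessary for genomes of a few megabases.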
   
Keshav's scripts:
- Wrapper to parallelize NEAT: https://github.com/ncsa/NEAT/blob/144-break-uprecombine-larger-genomes/neat/read_simulator/utils/parallelize.py
- Functions to split the inputs: https://github.com/ncsa/NEAT/blob/144-break-uprecombine-larger-genomes/neat/read_simulator/utils/split_inputs.py
- Recombine the outputs: https://github.com/ncsa/NEAT/blob/144-break-uprecombine-larger-genomes/neat/read_simulator/utils/stitch_outputs.py
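
For the subprocess call and the multi-loop idea, a rough sketch of the driver might look like the following. This is not taken from Keshav's scripts. `run_generations` and `build_command` are hypothetical names, and the actual NEAT command line is deliberately left to the caller rather than guessed here. The callback is told whether it is building the final generation, so it can request full output only on the last loop.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def run_generations(
    build_command: Callable[[int, bool], Sequence[str]],
    generations: int,
) -> List[subprocess.CompletedProcess]:
    """Run one external command per generation, in order.

    build_command(generation, is_last) returns the argument list to run.
    A non-zero exit status raises CalledProcessError and stops the loop,
    so a failed generation is never fed into the next one.
    """
    if generations < 1:
        raise ValueError("generations must be at least 1")
    results = []
    for generation in range(1, generations + 1):
        is_last = generation == generations
        cmd = list(build_command(generation, is_last))
        completed = subprocess.run(
            cmd, check=True, capture_output=True, text=True
        )
        results.append(completed)
    return results


if __name__ == "__main__":
    # Stand-in command so the sketch runs without NEAT installed; a real
    # wrapper would return the NEAT invocation for that generation instead.
    def demo_command(generation: int, is_last: bool) -> List[str]:
        code = f"print('generation', {generation}, 'final', {is_last})"
        return [sys.executable, "-c", code]

    for result in run_generations(demo_command, 3):
        print(result.stdout.strip())
```

In a real wrapper, `build_command` would also point each generation at the previous generation's output, and intermediate files could be written to a temporary directory and removed once the final loop finishes.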

Bacterial Wrapper #222

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Bacterial Wrapper #222

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions