Commit 5615cb2

DONE !!
1 parent 12cc966 commit 5615cb2

1 file changed: README.md (212 additions, 0 deletions)

@@ -1 +1,213 @@
# BOT599

## Pipeline

fasta - clean - split - feature - merge - svm

### 1. Get those FASTA files

Files:
<pre>
.
├── README.md
├── data
│   ├── <b>3prime.fasta</b>
│   ├── <b>5prime.fasta</b>
│   └── <b>human_exon.fasta</b>
├── feature_calculator.py
├── file_splitter.py
├── merge_feature_file.py
└── process_fasta.py
</pre>

### 2. Clean'em up

Usage: `python process_fasta.py <fasta_file>`

Example:
<pre>
python process_fasta.py ./data/3prime.fasta
python process_fasta.py ./data/5prime.fasta
</pre>

Files:
<pre>
.
├── README.md
├── data
│   ├── 3prime.fasta
│   ├── <b>3prime.fasta.clean</b>
│   ├── 5prime.fasta
│   ├── <b>5prime.fasta.clean</b>
│   └── human_exon.fasta
├── feature_calculator.py
├── file_splitter.py
├── merge_feature_file.py
└── process_fasta.py
</pre>
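
This README doesn't spell out what `process_fasta.py` does to each record, so here is only a guess at what "cleaning" could mean: join wrapped sequence lines, uppercase them, and drop any record containing something other than A, C, G or T. The names `read_fasta` and `clean_records` and the filtering rule are assumptions for illustration, not the script's actual behaviour.

```python
import re

# Assumed rule: a "clean" sequence is non-empty and contains only A, C, G, T.
VALID = re.compile(r"^[ACGT]+$")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def clean_records(lines):
    """Uppercase each sequence and keep only records that pass VALID."""
    for header, seq in read_fasta(lines):
        seq = seq.upper()
        if VALID.match(seq):
            yield header, seq
```

Feeding it an open file works too, e.g. `clean_records(open("data/3prime.fasta"))`.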

### 3. Files are too big apparently, split'em

Usage: `python file_splitter.py <clean_fasta_file> <pieces>`

Example:
<pre>
python file_splitter.py data/3prime.fasta.clean 5
python file_splitter.py data/5prime.fasta.clean 5
</pre>

Files:
<pre>
.
├── README.md
├── data
│   ├── 3prime.fasta
│   ├── 3prime.fasta.clean
│   ├── 3prime.fasta.clean.splits
│   │   ├── <b>part1</b>
│   │   ├── <b>part2</b>
│   │   ├── <b>part3</b>
│   │   ├── <b>part4</b>
│   │   └── <b>part5</b>
│   ├── 5prime.fasta
│   ├── 5prime.fasta.clean
│   ├── 5prime.fasta.clean.splits
│   │   ├── <b>part1</b>
│   │   ├── <b>part2</b>
│   │   ├── <b>part3</b>
│   │   ├── <b>part4</b>
│   │   └── <b>part5</b>
│   └── human_exon.fasta
├── feature_calculator.py
├── file_splitter.py
├── merge_feature_file.py
└── process_fasta.py
</pre>
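
For a feel of what the split step produces, here is a minimal sketch that writes `<file>.splits/part1` ... `partN`, matching the layout above. It assumes the clean file is still FASTA (records start with `>`) and that records should never be cut in half; `split_fasta` is a made-up name, and the real `file_splitter.py` may divide the file differently.

```python
import os


def split_fasta(path, pieces):
    """Split a FASTA file into `pieces` files under `<path>.splits/`."""
    # Group lines into records; a new record starts at each '>' header line.
    records = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">") or not records:
                records.append([])
            records[-1].append(line)

    out_dir = path + ".splits"
    os.makedirs(out_dir, exist_ok=True)

    # Spread records evenly: the first `extra` parts get one record more.
    base, extra = divmod(len(records), pieces)
    written, start = [], 0
    for i in range(pieces):
        end = start + base + (1 if i < extra else 0)
        part_path = os.path.join(out_dir, "part%d" % (i + 1))
        with open(part_path, "w") as out:
            for record in records[start:end]:
                out.writelines(record)
        written.append(part_path)
        start = end
    return written
```

Note that this sketch holds the whole file in memory, which is fine when the split is only about spreading compute over several jobs but not for inputs larger than RAM.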

### 4. Calculate those magic numbers

All the features are defined in `feature_calculator.py`. If you don't need one, comment it out.

<pre>
...

# features
feature_calculator.gc_content()
feature_calculator.tataaa_box_present()
feature_calculator.gc_box()
feature_calculator.poly_a_tail()
feature_calculator.stop_codon_present()
feature_calculator.sequence_length()

# save features to file
feature_calculator.save_features(feature_calculator.feature_columns())

...
</pre>
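
The feature names above come from `feature_calculator.py`, but their definitions aren't shown here. As a rough idea of what each one could compute for a single uppercase sequence, here is a sketch written as standalone functions. The motifs, the poly-A threshold `min_run`, the 0/1 encoding and the helper `feature_row` are all assumptions.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def tataaa_box_present(seq):
    """1 if the motif TATAAA occurs anywhere in the sequence, else 0."""
    return int("TATAAA" in seq)


def gc_box(seq):
    """1 if the GC-box consensus GGGCGG occurs, else 0 (assumed motif)."""
    return int("GGGCGG" in seq)


def poly_a_tail(seq, min_run=10):
    """1 if the sequence ends in at least `min_run` A's (threshold is a guess)."""
    run = len(seq) - len(seq.rstrip("A"))
    return int(run >= min_run)


def stop_codon_present(seq):
    """1 if any stop codon occurs in any reading frame, else 0."""
    return int(any(codon in seq for codon in STOP_CODONS))


def sequence_length(seq):
    """Number of bases in the sequence."""
    return len(seq)


def feature_row(seq):
    """All features for one sequence, in the order listed in the README."""
    return [
        gc_content(seq),
        tataaa_box_present(seq),
        gc_box(seq),
        poly_a_tail(seq),
        stop_codon_present(seq),
        sequence_length(seq),
    ]
```

Dropping a feature then amounts to removing its line from `feature_row`, the same idea as commenting out a call in the real script.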

#### A. To run on a single file without the Sun Grid Engine

Usage: `python feature_calculator.py <clean_fasta_file> [output_file]`

#### B. To process with the Sun Grid Engine

Usage: `qsub ./sun_grid_engine/job.q <input> <output>`

Example:
<pre>
mkdir data/class1
mkdir data/class2
qsub ./sun_grid_engine/job.q data/5prime.fasta.clean.splits/part data/class1/feature
qsub ./sun_grid_engine/job.q data/3prime.fasta.clean.splits/part data/class2/feature
</pre>

Files:
<pre>
.
├── README.md
├── data
│   ├── 3prime.fasta
│   ├── 3prime.fasta.clean
│   ├── 3prime.fasta.clean.splits
│   │   ├── part1
│   │   ├── part2
│   │   ├── part3
│   │   ├── part4
│   │   └── part5
│   ├── 5prime.fasta
│   ├── 5prime.fasta.clean
│   ├── 5prime.fasta.clean.splits
│   │   ├── part1
│   │   ├── part2
│   │   ├── part3
│   │   ├── part4
│   │   └── part5
│   ├── class1
│   │   ├── <b>feature1</b>
│   │   ├── <b>feature2</b>
│   │   ├── <b>feature3</b>
│   │   ├── <b>feature4</b>
│   │   └── <b>feature5</b>
│   ├── class2
│   │   ├── <b>feature1</b>
│   │   ├── <b>feature2</b>
│   │   ├── <b>feature3</b>
│   │   ├── <b>feature4</b>
│   │   └── <b>feature5</b>
│   └── human_exon.fasta
├── feature_calculator.py
├── file_splitter.py
├── merge_feature_file.py
└── process_fasta.py
</pre>
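
If no grid is around, the same `partN` to `featureN` mapping can probably be reproduced by calling the single-file usage from 4A once per split. The sketch below only assumes that usage line and the numbering seen in the tree; `feature_commands` and `run_all` are hypothetical helpers, and how `job.q` itself picks the part number is not shown in this README.

```python
import subprocess
import sys


def feature_commands(in_prefix, out_prefix, pieces):
    """Build one feature_calculator.py command per split (part1 -> feature1, ...)."""
    return [
        [sys.executable, "feature_calculator.py",
         "%s%d" % (in_prefix, i), "%s%d" % (out_prefix, i)]
        for i in range(1, pieces + 1)
    ]


def run_all(in_prefix, out_prefix, pieces):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in feature_commands(in_prefix, out_prefix, pieces):
        subprocess.run(cmd, check=True)
```

For example, `run_all("data/5prime.fasta.clean.splits/part", "data/class1/feature", 5)` would process the five 5prime splits sequentially.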

### 5. One Big File

Usage: `python merge_feature_file.py <directory> <output_file>`

Example:
<pre>
python merge_feature_file.py data/class1 data/class1_features
python merge_feature_file.py data/class2 data/class2_features
</pre>

Files:
<pre>
.
├── README.md
├── data
│   ├── 3prime.fasta
│   ├── 3prime.fasta.clean
│   ├── 3prime.fasta.clean.splits
│   │   ├── part1
│   │   ├── part2
│   │   ├── part3
│   │   ├── part4
│   │   └── part5
│   ├── 5prime.fasta
│   ├── 5prime.fasta.clean
│   ├── 5prime.fasta.clean.splits
│   │   ├── part1
│   │   ├── part2
│   │   ├── part3
│   │   ├── part4
│   │   └── part5
│   ├── class1
│   │   ├── feature1
│   │   ├── feature2
│   │   ├── feature3
│   │   ├── feature4
│   │   └── feature5
│   ├── <b>class1_features</b>
│   ├── class2
│   │   ├── feature1
│   │   ├── feature2
│   │   ├── feature3
│   │   ├── feature4
│   │   └── feature5
│   ├── <b>class2_features</b>
│   └── human_exon.fasta
├── feature_calculator.py
├── file_splitter.py
├── merge_feature_file.py
└── process_fasta.py
</pre>
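
The merge step boils down to gluing the per-split feature files back together. A minimal sketch, assuming each feature file is plain text with one row per sequence and no header line (the real format written by `save_features` isn't shown here); `merge_feature_files` and `numeric_suffix` are illustrative names, not the contents of `merge_feature_file.py`.

```python
import os
import re


def numeric_suffix(name):
    """Sort key: trailing integer of a file name, so feature10 follows feature2."""
    match = re.search(r"(\d+)$", name)
    return (int(match.group(1)) if match else -1, name)


def merge_feature_files(directory, output_file):
    """Concatenate every regular file in `directory` into `output_file`."""
    names = sorted(
        (n for n in os.listdir(directory)
         if os.path.isfile(os.path.join(directory, n))),
        key=numeric_suffix,
    )
    with open(output_file, "w") as out:
        for name in names:
            with open(os.path.join(directory, name)) as part:
                for line in part:
                    # A part without a final newline must not fuse two rows.
                    out.write(line if line.endswith("\n") else line + "\n")
    return names
```

Keep the output file outside the input directory, as in the examples above, so it is not picked up as an input on a second run.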

### 6. Run'em through SVM
