This repository holds my final project code for UCSD BIOL 40236. I completed this work as part of the course and wrote all code included here.
I built this pipeline to annotate proteins using multiple sources of biological evidence. It builds on my experience with Linux workflows, database interaction, and Python parsing, and applies those skills to a bioinformatics problem.
Scripts (sp_bioinformatics_final_project.py): Python code I wrote to parse inputs, query the database, and combine evidence into final annotations.
Data (hmmscan.htab, prodigal2fasta.nostars.faa, prodigal2fasta.nostars.tmhmm.short, annot_final.sql): Input files required to run my pipeline.
Output (protein_evidence.txt): Final annotation results produced by my code.
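The parse-and-combine idea can be illustrated with a minimal sketch. This is not the code in sp_bioinformatics_final_project.py. It assumes the .faa file is standard FASTA and that the .tmhmm.short file follows the usual TMHMM short layout (an ID followed by key=value fields such as PredHel). The function names, the sample records, and the tab-separated output layout are all hypothetical, and the database and HMM evidence are left out.

```python
# Illustrative sketch only: function names, sample data, and the output
# layout are assumptions, not taken from the project script.

def parse_fasta_ids(lines):
    """Return protein IDs from FASTA headers (text after '>' up to whitespace)."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # skip a bare '>' with no ID
                ids.append(parts[0])
    return ids


def parse_tmhmm_short(lines):
    """Map protein ID -> predicted helix count from TMHMM short-format lines."""
    helices = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Fields after the ID are assumed to look like key=value.
        attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        helices[fields[0]] = int(attrs.get("PredHel", 0))
    return helices


def combine_evidence(protein_ids, helices):
    """Build one tab-separated row per protein: ID, helix count (0 if absent)."""
    return [f"{pid}\t{helices.get(pid, 0)}" for pid in protein_ids]


# Invented sample records standing in for the real input files.
sample_fasta = [">prot_1 # 1 # 300", "MKTAYIAK", ">prot_2 # 400 # 900", "MSSLLV"]
sample_tmhmm = [
    "prot_1\tlen=100\tExpAA=45.20\tFirst60=22.10\tPredHel=2\tTopology=i5-27o40-62i"
]

rows = combine_evidence(parse_fasta_ids(sample_fasta), parse_tmhmm_short(sample_tmhmm))
for row in rows:
    print(row)
```

Keying every evidence source by protein ID keeps the combining step a simple lookup, so another source could be added as one more dictionary.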
I thank Professor Orvis for his inspiration and for teaching the core material used in this project.