About

Python scripts to pull DNA sequence data from a PostgreSQL database, optionally aligned.

Scripts included in this repository:

psql_fastamaker.py

A Python script to generate a FASTA file from sequences stored in a PostgreSQL database. Arguments:

-d --database. Name of the PostgreSQL database. The script expects a '.connectstring_databasealias' file in the folder where it is run, containing the database name, address, port, username, and password. The string is not parsed; it is passed as-is to open the database connection.
-m --markerset. Our PostgreSQL database defines several marker sets (e.g., COI, HiMAP500), so that multiple markers can be pulled for the same group of taxa with a single command. Use '-l' to get a list of the available options.
-n --naming. Optional. Selects the naming convention as defined in the PostgreSQL database, e.g., bold, barcodingr, pycoistats. Default: "classic".
-a --align. Optional, 'yes' or 'no' (default). When 'yes', the script uses MAFFT from the PATH to align the sequences per marker. This assumes MAFFT is installed; the recommended installation method is conda.
-w --wishlist. Optional. A file listing the specimen identifiers for which sequences will be looked up, one identifier per line.
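
The options above can be pictured with a small sketch. This is not the code of psql_fastamaker.py; it is an illustration, assuming that the connect-string file is named '.connectstring_' followed by the database alias and that the wishlist is a plain text file. The function names `build_parser`, `read_connect_string`, and `read_wishlist` are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    """Parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Generate a FASTA file from a PostgreSQL database")
    parser.add_argument("-d", "--database", help="database name or alias")
    parser.add_argument("-m", "--markerset", help="marker set, e.g. COI")
    parser.add_argument("-n", "--naming", default="classic",
                        help="naming convention defined in the database")
    parser.add_argument("-a", "--align", choices=["yes", "no"], default="no",
                        help="align sequences per marker with MAFFT")
    parser.add_argument("-w", "--wishlist",
                        help="file with one specimen identifier per line")
    return parser


def read_connect_string(alias, folder="."):
    """Return the unparsed contents of '.connectstring_<alias>'.

    The file naming is an assumption based on the description above.
    """
    return (Path(folder) / f".connectstring_{alias}").read_text().strip()


def read_wishlist(path):
    """Return the non-empty lines of a wishlist file, one identifier each."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, `build_parser().parse_args(["-d", "mydb", "-m", "COI"])` leaves the naming convention at "classic" and alignment at "no"; "mydb" is a placeholder alias.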

fastas_to_spreadsheet.py

This script is mainly used to add new sequences to the PostgreSQL database by converting FASTA files to a CSV file and an Excel spreadsheet. It includes all .fas and .fasta files in the current folder. It only works on FASTA files that have a single line per sequence. Optional arguments:

-t --trimN. Removes 'N's from the sequence when given.
-g --gaps. Removes '-'s from the sequence when given.
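
As a rough illustration of the conversion, the sketch below reads single-line FASTA records and applies the two optional clean-ups before writing CSV text. It is not the script's own code: the function names and the column headers "name" and "sequence" are hypothetical, and it assumes that trimming means removing every uppercase 'N', as the option description reads.

```python
import csv
import io


def fasta_to_rows(lines, trim_n=False, remove_gaps=False):
    """Turn single-line FASTA records into (name, sequence) tuples."""
    rows = []
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]  # header line without the leading '>'
        elif name is not None:
            seq = line
            if trim_n:
                seq = seq.replace("N", "")  # assumption: drop every 'N'
            if remove_gaps:
                seq = seq.replace("-", "")
            rows.append((name, seq))
            name = None  # one sequence line per record
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "sequence"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Because a record is closed after its first sequence line, any further lines of a wrapped sequence are ignored, which mirrors the single-line restriction noted above.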

AMAS.py

This script is included here for convenience; it is a copy from https://github.com/marekborowiec/AMAS. It is useful for concatenating multiple alignments and creating partition files for phylogenetic analysis.
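
To make the idea of concatenation with partitions concrete, here is a minimal sketch of the concept. It does not use or reproduce AMAS; the function name `concatenate`, the input layout (marker name mapped to taxon name mapped to aligned sequence), and the '?' padding for taxa missing from a marker are all assumptions of this example.

```python
def concatenate(alignments, missing="?"):
    """Join per-marker alignments into one supermatrix.

    alignments: dict of marker name -> dict of taxon name -> aligned sequence.
    Returns (supermatrix, partitions), where partitions holds
    (marker, start, end) with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = []
    start = 1
    for marker, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {marker} are not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this marker so all rows stay equal length.
            supermatrix[taxon] += aln.get(taxon, missing * length)
        partitions.append((marker, start, start + length - 1))
        start += length
    return supermatrix, partitions
```

The partition tuples record which columns of the concatenated alignment belong to which marker, which is the information a partition file carries.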

cdoorenweerd/psql_fastamaker
