Performs PCA of genotypes.
Works in two steps.
Input: a single FASTA file containing sequences from different loci, sampled in different populations/species. The sequences do not need to be sorted.
The ID (the line starting with >) of each sequence must follow this format:
```
>E24_99631_p1|arabidopsis|E15|Allele_1
NNNNNNNNNNNAAAGAAGATGGCGTCGGCAGTTTCAGTATCGTTTATTGTGGTGAATATT
TTGCTTCTCCTGGTTCAGGTCTTTGCTGGGAGAGACTTTTACAAAATATTGGGAGTTCCC
AGAAACGCCGATTTGAAACAAATCAAGCGATCCTATCGAAAGCTGGCCAAAGAACTCCAC
CCAGATAAGAACAAAGATGATCCTGAAGCAGAACAAAGATTTCAAGACTTAGGTGCTGCT
```
Four different fields separated by a pipe (|), where:
- the first field is the locus name (E24_99631_p1).
- the second field is the species name (arabidopsis).
- the third field is the name of the sampled diploid individual (E15).
- the fourth field is the name of the allele (two alleles per individual, named either Allele_1 or Allele_2).
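
The ID format described above can be checked before running the analysis. The following is a minimal sketch, not part of popPhyl_PCA: the names `SequenceId` and `parse_sequence_id` are hypothetical, and the check only enforces what is stated here (four pipe-separated fields, allele named Allele_1 or Allele_2).

```python
from typing import NamedTuple


class SequenceId(NamedTuple):
    """The four fields of a sequence ID (hypothetical helper type)."""
    locus: str
    species: str
    individual: str
    allele: str


def parse_sequence_id(header: str) -> SequenceId:
    """Split a FASTA ID line into its four pipe-separated fields."""
    # Drop surrounding whitespace and the leading '>' if present.
    fields = header.strip().lstrip(">").split("|")
    if len(fields) != 4:
        raise ValueError(
            f"expected 4 fields separated by '|', got {len(fields)}: {header!r}"
        )
    locus, species, individual, allele = fields
    if allele not in ("Allele_1", "Allele_2"):
        raise ValueError(f"allele must be Allele_1 or Allele_2, got {allele!r}")
    return SequenceId(locus, species, individual, allele)


print(parse_sequence_id(">E24_99631_p1|arabidopsis|E15|Allele_1"))
```

Running such a check on every ID line of the input file should catch malformed headers early, before the longer PCA step starts.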
 
Step 1: a single Python command line (popphyl2PCA.py).
Beforehand, make sure these Python dependencies are available:
- pandas
- sklearn (distributed under the package name scikit-learn)
- biopython
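
One way to confirm that the dependencies are visible to the interpreter is sketched below. The helper `missing_modules` is hypothetical and not part of popPhyl_PCA; it relies only on the standard library, and it uses the import names of the packages (scikit-learn is imported as `sklearn`, Biopython as `Bio`).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names, which differ from some package names:
# scikit-learn -> "sklearn", Biopython -> "Bio".
required = ["pandas", "sklearn", "Bio"]
missing = missing_modules(required)
if missing:
    print("Missing Python modules:", ", ".join(missing))
else:
    print("All Python dependencies found.")
```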
 
python3 ~/Programmes/popPhyl_PCA/popphyl2PCA.py [name of the subdirectory created by the script where output files will be written] [name of the input fasta file]
Example:
python3 ~/Programmes/popPhyl_PCA/popphyl2PCA.py ~/Documents/PCA/testPCA ~/Programmes/popPhyl_PCA/test.fas
This can take between 10 minutes and 2 hours, depending on the number of SNPs and individuals.
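
For intuition about why the run time grows with the number of SNPs and individuals: a PCA of genotypes works on a table with one row per individual and one column per SNP. The sketch below shows one common way to build such a row from the two allele sequences of a diploid individual, by counting copies of a chosen reference base at each site. It is an illustration only. The function `genotype_counts`, the reference sequence, and the handling of N as missing data are assumptions and may not match what popphyl2PCA.py actually does.

```python
def genotype_counts(allele_1, allele_2, reference):
    """Count copies (0, 1 or 2) of the reference base at each site.

    Sites where either allele is 'N' (treated here as missing data)
    are returned as None.
    """
    if not (len(allele_1) == len(allele_2) == len(reference)):
        raise ValueError("sequences must be aligned and of equal length")
    counts = []
    for a1, a2, ref in zip(allele_1.upper(), allele_2.upper(), reference.upper()):
        if "N" in (a1, a2):
            counts.append(None)
        else:
            # True/False add up as 1/0, giving 0, 1 or 2 copies.
            counts.append((a1 == ref) + (a2 == ref))
    return counts


# Toy example with four sites (made-up sequences).
print(genotype_counts("AAGN", "AGGA", "AAAA"))  # [2, 1, 0, None]
```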
Step 2: a small Shiny interface (plotPCA.R).
Beforehand, make sure these R dependencies are available:
- shiny
- plotly
- tidyverse
- shinycssloaders
 
Then, in R:
- source("~/Programmes/popPhyl_PCA/plotPCA.R")
- shinyApp(ui=ui, server=server)
- upload the files containing the coordinates (table_coord_PCA_genotypes.txt) and the eigenvalues (table_eigen_PCA_genotypes.txt)