AA-GWAS/
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yml # Bug report template
│   │   └── feature_request.yml # Feature request template
│   └── workflows/
│       └── ci.yml # GitHub Actions CI/CD
│
├── .gitignore # Git ignore patterns
├── CHANGELOG.md # Version history and changes
├── CITATION.md # Citation information
├── CONTRIBUTING.md # Contribution guidelines
├── FAQ.md # Frequently asked questions
├── LICENSE # MIT License
├── README.md # Main documentation
├── QUICKSTART.md # Quick start guide
│
├── config.example # Example configuration file
├── init_repo.sh # Repository initialization script
├── install.sh # Installation script
│
├── gwas_analysis_pipeline.sh # Main pipeline (18 steps)
├── extract_individuals.sh # Individual list extraction
├── extract_top_snps.py # Python SNP extraction utility
└── generate_plots.py # Visualization generation utility
- README.md: Comprehensive project documentation with features, usage, and examples
- QUICKSTART.md: Step-by-step guide to get started in minutes
- FAQ.md: Common questions and troubleshooting
- CHANGELOG.md: Detailed version history and release notes
- CITATION.md: How to cite this work and dependencies
- CONTRIBUTING.md: Guidelines for contributors
- LICENSE: MIT License terms
- gwas_analysis_pipeline.sh: Main 18-step GWAS analysis pipeline
  - Quality control (steps 1-9)
  - Population structure (steps 10-12)
  - Association analysis (steps 13-15)
  - Export and visualization (steps 16-18)
- extract_individuals.sh: Standalone script to quickly extract individual lists
  - Detailed list with headers
  - ID-only list
  - Lists of cases, controls, males, and females
- extract_top_snps.py: Python utility for efficient SNP extraction
  - Handles large files (>800K SNPs)
  - Top 100 and top 1,000 SNPs
  - Genome-wide significant SNPs (p<5e-8)
  - Suggestive SNPs (p<1e-5)
- generate_plots.py: Visualization generation
  - Manhattan plots
  - Q-Q plots with the genomic inflation factor
  - PCA plots
  - Missingness distribution plots
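The genomic inflation value shown with a Q-Q plot is conventionally the ratio of the median observed chi-square statistic (1 degree of freedom) to its expected median under the null. The following is a minimal sketch of that standard calculation, not the code in generate_plots.py, whose implementation is not shown here. The function name `genomic_inflation` is hypothetical, and the input is assumed to be a plain sequence of p-values.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# derived from the standard normal quantile instead of being hard-coded.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda: median observed chi-square (1 df) over the null median.

    Hypothetical helper for illustration; assumes two-sided p-values.
    """
    norm = NormalDist()
    chi2 = []
    for p in p_values:
        # Skip missing or invalid values (NaN fails this comparison too).
        if not 0.0 < p <= 1.0:
            continue
        # A two-sided p-value maps to z = quantile(p / 2); chi-square = z^2.
        z = norm.inv_cdf(p / 2.0)
        chi2.append(z * z)
    if not chi2:
        raise ValueError("no valid p-values")
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated test (lambda close to 1).
    n = 1001
    uniform = [(i - 0.5) / n for i in range(1, n + 1)]
    print(f"lambda = {genomic_inflation(uniform):.3f}")
```

Values close to 1 suggest well-calibrated test statistics. Values well above 1 are commonly read as a sign of residual population structure or other confounding.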
- config.example: Template configuration file with all parameters
- install.sh: Automated installation and dependency checking
- init_repo.sh: Git repository initialization helper
- .github/workflows/ci.yml: Continuous integration
  - ShellCheck for bash scripts
  - Python linting
  - Compatibility testing (Python 3.6-3.11)
- .github/ISSUE_TEMPLATE/: Issue templates
  - Bug reports
  - Feature requests
- .gitignore: Excludes large data files and intermediate results
When you run the pipeline, it creates:
analysis_results/
├── individuals_*.txt # Individual lists
├── qc/ # Quality control results
│   ├── qc_summary.txt
│   ├── *_pca.eigenvec
│   ├── *_ibd.genome
│   └── *_qc3_hwe.{bed,bim,fam}
├── association/ # Association results
│   ├── *_assoc_noQC.*
│   ├── *_assoc_withQC.*
│   ├── *_logistic_3PCs.*
│   ├── *_logistic_10PCs.*
│   ├── top_100_snps.txt
│   ├── top_1000_snps.txt
│   ├── genome_wide_significant_snps_5e-8.txt
│   └── suggestive_snps_1e-5.txt
├── plots/ # Visualization
│   ├── manhattan_plot_*.png
│   ├── qq_plot_*.png
│   ├── pca_plot.png
│   └── missingness_plots.png
├── reports/ # Summary reports
│   └── GWAS_Analysis_Final_Report.txt
└── logs/ # SLURM/execution logs
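The SNP lists under association/ (top N, genome-wide significant, suggestive) can in principle be produced in one streaming pass over an association result file, which keeps memory use flat for large inputs. The sketch below illustrates that idea and is not the actual extract_top_snps.py. It assumes a whitespace-delimited table with a header row containing `SNP` and `P` columns, and the function names `read_pvalues` and `classify_snps` are hypothetical.

```python
import heapq
import io
import math

GENOME_WIDE = 5e-8  # genome-wide significance threshold (p<5e-8)
SUGGESTIVE = 1e-5   # suggestive threshold (p<1e-5)


def read_pvalues(handle, snp_col="SNP", p_col="P"):
    """Yield (p_value, snp_id) pairs; column names are assumptions."""
    header = handle.readline().split()
    snp_idx, p_idx = header.index(snp_col), header.index(p_col)
    for line in handle:
        fields = line.split()
        if len(fields) <= max(snp_idx, p_idx):
            continue  # blank or truncated row
        try:
            p = float(fields[p_idx])
        except ValueError:
            continue  # non-numeric entry such as "NA"
        if math.isnan(p):
            continue
        yield p, fields[snp_idx]


def classify_snps(handle, top_n=100):
    """Return (top_n SNPs by ascending p, genome-wide hits, suggestive hits)."""
    genome_wide, suggestive = [], []
    heap = []  # bounded min-heap of (-p, snp): the root holds the largest kept p
    for p, snp in read_pvalues(handle):
        if p < GENOME_WIDE:
            genome_wide.append(snp)
        if p < SUGGESTIVE:
            suggestive.append(snp)  # includes the genome-wide hits as well
        item = (-p, snp)
        if len(heap) < top_n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the current worst p-value
    top = [snp for _, snp in sorted(heap, reverse=True)]
    return top, genome_wide, suggestive


if __name__ == "__main__":
    # Made-up rows for illustration only; not real results.
    sample = "CHR SNP BP P\n1 rs1 100 1e-9\n1 rs2 200 3e-6\n1 rs3 300 0.5\n1 rs4 400 NA\n"
    top, gw, sug = classify_snps(io.StringIO(sample), top_n=2)
    print(top, gw, sug)
```

The bounded heap holds at most `top_n` entries at a time, so memory for the top-N list does not grow with file size. Only the two threshold lists grow, and they grow with the number of hits.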
Approximate sizes:
- Documentation: ~40 KB total
- Scripts: ~55 KB total
- Repository (no data): <100 KB
- With results (depends on dataset): 100 MB - 10 GB
- Update CHANGELOG.md for each release
- Keep README.md in sync with code changes
- Update FAQ.md as questions arise
- Review suggested changes to the templates in .github/ISSUE_TEMPLATE/ and merge those that are accepted
Last Updated: December 11, 2025