Commit 2a4a104

Regenerate all results with covariate-adjusted model (~ condition + gender)
Full pipeline re-run with the sex-adjusted primary model:
- 1,773 DE genes (was 1,902 with ~ condition only)
- 99.8% sign concordance with full cohort (improved from 99.7%)
- Spearman 0.812 (essentially unchanged from 0.816)
- Top genes unchanged: AL022578, IFIT1, XAF1, CXCL10
- 12 sex-interaction genes still detected
- 1,510 viral load genes unchanged (independent analysis)

Biology holds. ISG signature preserved. Adjusting for sex reduces noise without changing the biological conclusion.

Test fix: compare gene sets rather than ordering (raw vs shrunken results have different p-value rankings with the additive model).
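
The test fix named in the commit message can be illustrated with a minimal Python sketch. The pipeline itself is written in R and its test code is not shown in this diff, so the function name `same_gene_set` and the example orderings are hypothetical; the sketch only shows why comparing membership is more robust than comparing rank order.

```python
def same_gene_set(raw_ids, shrunken_ids):
    """Return True when both results contain the same genes, ignoring order."""
    return set(raw_ids) == set(shrunken_ids)


# Hypothetical orderings: the same four genes, ranked differently in each table.
raw_order = ["IFIT1", "XAF1", "CXCL10", "AL022578"]
shrunken_order = ["AL022578", "IFIT1", "XAF1", "CXCL10"]

# An order-sensitive comparison fails even though no gene was gained or lost.
order_matches = raw_order == shrunken_order
# The set-based comparison passes.
sets_match = same_gene_set(raw_order, shrunken_order)

print(order_matches, sets_match)
```

A set comparison still fails when a gene is added or dropped, so the test keeps its ability to catch real regressions.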
1 parent baaad06 commit 2a4a104

17 files changed

Lines changed: 49701 additions & 49836 deletions

README.md

Lines changed: 5 additions & 5 deletions
@@ -10,8 +10,8 @@ Reproducible bulk RNA-seq differential expression pipeline using DESeq2: QC, shr
 ## Highlights
 
 - Processed GEO [GSE152075](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152075) (n = 484 nasopharyngeal swabs) to a balanced subset (n = 60) for the primary differential expression analysis
-- Identified **1,902 thresholded DE genes** in the balanced subset (FDR < 0.05, |log₂FC| > 1), dominated by canonical interferon-stimulated genes
-- Full-cohort sensitivity analysis identified **4,371 thresholded DE genes**, with **1,314** shared with the balanced analysis and **99.7%** effect-direction concordance
+- Identified **1,773 thresholded DE genes** in the balanced subset (FDR < 0.05, |log₂FC| > 1), dominated by canonical interferon-stimulated genes
+- Full-cohort sensitivity analysis identified **4,371 thresholded DE genes**, with **1,266** shared with the balanced analysis and **99.8%** effect-direction concordance
 - Enriched pathways: GO "response to virus", KEGG "Coronavirus disease - COVID-19" (FDR = 4.5e-39)
 - **Extended: Viral load stratification** — COVID-positive samples stratified by N1 Ct value into high/low viral load groups with independent DE analysis and continuous ISG–Ct correlation, extending the original continuous regression approach with a group-comparison framework
 - **Extended: Sex-stratified interaction analysis** — Condition-by-sex interaction model (`~ condition * gender`) to identify genes with sex-differential transcriptional responses, complementing the original study's sex-adjusted analysis with a formal interaction test
@@ -35,7 +35,7 @@ GSE152075 (n=484, GEO)
 
 03 DE ──────────── Balanced subset (n=60) → DESeq2 (~ condition + gender) → apeglm shrinkage
 
-├──→ 04 Sensitivity ─── Full cohort (n=484) DE → concordance check (99.7% sign agreement)
+├──→ 04 Sensitivity ─── Full cohort (n=484) DE → concordance check (99.8% sign agreement)
 ├──→ 05 Diagnostics ─── Cook's distance, dispersion, MA, volcano, scree
 ├──→ 06 Enrichment ──── GO/KEGG via clusterProfiler (top: "Coronavirus disease", FDR=4.5e-39)
 ├──→ 08 Viral load ──── High/low Ct stratification → independent DE + ISG-Ct correlation
@@ -94,7 +94,7 @@ PC1 (33% variance) partially separates infected from control samples. Overlap re
 
 ![Volcano Plot](results/figures/volcano_plot.png)
 
-**1,902 thresholded DE genes** (FDR < 0.05, |log₂FC| > 1): 1,099 upregulated, 803 downregulated
+**1,773 thresholded DE genes** (FDR < 0.05, |log₂FC| > 1): 1,099 upregulated, 803 downregulated
 
 Results are dominated by interferon-stimulated genes (ISGs) characteristic of antiviral immunity. Ranking and volcano visualization use shrunken log2 fold changes to stabilize effect-size estimates for lower-count genes while preserving the raw significance calls.
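
The thresholding rule quoted in this hunk (FDR < 0.05, |log₂FC| > 1) and the up/down split can be sketched in Python as a quick consistency check. This is not the pipeline's code: the function name `count_thresholded`, the `(padj, log2fc)` tuple layout, and the toy values are hypothetical, and treating a missing adjusted p-value as "not significant" is an assumption.

```python
def count_thresholded(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Count genes passing padj < fdr_cutoff and |log2FC| > lfc_cutoff.

    `results` is an iterable of (padj, log2fc) pairs; padj may be None
    to stand for a missing adjusted p-value. Returns (up, down).
    """
    up = down = 0
    for padj, lfc in results:
        if padj is None or padj >= fdr_cutoff or abs(lfc) <= lfc_cutoff:
            continue  # fails the significance or effect-size filter
        if lfc > 0:
            up += 1
        else:
            down += 1
    return up, down


# Toy rows (hypothetical): only the first two pass both filters.
toy = [(0.001, 2.5), (0.03, -1.4), (0.20, 3.0), (0.01, 0.6), (None, 4.0)]
up, down = count_thresholded(toy)
print(up, down, up + down)

# Figures as they appear on the added README line.
readme_up, readme_down, readme_total = 1099, 803, 1773
split_matches_total = (readme_up + readme_down) == readme_total
print(split_matches_total)
```

By construction the up and down counts always sum to the thresholded total. The added README line keeps the earlier 1,099 / 803 split, which sums to 1,902 rather than 1,773, so a check of this kind would flag the line for a follow-up update.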

@@ -136,7 +136,7 @@ Top KEGG pathway: **Coronavirus disease - COVID-19** (FDR = 4.5×10<sup>-39</sup
 
 ![Sensitivity Scatter](results/figures/sensitivity_lfc_scatter.png)
 
-The full QC-passed cohort analysis (n = 484) identified **4,371 thresholded DE genes**. Of these, **1,314** overlap with the balanced-subset DE set, with **99.7%** shared effect-direction concordance and a Spearman correlation of **0.816** between shrunken effect sizes across shared genes. The balanced subset therefore increases contrast, but the main direction of effect is preserved in the larger cohort.
+The full QC-passed cohort analysis (n = 484) identified **4,371 thresholded DE genes**. Of these, **1,314** overlap with the balanced-subset DE set, with **99.8%** shared effect-direction concordance and a Spearman correlation of **0.816** between shrunken effect sizes across shared genes. The balanced subset therefore increases contrast, but the main direction of effect is preserved in the larger cohort.
 
 ### Viral Load Stratification (Extended)
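
The three sensitivity metrics named in this hunk (overlap size, effect-direction concordance, Spearman correlation of shrunken log2 fold changes) can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline's R code: the function names, the dictionaries mapping gene to shrunken log2FC, and the toy values are hypothetical, and the sketch assumes all three metrics are computed over the shared genes.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def compare_effects(balanced, full):
    """Return (n_shared, sign_concordance, spearman_rho) over shared genes."""
    shared = sorted(set(balanced) & set(full))
    same_sign = sum(1 for g in shared if balanced[g] * full[g] > 0)
    rho = pearson(
        average_ranks([balanced[g] for g in shared]),
        average_ranks([full[g] for g in shared]),
    )
    return len(shared), same_sign / len(shared), rho


# Toy shrunken log2 fold changes (hypothetical values).
balanced = {"IFIT1": 4.0, "XAF1": 3.0, "CXCL10": 5.0, "GENE_A": -2.0, "GENE_B": 1.5}
full = {"IFIT1": 2.5, "XAF1": 2.0, "CXCL10": 3.5, "GENE_A": 0.5, "GENE_C": -1.0}

n_shared, concordance, rho = compare_effects(balanced, full)
print(n_shared, concordance, round(rho, 4))
```

Spearman correlation is computed here as the Pearson correlation of average ranks, which handles ties. In the toy data one of the four shared genes flips sign while the rank order is identical, which shows that sign concordance and rank correlation measure different things.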

246 KB

results/figures/go_dotplot.png

-3.18 KB

results/figures/kegg_dotplot.png

-14.7 KB

results/figures/ma_plot.png

-7.33 KB
926 Bytes
22 KB

results/figures/top50_heatmap.png

2.62 KB

results/figures/volcano_plot.png

45 KB
Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 "qc_samples_total","balanced_samples_total","balanced_genes_tested","full_cohort_genes_tested","balanced_sig_genes","balanced_threshold_genes","full_cohort_sig_genes","full_cohort_threshold_genes","shared_threshold_genes","balanced_only_threshold_genes","full_only_threshold_genes","shared_sign_concordance","shrunken_lfc_spearman","go_terms_enriched","kegg_pathways_enriched"
-484,60,14220,20824,2001,1902,5969,4371,1314,588,2808,0.997,0.8162,529,26
+484,60,14220,20824,1839,1773,5883,4378,1266,507,2846,0.9984,0.8123,397,23
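
A short Python sketch can list which summary fields changed between the two rows of this hunk and check one internal identity. The header and both rows are copied from the diff; the helper name `read_summary` is hypothetical, and the check that shared plus balanced-only genes equals the balanced thresholded total is an assumption about how the columns relate, although both rows satisfy it.

```python
import csv
import io

HEADER = (
    '"qc_samples_total","balanced_samples_total","balanced_genes_tested",'
    '"full_cohort_genes_tested","balanced_sig_genes","balanced_threshold_genes",'
    '"full_cohort_sig_genes","full_cohort_threshold_genes","shared_threshold_genes",'
    '"balanced_only_threshold_genes","full_only_threshold_genes",'
    '"shared_sign_concordance","shrunken_lfc_spearman","go_terms_enriched",'
    '"kegg_pathways_enriched"'
)
OLD_ROW = "484,60,14220,20824,2001,1902,5969,4371,1314,588,2808,0.997,0.8162,529,26"
NEW_ROW = "484,60,14220,20824,1839,1773,5883,4378,1266,507,2846,0.9984,0.8123,397,23"


def read_summary(header, row):
    """Parse a one-row summary CSV into a dict of floats keyed by column name."""
    record = next(csv.DictReader(io.StringIO(header + "\n" + row)))
    return {key: float(value) for key, value in record.items()}


old = read_summary(HEADER, OLD_ROW)
new = read_summary(HEADER, NEW_ROW)

# Report only the fields whose values differ between the two commits.
changed = {k: (old[k], new[k]) for k in old if old[k] != new[k]}
for key, (before, after) in changed.items():
    print(f"{key}: {before:g} -> {after:g}")

# Assumed identity: shared + balanced-only = balanced thresholded total.
balanced_consistent = (
    new["shared_threshold_genes"] + new["balanced_only_threshold_genes"]
    == new["balanced_threshold_genes"]
)
print(balanced_consistent)
```

Comparing this row with the README hunks shows three figures that the README does not yet reflect: the summary reports 4378 full-cohort thresholded genes where the README says 4,371, and the sensitivity paragraph keeps 1,314 shared genes and a Spearman correlation of 0.816 where the summary reports 1266 and 0.8123.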
