Skip to content

Commit baaad06

Browse files
committed
Adjust primary DE model for sex covariate (~ condition + gender)
The primary DESeq2 model now controls for sex as an additive covariate, addressing the main biological weakness flagged in audit: unadjusted ~ condition model despite having sex metadata available. Changes: - 01_qc.R: filter samples with NA gender before balanced sampling, update VST design formula - 03_deseq2.R: ~ condition → ~ condition + gender - 09_sensitivity_analysis.R: match model, filter NA genders from full cohort for consistency apeglm shrinkage continues to target the condition coefficient only. The sex interaction model (script 11) remains as a separate complementary analysis testing condition × gender interaction effects. Note: committed results were generated with ~ condition and need regeneration. CI pipeline-rebuild will regenerate from scratch.
1 parent 495c937 commit baaad06

4 files changed

Lines changed: 22 additions & 5 deletions

File tree

README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -33,7 +33,7 @@ GSE152075 (n=484, GEO)
3333
02 PCA ─────────── VST (blind=TRUE) → PCA for sample-level exploratory analysis
3434
3535
36-
03 DE ──────────── Balanced subset (n=60) → DESeq2 (~ condition) → apeglm shrinkage
36+
03 DE ──────────── Balanced subset (n=60) → DESeq2 (~ condition + gender) → apeglm shrinkage
3737
3838
├──→ 04 Sensitivity ─── Full cohort (n=484) DE → concordance check (99.7% sign agreement)
3939
├──→ 05 Diagnostics ─── Cook's distance, dispersion, MA, volcano, scree
@@ -48,7 +48,7 @@ GSE152075 (n=484, GEO)
4848
## Methods Overview
4949

5050
- Bulk RNA-seq preprocessing and quality control
51-
- Differential expression modelling with DESeq2 plus `apeglm` log2 fold-change shrinkage
51+
- Differential expression modelling with DESeq2 (`~ condition + gender`) plus `apeglm` log2 fold-change shrinkage
5252
- Variance-stabilising transformation (VST) for visualisation
5353
- Functional enrichment analysis (GO/KEGG)
5454
- Full-cohort robustness benchmark against the balanced subset

scripts/01_qc.R

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,14 @@ if (any(duplicated(rownames(counts)))) {
6262
saveRDS(counts, "data/counts_qc.rds")
6363
saveRDS(metadata, "data/metadata_qc.rds")
6464

65+
# Filter to samples with non-NA gender for covariate-adjusted model
66+
has_gender <- !is.na(metadata$gender)
67+
if (sum(!has_gender) > 0) {
68+
message("Excluding ", sum(!has_gender), " samples with missing gender annotation")
69+
metadata <- metadata[has_gender, , drop = FALSE]
70+
counts <- counts[, rownames(metadata), drop = FALSE]
71+
}
72+
6573
set.seed(analysis_config$sample_seed)
6674
n <- analysis_config$samples_per_group
6775
pos_samples <- rownames(metadata)[metadata$condition == "positive"]
@@ -92,7 +100,7 @@ message("Final: ", sum(metadata$condition == "negative"), " neg, ",
92100
saveRDS(counts, "data/counts_clean.rds")
93101
saveRDS(metadata, "data/metadata_clean.rds")
94102

95-
dds <- DESeqDataSetFromMatrix(counts, metadata, design = ~ condition)
103+
dds <- DESeqDataSetFromMatrix(counts, metadata, design = ~ condition + gender)
96104
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
97105
saveRDS(vsd, "data/vst_data.rds")
98106

scripts/03_deseq2.R

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ metadata <- readRDS("data/metadata_clean.rds")
2121

2222
message("Testing ", nrow(counts), " genes across ", ncol(counts), " samples")
2323

24-
dds <- DESeqDataSetFromMatrix(counts, metadata, design = ~ condition)
24+
dds <- DESeqDataSetFromMatrix(counts, metadata, design = ~ condition + gender)
2525
dds$condition <- relevel(dds$condition, ref = "negative")
2626

2727
dds <- DESeq(dds)

scripts/09_sensitivity_analysis.R

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,15 @@ if (!requireNamespace("apeglm", quietly = TRUE)) {
2929

3030
counts_qc <- readRDS("data/counts_qc.rds")
3131
metadata_qc <- readRDS("data/metadata_qc.rds")
32+
33+
# Filter to samples with non-NA gender (matching the balanced subset model)
34+
has_gender <- !is.na(metadata_qc$gender)
35+
if (sum(!has_gender) > 0) {
36+
message("Excluding ", sum(!has_gender), " full-cohort samples with missing gender annotation")
37+
metadata_qc <- metadata_qc[has_gender, , drop = FALSE]
38+
counts_qc <- counts_qc[, rownames(metadata_qc), drop = FALSE]
39+
}
40+
3241
balanced_raw <- read.csv("results/tables/deseq2_results.csv", stringsAsFactors = FALSE)
3342
balanced_shrunk <- read.csv("results/tables/deseq2_results_shrunken.csv", stringsAsFactors = FALSE)
3443
go_df <- read.csv("results/tables/go_biological_process.csv", stringsAsFactors = FALSE)
@@ -38,7 +47,7 @@ counts_balanced <- readRDS("data/counts_clean.rds")
3847
keep_genes_full <- rowSums(cpm(counts_qc) >= analysis_config$gene_cpm_cutoff) >= analysis_config$gene_min_samples
3948
counts_full <- counts_qc[keep_genes_full, , drop = FALSE]
4049

41-
dds_full <- DESeqDataSetFromMatrix(counts_full, metadata_qc, design = ~ condition)
50+
dds_full <- DESeqDataSetFromMatrix(counts_full, metadata_qc, design = ~ condition + gender)
4251
dds_full$condition <- relevel(dds_full$condition, ref = "negative")
4352
dds_full <- DESeq(dds_full)
4453

0 commit comments

Comments
 (0)