Skip to content

Commit c9a8304

Browse files
authored
Merge pull request #13 from OpenOmics/dev
Adding in a check to the Seurat CITE QC to handle situations where no ADT or HTO features are detected in data even when assays are present. Adding in support for 2024 reference for ATAC and Multiome pipelines Add check of libraries.csv file to see if samples have more than one modality associated Update documentation Add code to generate some QC plots for RPL and RPS genes for future update
2 parents 6fbe7e6 + ecd98ee commit c9a8304

File tree

9 files changed

+74
-20
lines changed

9 files changed

+74
-20
lines changed

VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
3.0.4
1+
3.0.5

cell-seek

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -368,7 +368,7 @@ def parsed_arguments(name, description):
368368
genome of the samples. {2} does comes bundled with
369369
prebuilt reference files for human and mouse samples.
370370
The newest 10X reference (2024-A) is available for
371-
GEX, CITE, and Multi analysis (hg2024, mm2024).
371+
GEX, CITE, ATAC, Multi, and Multiome analysis (hg2024, mm2024).
372372
A custom reference genome can also be provided. For
373373
prebuilt references, please select one of the following
374374
options: hg38, mm10, hg2024, mm2024.

config/genome.json

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -17,12 +17,16 @@
1717
"hg2024": {
1818
"gex_transcriptome": "/data/OpenOmics/references/cell-seek/human/refdata-gex-GRCh38-2024-A",
1919
"cite_transcriptome": "/data/OpenOmics/references/cell-seek/human/refdata-gex-GRCh38-2024-A",
20-
"vdj_ref": "/data/OpenOmics/references/cell-seek/human/refdata-cellranger-vdj-GRCh38-alts-ensembl-7.1.0/"
20+
"atac_ref": "/data/OpenOmics/references/cell-seek/human/refdata-cellranger-arc-GRCh38-2024-A",
21+
"vdj_ref": "/data/OpenOmics/references/cell-seek/human/refdata-cellranger-vdj-GRCh38-alts-ensembl-7.1.0/",
22+
"arc_ref": "/data/OpenOmics/references/cell-seek/human/refdata-cellranger-arc-GRCh38-2024-A"
2123
},
2224
"mm2024": {
2325
"gex_transcriptome": "/data/OpenOmics/references/cell-seek/mouse/refdata-gex-GRCm39-2024-A",
2426
"cite_transcriptome": "/data/OpenOmics/references/cell-seek/mouse/refdata-gex-GRCm39-2024-A",
25-
"vdj_ref": "/data/OpenOmics/references/cell-seek/mouse/refdata-cellranger-vdj-GRCm38-alts-ensembl-7.0.0/"
27+
"atac_ref": "/data/OpenOmics/references/cell-seek/mouse/refdata-cellranger-arc-GRCm39-2024-A",
28+
"vdj_ref": "/data/OpenOmics/references/cell-seek/mouse/refdata-cellranger-vdj-GRCm38-alts-ensembl-7.0.0/",
29+
"arc_ref": "/data/OpenOmics/references/cell-seek/mouse/refdata-cellranger-arc-GRCm39-2024-A"
2630
}
2731
}
2832
}

docs/usage/genome.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,9 @@ This part of the documentation describes options and concepts for <code>cell-see
77

88
If a reference genome that does not come with the pipeline, then a custom json file needs to be provided to run.
99

10+
This command does not help with creating the 10x compatible reference itself, that would need to be done separately. 10x documentation about the process can be found for [GEX](https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-3p-references), [VDJ](https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-5p-references), [ATAC](https://www.10xgenomics.com/support/software/cell-ranger-atac/latest/analysis/inputs/creating-a-reference-package-mkref), and [Multiome](https://www.10xgenomics.com/support/software/cell-ranger-arc/latest/analysis/inputs/mkref)
11+
12+
1013
Creating a custom reference genome file is fast and easy! In its most basic form, <code>cell-seek <b>genome</b></code> only has *one required input* with the optional arguments supplying the reference paths.
1114

1215
## 2. Synopsis

docs/usage/run.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -741,7 +741,7 @@ Each of the following arguments are required. Failure to provide a required argu
741741
> ***Example:*** `--pipeline atac`
742742
743743
---
744-
`--genome {hg38, mm10, custom.json}`
744+
`--genome {hg38, mm10, hg2024, mm2024, custom.json}`
745745
> **Reference genome.**
746746
> *type: string*
747747
>
@@ -841,7 +841,7 @@ Each of the following arguments are required. Failure to provide a required argu
841841
> ***Example:*** `--pipeline multiome`
842842
843843
---
844-
`--genome {hg38, mm10, custom.json}`
844+
`--genome {hg38, mm10, hg2024, mm2024, custom.json}`
845845
> **Reference genome.**
846846
> *type: string*
847847
>

src/run.py

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -810,6 +810,7 @@ def finalcheck(config, flag, delimeter=','):
810810

811811
# Dictionary holding unique contents from files to use for comparisons
812812
contents = {}
813+
name_type = {}
813814
with open(filename) as fh:
814815
try:
815816
header = next(fh).strip().split(delimeter)
@@ -827,6 +828,9 @@ def finalcheck(config, flag, delimeter=','):
827828
values = contents.get(i, set())
828829
values.add(linelist[indices[i]])
829830
contents[i] = values
831+
value = name_type.get(linelist[indices['name']], set())
832+
value.add(linelist[indices['type']])
833+
name_type[linelist[indices['name']]] = value
830834

831835
# Compiles the sample names and fastq paths from the input (config)
832836
samples = set([re.sub("_S[0-9]+_L00[0-9]", "", i) for i in config['samples']])
@@ -866,6 +870,12 @@ def finalcheck(config, flag, delimeter=','):
866870
└── Please note that the followed listed FASTQ names are not found in the input files: {{}} '.format(flag, filename, ','.join(missing_file))
867871
)
868872

873+
names = [name for name in name_type if (len(name_type[name]) <= 1)]
874+
if len(names) > 0:
875+
print(f"\nWarning: Some samples only have one feature type associated with them! \nWarning: --{{}} {{}} only contains one feature type for some of the samples.\n \
876+
└── Please note that only one feature type was provided for the following sample(s): {{}} \n \
877+
If this is correct, these samples do not need to be run using cellranger multi.".format(flag, filename, ','.join(names)))
878+
869879

870880
def check_conditional_parameters(config):
871881
"""Check the compiled config fictionary to ensure
@@ -909,7 +919,7 @@ def check_conditional_parameters(config):
909919

910920
#Check reference
911921
if config['options']['genome'] in ['hg2024', 'mm2024']:
912-
if config['options']['pipeline'] in ['atac', 'multiome']:
922+
if config['options']['pipeline'] in []:
913923
errorMessage += [
914924
"Error: The {} reference is not available for the {} pipeline\n \
915925
└── Please use the --genome flag to select one of the available references: {}".format(

workflow/rules/gex.smk

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -173,7 +173,6 @@ rule count:
173173
log ="run_{sample}_10x_cellranger_count.log"
174174
params:
175175
rname = "count",
176-
batch = "-l nodes=1:ppn=16,mem=96gb",
177176
id = "{sample}",
178177
sample = sample_rename,
179178
transcriptome = config["references"][genome]["gex_transcriptome"],

workflow/scripts/seuratCiteSampleQC.R

Lines changed: 26 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -46,8 +46,10 @@ if (length(grep('^HTO[-_]', grep('hashtag', rownames(rdata$`Antibody Capture`),
4646
adt_assay <- CreateAssayObject(counts=rdata$`Antibody Capture`[grep('^HTO[-_]', grep('hashtag', rownames(rdata$`Antibody Capture`), value=TRUE, ignore.case=TRUE, invert=TRUE), value=TRUE, ignore.case=TRUE, invert=TRUE),])
4747
filtered_cite[['ADT']] <- names(which(apply(GetAssayData(adt_assay, slot='counts'), 1, max) <= adt_thresh))
4848
adt_names <- names(which(apply(GetAssayData(adt_assay, slot='counts'), 1, max) > adt_thresh))
49-
seur[['ADT']] <- CreateAssayObject(counts=GetAssayData(adt_assay, slot='counts')[adt_names,])
50-
adt = TRUE
49+
if (length(adt_names) > 0) {
50+
seur[['ADT']] <- CreateAssayObject(counts=GetAssayData(adt_assay, slot='counts')[adt_names,])
51+
adt = TRUE
52+
}
5153
}
5254

5355
# Add in HTO assay if features with HTO was found
@@ -56,8 +58,10 @@ if (length(as.character(c(grep('hashtag', rownames(rdata$`Antibody Capture`), va
5658
hto_assay <- CreateAssayObject(counts=rdata$`Antibody Capture`[unique(as.character(c(grep('hashtag', rownames(rdata$`Antibody Capture`), value=TRUE, ignore.case=TRUE), grep('^HTO[-_]', rownames(rdata$`Antibody Capture`), value=TRUE, ignore.case=TRUE)))),])
5759
filtered_cite[['HTO']] <- names(which(apply(GetAssayData(hto_assay, slot='counts'), 1, max) <= adt_thresh))
5860
hto_names <- names(which(apply(GetAssayData(hto_assay, slot='counts'), 1, max) > adt_thresh))
59-
seur[['HTO']] <- CreateAssayObject(counts=GetAssayData(hto_assay, slot='counts')[hto_names,])
60-
hashtag = TRUE
61+
if (length(hto_names) > 0) {
62+
seur[['HTO']] <- CreateAssayObject(counts=GetAssayData(hto_assay, slot='counts')[hto_names,])
63+
hashtag = TRUE
64+
}
6165
}
6266

6367
write.table(adt_thresh, 'CITE_threshold.txt', col.names = FALSE, row.names=FALSE)
@@ -82,6 +86,8 @@ figures <- list()
8286

8387
## ----Pre-Filter Gene Plot----
8488
seur[["percent.mito"]] <- PercentageFeatureSet(seur, pattern="^[Mm][Tt]-")
89+
seur[["percent.rps"]] <- PercentageFeatureSet(seur, pattern="^R[Pp][Ss]")
90+
seur[["percent.rpl"]] <- PercentageFeatureSet(seur, pattern="^R[Pp][Ll]")
8591

8692
plot1 <- FeatureScatter(seur, feature1 = "nCount_RNA", feature2 = "percent.mito") + NoLegend()
8793
plot2 <- FeatureScatter(seur, feature1 = "nCount_RNA", feature2 = "nFeature_RNA") + NoLegend()
@@ -93,6 +99,7 @@ png("PreFilter_Gene_Plot.png", height=5, width=10, units='in', res=300)
9399
plot1+plot3+plot2
94100
dev.off()
95101

102+
96103
## ----Cell Quality Thresholds - Default----
97104
thresh <- list()
98105
defaultThreshold <- function(seur) {
@@ -199,6 +206,13 @@ dev.off()
199206
figures$PreFilter_VlnPlot_RNA <- do.call("grid.arrange", c(plots, nrow=1))
200207

201208

209+
plots <- sapply(c("percent.rpl", "percent.rps"), function(x) doVlnPlot(aspect=x, seur=seur, thresh=thresh))
210+
211+
png("PreFilter_VlnPlot_Ribo.png", height=7, width=5, units='in', res=300)
212+
do.call("grid.arrange", c(plots, nrow=1))
213+
dev.off()
214+
215+
202216
if (adt) {
203217
## ----Pre-Filter ADT Violin Plot
204218
plots <- sapply(c("nFeature_ADT", "nCount_ADT"), function(x) doVlnPlot(aspect=x, seur=seur, thresh=thresh))
@@ -265,6 +279,13 @@ dev.off()
265279
figures$PostFilter_VlnPlot_RNA <- VlnPlot(seur, features = c("nFeature_RNA", "nCount_RNA", "percent.mito"), ncol = 3)
266280

267281

282+
plots <- sapply(c("percent.rpl", "percent.rps"), function(x) doVlnPlot(aspect=x, seur=seur, thresh=thresh))
283+
284+
png("PostFilter_VlnPlot_Ribo.png", height=7, width=5, units='in', res=300)
285+
do.call("grid.arrange", c(plots, nrow=1))
286+
dev.off()
287+
288+
268289
## ----Post-Filter ADT Violin Plot
269290
if (adt) {
270291
png("PostFilter_VlnPlot_ADT.png", height=7, width=5, units='in', res=300)
@@ -434,7 +455,7 @@ saveRDS(seur, 'seur_cluster.rds')
434455

435456
# ----Matrix export----
436457
if ( !dir.exists(file.path(opt$workdir, "cite-seq-matrix")) ) {
437-
dir.create(opt$workdir, "cite-hto-adt-matrix", showWarnings = F, recursive = T, mode = "1755")
458+
dir.create(file.path(opt$workdir, "cite-seq-matrix"), showWarnings = F, recursive = T, mode = "1755")
438459
}
439460
if ( is.element("HTO", names(seur@assays)) ) {
440461
hto_mat <- GetAssayData(object = seur, assay = "HTO", layer = "data")

workflow/scripts/seuratSampleQC.R

Lines changed: 24 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,9 @@ figures <- list()
4949

5050
## ----Pre-Filter Gene Plot----
5151
seur[["percent.mito"]] <- PercentageFeatureSet(seur, pattern="^[Mm][Tt]-")
52+
seur[["percent.mito"]] <- PercentageFeatureSet(seur, pattern="^[Mm][Tt]-")
53+
seur[["percent.rps"]] <- PercentageFeatureSet(seur, pattern="^R[Pp][Ss]")
54+
seur[["percent.rpl"]] <- PercentageFeatureSet(seur, pattern="^R[Pp][Ll]")
5255

5356
plot1 <- FeatureScatter(seur, group.by='Sample', feature1 = "nCount_RNA", feature2 = "percent.mito") + NoLegend()
5457
plot2 <- FeatureScatter(seur, group.by='Sample', feature1 = "nCount_RNA", feature2 = "nFeature_RNA") + NoLegend()
@@ -71,12 +74,12 @@ defaultThreshold <- function(seur) {
7174
thresh['nCount_RNA_low'] <- expm1(median(log1p(seur$nCount_RNA)) - 3*mad(log1p(seur$nCount_RNA))) %>% round
7275
thresh['nCount_RNA_high'] <- expm1(median(log1p(seur$nCount_RNA)) + 3*mad(log1p(seur$nCount_RNA))) %>% round
7376
thresh['percent.mito_high'] = min(expm1(median(log1p(seur$percent.mito)) + 3*mad(log1p(seur$percent.mito))) %>% round, 100)
74-
77+
7578
cellsToRemove <- colnames(seur)[which(seur$nFeature_RNA < thresh['nFeature_RNA_low'] | seur$nFeature_RNA > thresh['nFeature_RNA_high'])]
7679
cellsToRemove <- union(cellsToRemove, colnames(seur)[which(seur$nCount_RNA < thresh['nCount_RNA_low'] | seur$nCount_RNA > thresh['nCount_RNA_high'])])
7780
cellsToRemove <- union(cellsToRemove, colnames(seur)[which(seur$percent.mito > thresh['percent.mito_high'])])
78-
79-
81+
82+
8083
thresh['numCellsRemove'] <- length(cellsToRemove)
8184
thresh['pctCellsRemove'] <- length(cellsToRemove) / dim(seur)[2] * 100
8285
return(list(threshold=thresh, filter=cellsToRemove))
@@ -88,7 +91,7 @@ if (!is.na(opt$filterfile)){
8891
if (sum(thresholds[,index] == opt$sample) == 1) {
8992
thresh_orig <- thresholds[which(thresholds[,index] == opt$sample),]
9093
thresh_orig[index] <- NULL
91-
94+
9295
thresh <- list()
9396
cellsToRemove <- character()
9497
for (i in colnames(thresh_orig)) {
@@ -146,6 +149,14 @@ dev.off()
146149

147150
figures$PreFilter_VlnPlot_RNA <- do.call("grid.arrange", c(plots, nrow=1))
148151

152+
153+
plots <- sapply(c("percent.rpl", "percent.rps"), function(x) doVlnPlot(aspect=x, seur=seur, thresh=thresh))
154+
155+
png("PreFilter_VlnPlot_Ribo.png", height=7, width=5, units='in', res=300)
156+
do.call("grid.arrange", c(plots, nrow=1))
157+
dev.off()
158+
159+
149160
## ----Pre-Filter UMAP Plot-------
150161
seur <- NormalizeData(seur, normalization.method = "LogNormalize", scale.factor = 10000)
151162
seur <- FindVariableFeatures(seur, selection.method = "vst", nfeatures = 2000)
@@ -189,6 +200,13 @@ dev.off()
189200

190201
figures$PostFilter_VlnPlot_RNA <- VlnPlot(seur, features = c("nFeature_RNA", "nCount_RNA", "percent.mito"), ncol = 3)
191202

203+
plots <- sapply(c("percent.rpl", "percent.rps"), function(x) doVlnPlot(aspect=x, seur=seur, thresh=thresh))
204+
205+
png("PostFilter_VlnPlot_Ribo.png", height=7, width=5, units='in', res=300)
206+
do.call("grid.arrange", c(plots, nrow=1))
207+
dev.off()
208+
209+
192210
## ----RNA Normalizing and Clustering----
193211
seur <- NormalizeData(seur, normalization.method = "LogNormalize", scale.factor = 10000)
194212
seur <- FindVariableFeatures(seur, selection.method = "vst", nfeatures = 2000)
@@ -203,11 +221,11 @@ coord <- Embeddings(seur, reduction='pca')[,1:30]
203221
d <- dist(coord, method="euclidean")
204222
for(resolution in c(0.1, seq(0.2,1.0,0.2), 1.5, 2.0)){
205223
seur <- FindClusters(seur, resolution = resolution)
206-
224+
207225
#Calculate silhouette scores and generate plots
208226
try({
209227
clusters <- Idents(seur)
210-
sil<-silhouette(as.numeric(clusters), dist=d)
228+
sil<-silhouette(as.numeric(clusters), dist=d)
211229
pdf(paste0("SilhouettePlot_res.",resolution,".pdf"))
212230
print(plot(sil, col=as.factor(clusters[order(clusters, decreasing=FALSE)]), main=paste("Silhouette plot of Seurat clustering - resolution ", resolution, sep=""), lty=2))
213231
print(abline(v=mean(sil[,3]), col="red4", lty=2))
@@ -236,4 +254,3 @@ saveRDS(seur, 'seur_cluster.rds')
236254
#saveRDS(figures, 'seur_figures.rds')
237255

238256
writeLines(capture.output(devtools::session_info()), 'sessionInfo.txt')
239-

0 commit comments

Comments
 (0)