Context
The pseudotime package slingshot did not satisfy me for several reasons: in my hands it is slow, not reproducible, and builds on an input far removed from the original data structure, so results change substantially depending on how I tweak that input. The Python package Palantir (https://github.com/dpeerlab/Palantir) uses the "raw" multidimensional space to identify paths based on a k-nearest-neighbors graph. It therefore stays much closer to the original structure than e.g. slingshot, which takes a clustering and a 2D projection as input.
Palantir is from the same group that published the clustering algorithm Phenograph, which is also kNN-graph based. Palantir only works in Python; reticulate failed for me due to numba-related errors. My notebook part is heavily inspired by https://github.com/dpeerlab/Palantir/blob/master/notebooks/Palantir_sample_notebook.ipynb and my code was written with help from ChatGPT & co.
Summary
This "butterfly" pipeline integrates Palantir pseudotime analysis on top of a condor object at any stage of cyCONDOR analysis by writing cell_IDs and marker values to butterfly_R2py.csv, importing this file into a JupyterLab notebook, preprocessing the data, running Palantir, exporting pseudotime to butterfly_py2R.csv, importing it into R, and adding it to the condor object.
It took a bit of reading and coding, but the results made it worth it. I strongly recommend including Palantir in future versions of cyCONDOR.
Results
This worked wonderfully for me: results were reproducible across various parameter settings, matched my gating, and were consistent across samples. I had 6 markers + FSC-A + SSC-A for my samples, with ~10k cells per dataset (manually pregated in FJ to my cell type) from >>3 donors. I ran different donors as separate condor objects (in R/cyCONDOR) and as separate butterfly objects (in Python/Palantir).
Unfortunately I cannot share any results at this time.
In R, run (at any point of the cyCONDOR analysis):
R2py_markers = c("...", "FSC-A", "SSC-A") #adjust to markers of your choice
write.csv(condor$expr$orig[, R2py_markers],
file = "folder_path/file_name_R2py.csv",
row.names = TRUE)
In python, run this as a notebook:
import palantir
import scanpy as sc
import pandas as pd
import anndata as ad  # needed for ad.AnnData and ad.read_h5ad below
import os
import matplotlib
import matplotlib.pyplot as plt
import warnings
from numba.core.errors import NumbaDeprecationWarning
warnings.filterwarnings(action="ignore", category=NumbaDeprecationWarning)
warnings.filterwarnings(action="ignore", module="scanpy", message="No data for colormapping")
get_ipython().run_line_magic('matplotlib', 'inline')
Import data
Read csv to dataframe, transfer values to AnnData object, transfer rownames, transfer colnames
butterfly_df = pd.read_csv("file_path/file_name_R2py.csv", index_col=0)
butterfly = ad.AnnData(X=butterfly_df.values)
butterfly.obs_names = butterfly_df.index
butterfly.var_names = butterfly_df.columns.tolist()
number_of_markers = ...
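Before building the AnnData object, it may be worth checking the imported table for the usual CSV round-trip problems. The following is a minimal sketch, not part of the original notebook: `check_marker_table` is a hypothetical helper and the toy table only stands in for the real `butterfly_df`.

```python
import pandas as pd


def check_marker_table(df: pd.DataFrame) -> dict:
    """Report common problems in a cells x markers table (hypothetical helper)."""
    return {
        "n_cells": df.shape[0],
        "n_markers": df.shape[1],
        # Cell IDs become obs_names, so they should be unique
        "duplicate_ids": int(df.index.duplicated().sum()),
        # Missing values would break PCA and the diffusion maps
        "missing_values": int(df.isna().sum().sum()),
        # Columns that were not parsed as numbers
        "non_numeric": [
            c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])
        ],
    }


# Toy stand-in for the exported CSV; real data would come from
# pd.read_csv(..., index_col=0) as shown above.
toy = pd.DataFrame(
    {"FSC-A": [1.0, 2.0, 3.0], "SSC-A": [0.5, None, 0.7]},
    index=["Sample.fcs_1", "Sample.fcs_2", "Sample.fcs_2"],
)
report = check_marker_table(toy)
print(report)
```

If the report is clean, `report["n_markers"]` is one way to obtain the marker count instead of typing it by hand.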
Preprocessing
#n_comps: for few markers, use number of markers minus one
sc.pp.pca(butterfly, n_comps=number_of_markers-1)
#n_components: for few markers, use number of markers minus one
dump = palantir.utils.run_diffusion_maps(butterfly, n_components=number_of_markers-1)
dump = palantir.utils.determine_multiscale_space(butterfly)
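#n_neighbors: size of the neighborhood graph that UMAP builds on. More neighbors = more runtime
sc.pp.neighbors(butterfly, n_neighbors=15)
#compute the UMAP embedding that the plots below use (basis="umap")
sc.tl.umap(butterfly)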
Optional: Save and Load data
I recommend saving the object at the end of a session and loading it at the start of the next one.
#Optional: Save butterfly object as h5ad file
butterfly.write('butterfly.h5ad')
#Optional: Load previously (pre-)processed data from h5ad file
butterfly = ad.read_h5ad('butterfly.h5ad')
Inspect UMAP
Inspect UMAPs colored by markers of interest and choose the start cell accordingly
markers_of_interest = ["...", "FSC-A", "SSC-A"]
sc.pl.embedding(butterfly, basis="umap", layer="X", color = markers_of_interest, frameon=False,)
Here is where you need the cell IDs of start cell candidates from your condor object (*)
#get reference cell IDs from R object via gating/cyCONDOR analysis/...
naive_cells = pd.Series(["01", "02", ...], index=
["Sample.fcs_8192", "Sample.fcs_4096", ...])
palantir.plot.highlight_cells_on_umap(butterfly, naive_cells)
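If several start cell candidates come back from R, one possible way to narrow them down is to rank them by a marker that is expected to be extreme in early cells. This is only a sketch under that assumption: `pick_start_cell` and the marker name `marker_early` are hypothetical, and the toy table stands in for `butterfly_df`.

```python
import pandas as pd


def pick_start_cell(expr, candidates, marker, highest=True):
    """Return the candidate cell ID with the most extreme value of `marker`."""
    # Ignore IDs that did not survive the export (e.g. typos, filtered cells)
    present = [c for c in candidates if c in expr.index]
    if not present:
        raise ValueError("none of the candidate IDs are in the table")
    values = expr.loc[present, marker]
    return values.idxmax() if highest else values.idxmin()


# Toy stand-in for butterfly_df; "marker_early" is a made-up marker name
toy_expr = pd.DataFrame(
    {"marker_early": [0.2, 0.9, 0.5]},
    index=["Sample.fcs_1", "Sample.fcs_2", "Sample.fcs_3"],
)
toy_candidates = ["Sample.fcs_1", "Sample.fcs_3", "Sample.fcs_99"]
start_candidate = pick_start_cell(toy_expr, toy_candidates, "marker_early")
print(start_candidate)
```

The result is still just a suggestion; checking it on the UMAP with the highlight plot above remains the deciding step.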
Run Palantir
#start: select a cell that is known to be early in pseudotime
#knn: default is 30 | num_waypoints: default is 1200, reference uses 500
start = "Sample.fcs_4096"
butterfly_result = palantir.core.run_palantir(butterfly, early_cell = start, knn = 30, num_waypoints = 500)
palantir.plot.plot_palantir_results(butterfly)
plt.show()
masks = palantir.presults.select_branch_cells(butterfly, q=.01, eps=.01)
palantir.plot.plot_branch_selection(butterfly)
plt.show()
Export Palantir Pseudotime to .csv
export_butterfly_df = butterfly.obs[['palantir_pseudotime']].copy()
export_butterfly_df.to_csv('folder_path/file_name_py2R.csv')
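Because the values are later attached to the condor object row by row, it can help to make sure the exported pseudotime follows the same cell order as the table that came from R. A minimal sketch of such a check, assuming cell IDs are unique; `align_to_reference` is a hypothetical helper and the toy data replaces the real pseudotime column.

```python
import pandas as pd


def align_to_reference(values: pd.Series, reference_ids) -> pd.Series:
    """Reorder `values` to the order of `reference_ids`, failing on missing cells."""
    reference = pd.Index(reference_ids)
    missing = reference.difference(values.index)
    if len(missing) > 0:
        raise KeyError(f"{len(missing)} reference cell IDs have no pseudotime value")
    return values.reindex(reference)


# Toy example: the pseudotime values arrive in a different order than the R export
reference_ids = ["Sample.fcs_1", "Sample.fcs_2", "Sample.fcs_3"]
toy_pseudotime = pd.Series(
    [0.9, 0.1, 0.5],
    index=["Sample.fcs_3", "Sample.fcs_1", "Sample.fcs_2"],
    name="palantir_pseudotime",
)
aligned = align_to_reference(toy_pseudotime, reference_ids)
print(aligned)
```

With the real objects, `reference_ids` would presumably be `butterfly_df.index` and `values` the `palantir_pseudotime` column.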
In R, run:
pseudotime_run1 <- read.csv("folder_path/file_name_py2R.csv", row.names = 1)
if (is.null(condor$palantir)) {
condor$palantir <- list()
}
condor$palantir$run1 <- pseudotime_run1
Plot with:
plot_dim_red(fcd = condor,
reduction_method = "umap",
reduction_slot = "15_expr_orig",
pseudotime_slot = "run1",
add_pseudotime = TRUE,
param = "palantir_pseudotime",
dot_size = 0.5,
alpha = 0.5,
title = "Palantir Pseudotime")
(*) How to get cell IDs of start cell candidates in R/cyCONDOR, if you know they are in a specific cluster (here: Phenograph cluster "1"):
print(rownames(condor$clustering$phenograph_expr_orig_k_60[condor$clustering$phenograph_expr_orig_k_60$Phenograph == "1", ]))
Edit: Handle palantir pseudotime, entropy and branch probabilities with cyCONDOR functions
After running the palantir core algorithm, export the pseudotime (pt), entropy (pe) and branch probabilities (bp) like this:
butterfly_result.pseudotime.to_csv('file_path/filename_py2R_pt.csv')
butterfly_result.entropy.to_csv('file_path/filename_py2R_pe.csv')
butterfly_result.branch_probs.to_csv('file_path/filename_py2R_bp.csv')
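As an alternative to three separate files, the three results could also be joined on the cell ID into a single table before export. This is a sketch only, using toy data in place of `butterfly_result`; `combine_palantir_results`, the `bp_` prefix and the branch names are assumptions made for illustration.

```python
import pandas as pd


def combine_palantir_results(pseudotime, entropy, branch_probs):
    """Join pseudotime, entropy and branch probabilities on the cell ID index."""
    return pd.concat(
        [
            pseudotime.rename("pt"),
            entropy.rename("pe"),
            # Prefix branch columns so they are easy to recognise later
            branch_probs.add_prefix("bp_"),
        ],
        axis=1,
        join="inner",  # keep only cells present in all three results
    )


# Toy stand-ins for the pseudotime, entropy and branch_probs results
cells = ["Sample.fcs_1", "Sample.fcs_2", "Sample.fcs_3"]
toy_pt = pd.Series([0.0, 0.4, 1.0], index=cells)
toy_pe = pd.Series([0.69, 0.5, 0.0], index=cells)
toy_bp = pd.DataFrame(
    {"branch_A": [0.5, 0.8, 1.0], "branch_B": [0.5, 0.2, 0.0]}, index=cells
)
combined = combine_palantir_results(toy_pt, toy_pe, toy_bp)
print(combined)
```

A single call to `combined.to_csv(...)` would then write one file; the R import further down expects three files, so it would need to be adapted accordingly.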
Import results to condor$expr$orig
Importing the palantir results alongside the markers into condor$expr$orig allows the visualization of these results with plotting functions originally designed for markers (e.g. 2D dotplots, ridgeline plots, UMAPs colored by pseudotime/branch probabilities, ...).
The danger here is that these "novel" markers are not real markers and would by default be included in clustering/dimred/... runs. Make sure to explicitly select (include/exclude) markers for future algorithm runs.
# Palantir pseudotime import
pt_file <- "file_path/filename_py2R_pt.csv"
pt <- read.csv(pt_file, row.names = 1)
# Palantir entropy import
pe_file <- "file_path/filename_py2R_pe.csv"
pe <- read.csv(pe_file, row.names = 1)
# Palantir branch probability import
bp_file <- "file_path/filename_py2R_bp.csv"
bp_all <- read.csv(bp_file, row.names = 1)
# Separate branches manually
bp_A <- bp_all$branch_A_palantir_name # use whatever name palantir assigned
bp_B <- bp_all$branch_B_palantir_name # to see which names palantir assigned, use: head(bp_all)
...
# Add palantir results to condor object marker slots
condor[["expr"]][["orig"]]["pt"] <- pt
condor[["expr"]][["orig"]]["pe"] <- pe
condor[["expr"]][["orig"]]["bp_A"] <- bp_A # optional: use custom branch names if cell type/state is known; the plots below assume "bp_A"
condor[["expr"]][["orig"]]["bp_B"] <- bp_B
Visualize palantir results in cyCONDOR
Informative plots can be:
# UMAP colored by pseudotime, ideally side to side with your clustering of choice
plot_dim_red(condor, expr_slot = "orig", reduction_method = "umap", reduction_slot = "15_expr_orig", param = "pt")
# Ridgeplot of clusters by their pseudotime
plot_marker_ridgeplot(condor, marker = "pt", expr_slot = "orig", cluster_slot = "...", cluster_var = "...")
# Pseudotime and branch probability plot colored by cluster
plot_marker_dotplot(condor, expr_slot = "orig", marker_x = "pt", marker_y = "bp_A", cluster_slot = "...", cluster_var = "...")
# Pseudotime and marker intensity colored by cluster; e.g. for activation marker
plot_marker_dotplot(condor, expr_slot = "orig", marker_x = "pt", marker_y = "...", cluster_slot = "...", cluster_var = "...")
# Define flow_plot_marker: x axis, y axis and color_by are markers or palantir results inside expr$orig
library(ggplot2) # provides ggplot(), aes(), geom_point(), ... used below
flow_plot_marker <- function(obj, x_axis, y_axis, color_by, title = "Flow Plot", dotsize = 1) {
expr <- obj$expr$orig # get data
plot_df <- data.frame(x = expr[[x_axis]], y = expr[[y_axis]]) # make dataframe of x+y axis data
plot_df$color <- expr[[color_by]] # add data to color by
p <- ggplot(plot_df, aes(x = x, y = y)) +
geom_point(aes(color = color), size = dotsize, alpha = 0.6) +
labs(x = x_axis, y = y_axis, color = color_by, title = title) +
theme_minimal() # make plot
p <- p + scale_color_gradientn(colors = c("#AF00BF", "#0000FF", "#20DF20", "#FFD10F", "#FF7A00", "#DF0000"))
return(p)
}
# 2D scatter plot of two markers (x/y) colored by pseudotime
flow_plot_marker(condor, "marker_x_axis", "marker_y_axis", "pt")
# 2D scatter plot of pseudotime vs marker colored by another marker
flow_plot_marker(condor, "pt", "marker_y_axis", "marker_for_coloring")
# 2D scatter plot of pseudotime vs bp_A, colored by palantir entropy
flow_plot_marker(condor, "pt", "bp_A", "pe")