
Butterfly Demo: How to run Palantir Pseudotime on condor object #32

@Muffin2001

Context

The pseudotime package slingshot did not satisfy me for several reasons, among them: it is slow, irreproducible, and builds on an input far removed from the original data structure, so the results vary widely depending on how I tweak that input. The Python package Palantir (https://github.com/dpeerlab/Palantir) uses the "raw" multidimensional space to identify paths based on a k-nearest-neighbors graph. It therefore stays much closer to the original structure than e.g. slingshot, which takes a clustering and a 2D projection as input.
Palantir comes from the same group that published the clustering algorithm Phenograph, which is also kNN-graph based. Palantir only works in Python; reticulate failed for me due to numba-related errors. The notebook part is heavily inspired by https://github.com/dpeerlab/Palantir/blob/master/notebooks/Palantir_sample_notebook.ipynb and my code was written with help from ChatGPT & co.

Summary

This "butterfly" pipeline runs Palantir pseudotime analysis on top of a condor object at any stage of a cyCONDOR analysis. It writes cell IDs and marker values to butterfly_R2py.csv, imports this file into a JupyterLab notebook, preprocesses the data, runs Palantir, exports the pseudotime to butterfly_py2R.csv, imports it into R, and adds it to the condor object.
It took a bit of reading and coding, but the results made it worth it. I strongly recommend including Palantir in future versions of cyCONDOR.

Results

This worked wonderfully for me: the results are reproducible across various parameters, match my gating, and are consistent across samples. I had 6 markers + FSC-A + SSC-A for my samples, with ~10k cells per dataset (manually pregated in FJ to my cell type) from >>3 donors. I ran different donors as separate condor objects (in R/cyCONDOR) and as separate butterfly objects (in Python/Palantir).
Unfortunately I cannot share any results at this time.

In R, run (at any point of the cyCONDOR analysis):

R2py_markers = c("...", "FSC-A", "SSC-A") #adjust to markers of your choice

write.csv(condor$expr$orig[, R2py_markers], 
          file = "folder_path/file_name_R2py.csv",
          row.names = TRUE)

In python, run this as a notebook:

import palantir
import scanpy as sc
import anndata as ad  # needed for ad.AnnData and ad.read_h5ad below
import pandas as pd
import os
import matplotlib
import matplotlib.pyplot as plt
import warnings
from numba.core.errors import NumbaDeprecationWarning

warnings.filterwarnings(action="ignore", category=NumbaDeprecationWarning)
warnings.filterwarnings(action="ignore", module="scanpy", message="No data for colormapping")

get_ipython().run_line_magic('matplotlib', 'inline')

Import data
Read csv to dataframe, transfer values to AnnData object, transfer rownames, transfer colnames

butterfly_df = pd.read_csv("file_path/file_name_R2py.csv", index_col=0)
butterfly = ad.AnnData(X=butterfly_df.values)
butterfly.obs_names = butterfly_df.index
butterfly.var_names = butterfly_df.columns.tolist()
number_of_markers = ...
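
The `number_of_markers` placeholder above could also be read from the CSV header instead of being typed by hand. A minimal sketch using only the standard library, assuming every column after the first one (the cell-ID column written by `row.names = TRUE`) is a marker; `count_marker_columns` is a hypothetical helper, not part of Palantir or cyCONDOR, and the example values are made up:

```python
import csv
import io


def count_marker_columns(csv_text):
    """Return (marker_names, count) from CSV text whose first column holds cell IDs."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # write.csv(row.names = TRUE) leaves the first header field empty;
    # everything after it is treated as a marker column (assumption).
    marker_names = header[1:]
    return marker_names, len(marker_names)


# Tiny stand-in for file_name_R2py.csv (made-up values, for illustration only)
example = (
    '"","marker_1","FSC-A","SSC-A"\n'
    '"Sample.fcs_1",1.2,0.5,0.3\n'
    '"Sample.fcs_2",0.8,0.6,0.4\n'
)
marker_names, number_of_markers = count_marker_columns(example)
print(marker_names, number_of_markers)
```

With the dataframe from the notebook, `butterfly_df.shape[1]` should give the same count, since `index_col=0` moves the cell IDs out of the columns.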

Preprocessing

#n_comps: for few markers, use number of markers minus one
sc.pp.pca(butterfly, n_comps=number_of_markers-1) 

#n_components: for few markers, use number of markers minus one
dump = palantir.utils.run_diffusion_maps(butterfly, n_components=number_of_markers-1)
dump = palantir.utils.determine_multiscale_space(butterfly)
#n_neighbors: for UMAP. More neighbors = more runtime
sc.pp.neighbors(butterfly, n_neighbors=15)
sc.tl.umap(butterfly)

Optional: Save and Load data
I recommend saving at the end of a session and loading at the start of the next one

#Optional: Save butterfly object as h5ad file
butterfly.write('butterfly.h5ad')
#Optional: Load previously (pre-)processed data from h5ad file
butterfly = ad.read_h5ad('butterfly.h5ad')

Inspect UMAP
Inspect UMAPs colored by markers of interest and choose the start cell accordingly

markers_of_interest = ["...", "FSC-A", "SSC-A"]
sc.pl.embedding(butterfly, basis="umap", layer="X", color=markers_of_interest, frameon=False)

Here is where you need the cell IDs of start cell candidates from your condor object (*)

#get reference cell IDs from R object via gating/cyCONDOR analysis/...
naive_cells = pd.Series(["01", "02", ...], index=
    ["Sample.fcs_8192", "Sample.fcs_4096", ...]) 

palantir.plot.highlight_cells_on_umap(butterfly, naive_cells)
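
Cell IDs copied by hand from R are easy to mistype, so it may help to check them against the IDs in the butterfly object before highlighting them or picking a start cell. A minimal sketch, assuming the known IDs are available as a plain list (in the notebook this would be `list(butterfly.obs_names)`); `split_known_and_missing` is a hypothetical helper and all IDs below are made up:

```python
def split_known_and_missing(candidates, known_ids):
    """Split candidate cell IDs into those present in known_ids and those absent."""
    known = set(known_ids)
    present = [cell for cell in candidates if cell in known]
    missing = [cell for cell in candidates if cell not in known]
    return present, missing


# Made-up IDs for illustration only
known_ids = ["Sample.fcs_1", "Sample.fcs_4096", "Sample.fcs_8192"]
candidates = ["Sample.fcs_8192", "Sample.fcs_4096", "Sample.fcs_99999"]

present, missing = split_known_and_missing(candidates, known_ids)
print("present:", present)
print("missing:", missing)
```

Any ID reported as missing was probably mistyped or belongs to a different condor object.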

Run Palantir

#start: select a cell that is known to be early in pseudotime
#knn: default is 30 | num_waypoints: default is 1200, reference uses 500
start = "Sample.fcs_4096"
butterfly_result = palantir.core.run_palantir(butterfly, early_cell = start, knn = 30, num_waypoints = 500)
palantir.plot.plot_palantir_results(butterfly)
plt.show()
masks = palantir.presults.select_branch_cells(butterfly, q=.01, eps=.01)
palantir.plot.plot_branch_selection(butterfly)
plt.show()

Export Palantir Pseudotime to .csv

export_butterfly_df = butterfly.obs[['palantir_pseudotime']].copy()
export_butterfly_df.to_csv('folder_path/file_name_py2R.csv')

In R, run:

pseudotime_run1 <- read.csv("folder_path/file_name_py2R.csv", row.names = 1)

if (is.null(condor$palantir)) {
  condor$palantir <- list()
}

condor$palantir$run1 <- pseudotime_run1

Plot with:

plot_dim_red(fcd = condor,
             reduction_method = "umap",
             reduction_slot = "15_expr_orig",
             pseudotime_slot = "run1",
             add_pseudotime = TRUE, 
             param = "palantir_pseudotime",
             dot_size = 0.5,
             alpha = 0.5,
             title = "Palantir Pseudotime")

(*) How to get cell IDs of start cell candidates in R/cyCONDOR, if you know they are in a specific cluster (here: Phenograph cluster "1"):

print(rownames(condor$clustering$phenograph_expr_orig_k_60[condor$clustering$phenograph_expr_orig_k_60$Phenograph == "1", ]))

Edit: Handle palantir pseudotime, entropy and branch probabilities with cyCONDOR functions

After running the Palantir core algorithm, export the pseudotime (pt), entropy (pe) and branch probabilities (bp) like this:

butterfly_result.pseudotime.to_csv('file_path/filename_py2R_pt.csv')
butterfly_result.entropy.to_csv('file_path/filename_py2R_pe.csv')
butterfly_result.branch_probs.to_csv('file_path/filename_py2R_bp.csv')
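
As far as I can tell, the R import further down adds these columns to `condor$expr$orig` by row position rather than by cell ID, so it may be worth confirming that the exported files list the same cells in the same order as the original R2py file. A minimal sketch using only the standard library; `read_cell_ids` and `same_cells_same_order` are hypothetical helpers, and the miniature tables are made up:

```python
import csv
import io


def read_cell_ids(csv_text):
    """Return the first-column values (cell IDs) of CSV text, skipping the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip header row
    return [row[0] for row in reader if row]


def same_cells_same_order(input_csv_text, result_csv_text):
    """True if both tables list identical cell IDs in identical order."""
    return read_cell_ids(input_csv_text) == read_cell_ids(result_csv_text)


# Made-up miniature tables standing in for the R2py and py2R_pt files
r2py = '"","FSC-A"\n"Sample.fcs_1",0.5\n"Sample.fcs_2",0.6\n'
py2r_ok = ",pseudotime\nSample.fcs_1,0.0\nSample.fcs_2,1.0\n"
py2r_shuffled = ",pseudotime\nSample.fcs_2,1.0\nSample.fcs_1,0.0\n"

print(same_cells_same_order(r2py, py2r_ok))
print(same_cells_same_order(r2py, py2r_shuffled))
```

For real files, the text can be read with `open(path, newline="").read()` and passed to the same functions.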

Import results to condor$expr$orig
Importing the Palantir results alongside the markers into condor$expr$orig allows these results to be visualized with plotting functions originally designed for markers (e.g. 2D dotplots, ridgeline plots, UMAPs colored by pseudotime/branch probabilities, ...).
The danger here is that these "novel" markers are not real markers, yet would by default be included in clustering/dimensionality reduction/... runs. Make sure to explicitly select (include/exclude) markers for future algorithm runs.

# Palantir pseudotime import
pt_file <- "file_path/filename_py2R_pt.csv"
pt <- read.csv(pt_file, row.names = 1)

# Palantir entropy import
pe_file <- "file_path/filename_py2R_pe.csv"
pe <- read.csv(pe_file, row.names = 1)

# Palantir branch probability import
bp_file <- "file_path/filename_py2R_bp.csv"
bp_all <- read.csv(bp_file, row.names = 1)

# Separate branches manually
bp_A <- bp_all$branch_A_palantir_name # use whatever name palantir assigned
bp_B <- bp_all$branch_B_palantir_name # to see the names palantir assigned, use: head(bp_all)
...

# Add palantir results to condor object marker slots
condor[["expr"]][["orig"]]["pt"] <- pt
condor[["expr"]][["orig"]]["pe"] <- pe
condor[["expr"]][["orig"]]["branch_A"] <- bp_A # optional: use custom branch names if cell type/state is known
condor[["expr"]][["orig"]]["branch_B"] <- bp_B

Visualize palantir results in cyCONDOR
Informative plots can be:

# UMAP colored by pseudotime, ideally side by side with your clustering of choice
plot_dim_red(condor, expr_slot = "orig", reduction_method = "umap", reduction_slot = "15_expr_orig", param = "pt")

# Ridgeplot of clusters by their pseudotime
plot_marker_ridgeplot(condor, marker = "pt", expr_slot = "orig", cluster_slot = "...", cluster_var = "...")

# Pseudotime and branch probability plot colored by cluster
plot_marker_dotplot(condor, expr_slot = "orig", marker_x = "pt", marker_y = "branch_A", cluster_slot = "...", cluster_var = "...")

# Pseudotime and marker intensity colored by cluster; e.g. for activation marker
plot_marker_dotplot(condor, expr_slot = "orig", marker_x = "pt", marker_y = "...", cluster_slot = "...", cluster_var = "...")

library(ggplot2) # flow_plot_marker uses ggplot()

# Define flow_plot_marker: x axis, y axis and color_by are markers or palantir results inside expr$orig
flow_plot_marker <- function(obj, x_axis, y_axis, color_by, title = "Flow Plot", dotsize = 1) {
  expr <- obj$expr$orig # get data
  plot_df <- data.frame(x = expr[[x_axis]], y = expr[[y_axis]]) # make dataframe of x+y axis data
  plot_df$color <- expr[[color_by]] # add data to color by
  
  p <- ggplot(plot_df, aes(x = x, y = y)) +
    geom_point(aes(color = color), size = dotsize, alpha = 0.6) +
    labs(x = x_axis, y = y_axis, color = color_by, title = title) +
    theme_minimal() # make plot
  p <- p + scale_color_gradientn(colors = c("#AF00BF", "#0000FF", "#20DF20", "#FFD10F", "#FF7A00", "#DF0000"))
  return(p)
}

# 2D scatter plot of two markers (x/y) colored by pseudotime
flow_plot_marker(condor, "marker_x_axis", "marker_y_axis", "pt")

# 2D scatter plot of pseudotime vs marker colored by another marker
flow_plot_marker(condor, "pt", "marker_y_axis", "marker_for_coloring")

# 2D scatter plot of pseudotime vs branch_A, colored by palantir entropy
flow_plot_marker(condor, "pt", "branch_A", "pe")

Labels: documentation, enhancement
