Subset regions from cistromes params and fix dotplot repressors#337
Subset regions from cistromes params and fix dotplot repressors#337DanieFD wants to merge 8 commits intoaertslab:mainfrom
Conversation
SeppeDeWinter
left a comment
There was a problem hiding this comment.
Hi Danie
Thank you for the PR.
Could you have a look at the comments I made.
Best,
Seppe
src/scenicplus/cli/commands.py
Outdated
| out_file_direct_annotation: pathlib.Path, | ||
| out_file_extended_annotation: pathlib.Path, | ||
| out_file_tf_names: pathlib.Path, | ||
|
|
There was a problem hiding this comment.
Is there any particular reason why you added blank lines between the parameters? Otherwise I would delete them.
There was a problem hiding this comment.
No no reason. I will delete them
src/scenicplus/cli/commands.py
Outdated
| paths_to_motif_enrichment_results=paths_to_motif_enrichment_results, | ||
| scplus_regions=set(mdata['scATAC'].var_names), | ||
| scplus_regions=regions, | ||
| subset_regions = regions_to_overlap, #could be the set of all ATAC regions or a subset if subset is not None |
There was a problem hiding this comment.
I would move this function call within the if statement block.
if path_to_regions_to_subset is not None:
[...]
adata_direct_cistromes, adata_extended_cistromes = get_and_merge_cistromes(
[...],
subset_regions = regions_to_overlap
)
else:
adata_direct_cistromes, adata_extended_cistromes = get_and_merge_cistromes(
[...],
)
src/scenicplus/cli/commands.py
Outdated
| mdata = mudata.read(multiome_mudata_fname.__str__()) | ||
| log.info("Getting cistromes.") | ||
| regions = set(mdata['scATAC'].var_names) | ||
| regions_to_overlap = set(mdata['scATAC'].var_names) |
There was a problem hiding this comment.
I would move this within the if statement, see next comment as well.
src/scenicplus/cli/scenicplus.py
Outdated
| parser.add_argument( | ||
| "--path_to_regions_to_subset", dest="path_to_regions_to_subset", | ||
| action="store", type=str, required=False,default ="", | ||
| help="Path to bed file for regions to subset when merging cistromes (MACS called peaks).") |
There was a problem hiding this comment.
They can be any bed file right, don't necessarily need to be peaks called by MACS? If this is the case I would remove "(MACS called peaks)" from the documentation to avoid confusion.
There was a problem hiding this comment.
True could be removed. or could add for example. Will fix it
| extended = True)) | ||
| return cistromes | ||
|
|
||
| def _overlap_if_necessary(s_query_regions, test_regions, regions_to_overlap): |
There was a problem hiding this comment.
If I'm not mistaken this function returns either a set (else block) or a pyranges object (if block), while the Cistrome class expects a set for the target_regions parameter.
Also I would run this function not while providing the arguments to clearly seperate these two things (i.e. 1. initiating the class and 2. putting the subset region in the same coordinate system as SCENIC+).
src/scenicplus/plotting/dotplot.py
Outdated
| plotnine.options.figure_size = figsize | ||
| plotting_df["repressor_activator"] = [ | ||
| "activator" if "+" in n.split("_")[2] else "repressor" for n in plotting_df[feature_name_key]] | ||
| "repressor" if "-" in n.split("_")[2] else "activator" for n in plotting_df[feature_name_key]] |
There was a problem hiding this comment.
ah yes, that was an obvious mistake from my part. Thanks!
… reading it as None
Added the "path to regions to subset" as a parameter in the snakemake config file. This allows giving the path to the bed file containing the MACS called peaks that one would like to subset when creating cistromes (hence, cell type-specific cistromes). It can also be left empty (empty string) if no regions should be subsetted.
Also, one small change in the regulons dotplot: considered all regulons with "-" as repressors.