Gene identifier mapping #10

@Souhatifour

Issue
Part 2 of the tutorial does not explicitly mention gene identifier mapping, yet several steps require it. In particular, the tutorial works with gene expression data and involves selecting, intersecting, and manipulating gene sets based on their relevance to the analysis.

Example Scenario:
Consider Task 2 of the BuDDI analysis tutorial, where gene IDs are Ensembl identifiers such as 'ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419', 'ENSG00000000457', and the goal is to convert them into a different format, for example gene names like 'MIR1302-2HG', 'FAM138A', 'OR4F5', 'AL627309.1', 'AL627309.3'.

In this case, the matched single-cell tissue data already contains the gene mapping.

Suggested Approach:
When converting Ensembl IDs to gene names, a mapping may not be available for every gene. To address this, use the gene mapping from the matched single-cell tissue data, as it likely contains a more comprehensive set of mappings.
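Before relying on a mapping, it can help to check which Ensembl IDs it does not cover. The following is a minimal sketch, assuming a mapping table with the "Name" and "Ens" columns used in this issue. The toy table, the `bulk_ids` list, and the helper `find_unmapped` are hypothetical and not part of the tutorial.

```python
import pandas as pd

# Toy mapping table with the same columns as the one proposed in this issue
# (a small illustrative subset, not the real single-cell mapping).
gene_maps = pd.DataFrame({
    "Name": ["TSPAN6", "TNMD", "DPM1"],
    "Ens": ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419"],
})


def find_unmapped(ensembl_ids, mapping):
    """Return the Ensembl IDs that have no row in the mapping table."""
    known = set(mapping["Ens"])
    return [gene for gene in ensembl_ids if gene not in known]


# Hypothetical list of Ensembl IDs taken from the bulk data.
bulk_ids = [
    "ENSG00000000003",
    "ENSG00000000005",
    "ENSG00000000419",
    "ENSG00000000457",
]

unmapped = find_unmapped(bulk_ids, gene_maps)
print(f"{len(unmapped)} of {len(bulk_ids)} IDs have no mapping: {unmapped}")
```

IDs reported here would need another mapping source, or would have to be dropped before intersecting gene sets.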

The example below builds the gene mapping as a Pandas DataFrame with columns for gene names ("Name") and Ensembl identifiers ("Ens"). The mapping can be built as follows:

import pandas as pd

# Create an empty DataFrame with columns for gene names and Ensembl IDs
gene_maps = pd.DataFrame(columns=["Name", "Ens"])

# Populate the "Name" column with gene names from the single-cell AnnData object
gene_maps["Name"] = adata.var.index

# Populate the "Ens" column with Ensembl IDs from the AnnData object
gene_maps["Ens"] = adata.var["gene_ids"].values

# Save the gene mapping DataFrame to a CSV file
gene_maps.to_csv(f'{data_path}/gene_maps.csv')

# Extract the gene names for later use
gene_ids = gene_maps["Name"]
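Once the mapping exists, it can be used to rename Ensembl-labelled columns and to intersect the result with the single-cell gene names. This is only a sketch under stated assumptions: the toy `gene_maps` table stands in for the one built from `adata`, and the `bulk` expression matrix with its values is hypothetical.

```python
import pandas as pd

# Toy stand-in for the gene_maps table built from the AnnData object.
gene_maps = pd.DataFrame({
    "Name": ["TSPAN6", "TNMD", "DPM1"],
    "Ens": ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419"],
})

# Hypothetical bulk expression matrix whose columns are Ensembl IDs.
bulk = pd.DataFrame(
    [[5, 0, 12, 3]],
    columns=[
        "ENSG00000000003",
        "ENSG00000000005",
        "ENSG00000000419",
        "ENSG00000000457",
    ],
)

# Build an Ensembl ID -> gene name lookup from the mapping table.
lookup = dict(zip(gene_maps["Ens"], gene_maps["Name"]))

# Rename the columns; IDs missing from the lookup keep their Ensembl ID.
bulk_named = bulk.rename(columns=lookup)

# Keep only the genes that are also present in the single-cell mapping.
shared_genes = bulk_named.columns.intersection(pd.Index(gene_maps["Name"]))
bulk_shared = bulk_named[shared_genes]

print(list(bulk_named.columns))
print(list(bulk_shared.columns))
```

Leaving unmapped IDs unchanged makes them easy to spot, and the intersection step removes them from the downstream gene set.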
