UnicodeDecodeError when reading H5AD in Python #400

@lazappi

Description

Thanks so much for your reply, @lazappi. I tried upgrading to Seurat v5 and converted my v4 object to the v5 structure. After exporting it to H5AD format with write_h5ad(), reading the file in Python raised an error:

# R: export the Seurat v5 object
library(anndataR)
anndataR::write_h5ad(
  object = d2m1_all_v5,
  path = "anndataR_v5.h5ad")

# Python: read the exported file
import anndata as ad
adata1 = ad.read_h5ad("anndataR_v5.h5ad")

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128).
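
For readers unfamiliar with this message, here is a minimal Python sketch of what it reports. The byte string is a hypothetical label, not taken from the data in this issue, and `try_ascii` is an illustrative helper. It only shows how a UTF-8 string with a non-ASCII character at index 4 produces the same message when something decodes it as ASCII.

```python
# Hypothetical label (not from the reported dataset): "García" encoded as UTF-8.
# The character "í" (U+00ED) becomes the two bytes 0xC3 0xAD, starting at index 4.
raw = "Garc\u00eda".encode("utf-8")


def try_ascii(data: bytes) -> str:
    """Decode bytes as ASCII; return the text, or the error message on failure."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as err:
        return f"{type(err).__name__}: {err}"


message = try_ascii(raw)
print(message)              # the ASCII decoder stops at the first byte >= 0x80
print(raw.decode("utf-8"))  # the same bytes decode cleanly as UTF-8
```

So "byte 0xc3 in position 4" means the fifth byte of some stored string is the lead byte of a two-byte UTF-8 sequence, and the reader treated that string as ASCII. The message alone does not say which string in the file is affected.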

I thought the problem was in my data, but when I converted the same object with another tool, the resulting file could be read in Python without any issue:

# R: convert via SingleCellExperiment, then write with the anndata package
library(convert2anndata)
library(anndata)
sce <- convert2anndata::convert_seurat_to_sce(d2m1_all_v5)
ad <- convert2anndata::convert_to_anndata(
  sce,
  assayName = "RNA_counts",
  useAltExp = TRUE)
anndata::write_h5ad(ad, "convert2anndata_v5_counts.h5ad")

# Python: read the converted file
import anndata as ad
adata2 = ad.read_h5ad("convert2anndata_v5_counts.h5ad")
adata2

AnnData object with n_obs × n_vars = 633056 × 24591
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'Patient_ID', 'Cell_cluster_major', 'Cell_cluster_minor', 'Tumor_Capsule_Integrity', 'Age', 'Gender', 'AFP_μg_L', 'Tumor_Stage', 'Size_cm', 'Vascular_Invasion', 'Etiology', 'Cirrhosis', 'GS_G', 'GS_S', 'ALB_mutation', 'APOB_mutation', 'ARID1A_mutation', 'ATRX_mutation', 'AXIN1_mutation', 'CPS1_mutation', 'CTNNB1_mutation', 'KEAP1_mutation', 'KMT2C_mutation', 'RB1_mutation', 'SMARCA2_mutation', 'TP53_mutation', 'TSC2_mutation', 'Patient', 'Sample', 'Tissue_Source', 'Diagnosis', 'Tumor_Type', 'Platform', 'Sample_ID', 'log10GenesPerUMI', 'percent.rb', 'percent.hb', 'scDblFinder.class', 'scDblFinder.score', 'Contamination', 'TNM', 'Site_Major', 'Site_Minor', 'Distant_Metastasis', 'CEA', 'CA199', 'Child_Pugh', 'BCLC_Stage', 'Number_of_Lesions', 'RNA_snn_res.0.01', 'RNA_snn_res.0.05', 'RNA_snn_res.0.08', 'RNA_snn_res.0.1', 'RNA_snn_res.0.2', 'RNA_snn_res.0.5', 'RNA_snn_res.0.8', 'RNA_snn_res.1', 'Cell_cluster_res.1', 'Cell_cluster_res.0.05'
var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
obsm: 'X_harmony', 'X_pca', 'X_tsne', 'X_tsne_naive', 'X_umap', 'X_umap_naive'
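
One observation from the listing above: position 4 in the error message is exactly where the μ sits in the column name 'AFP_μg_L'. A correctly UTF-8-encoded μ does not start with byte 0xc3, but a μ that was encoded twice (UTF-8 bytes misread as Latin-1 and then encoded again) does. The Python sketch below flags non-ASCII names and shows that double-encoding case. This is only a hypothesis about the cause, not something confirmed in this issue. The helper name `non_ascii_names`, the shortened column list, and the choice of U+03BC for μ are assumptions made for illustration.

```python
def non_ascii_names(names):
    """Return (name, utf8_bytes) pairs for names containing non-ASCII characters."""
    return [(n, n.encode("utf-8")) for n in names if not n.isascii()]


# A few of the obs columns shown above; μ is written as U+03BC (an assumption).
columns = ["orig.ident", "nCount_RNA", "AFP_\u03bcg_L", "Tumor_Stage"]
flagged = non_ascii_names(columns)
for name, raw in flagged:
    print(name, raw.hex(" "))

# Hypothetical double encoding: UTF-8 bytes misread as Latin-1, then re-encoded.
correct = "AFP_\u03bcg_L".encode("utf-8")
twice = correct.decode("latin-1").encode("utf-8")

# Compare the byte at index 4 in the correct and the double-encoded form.
print(hex(correct[4]), hex(twice[4]))
```

With U+03BC the correctly encoded byte at index 4 is 0xce, while the double-encoded form has 0xc3 there. The micro sign U+00B5 behaves the same way (0xc2 when encoded once, 0xc3 when encoded twice). If this is what happened, renaming the column to plain ASCII before export would be one way to test the idea.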

Originally posted by @rainbowkiva in #399

Labels: h5ad (issues related to H5AD files), needs info (needs additional information)
