Commit e2d2a25

an-altosian and claude committed
Sort layers alphabetically for deterministic h5ad output
Python set iteration order (used when populating layers from output_assays) is not stable across runs, and HDF5 writes group children to the file in the order they are inserted. As a result, the layers (spliced, unspliced, ambiguous) were written in a different order from run to run, producing different md5 checksums even though the data was identical.

Sort layers alphabetically via OrderedDict before writing h5ad files to ensure byte-level reproducibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c327d6a commit e2d2a25
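
The ordering fix described in the commit message can be illustrated without AnnData. The following is a minimal sketch, assuming the layers behave like a plain string-keyed mapping; `sort_layers`, `run_a`, and `run_b` are hypothetical names used only for illustration, and the integer values stand in for the real matrices.

```python
from collections import OrderedDict


def sort_layers(layers):
    """Return a new mapping with the same items, ordered alphabetically by key."""
    # Sorting on the key alone means the values (matrices in the real code)
    # are never compared with "<".
    return OrderedDict(sorted(layers.items(), key=lambda item: item[0]))


# Hypothetical stand-ins for the layers of two runs: same content,
# different insertion order. The integers are placeholders for matrices.
run_a = {"spliced": 1, "unspliced": 2, "ambiguous": 3}
run_b = {"unspliced": 2, "ambiguous": 3, "spliced": 1}

order_a = list(sort_layers(run_a))
order_b = list(sort_layers(run_b))
```

After sorting, both runs yield the same key order regardless of how the layers were inserted. Since Python 3.7 a plain `dict` also preserves insertion order, so `dict(sorted(...))` would likely behave the same way here; `OrderedDict` mainly makes the intent explicit.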

1 file changed

Lines changed: 6 additions & 0 deletions

src/qcatch/input_processing.py

@@ -5,6 +5,7 @@
 import logging
 import os
 import shutil
+from collections import OrderedDict
 from pathlib import Path
 
 import numpy as np
@@ -469,6 +470,9 @@ def save_results(args, version, intermediate_result, valid_bcs):
         args.input.mtx_data.obs_names.sort_values(), args.input.mtx_data.var_names.sort_values()
     ].copy()
 
+    # Sort layers alphabetically for deterministic h5ad output
+    args.input.mtx_data.layers = OrderedDict(sorted(args.input.mtx_data.layers.items()))
+
     if args.input.is_h5ad and output_dir == args.input.dir:
         # Inplace overwrite: same location as original
         temp_file = os.path.join(output_dir, "quants.h5ad")
@@ -494,6 +498,8 @@ def save_results(args, version, intermediate_result, valid_bcs):
     filter_mtx_data = filter_mtx_data[
         filter_mtx_data.obs_names.sort_values(), filter_mtx_data.var_names.sort_values()
     ].copy()
+    # Sort layers alphabetically for deterministic h5ad output
+    filter_mtx_data.layers = OrderedDict(sorted(filter_mtx_data.layers.items()))
     # Save the filtered anndata to a new file
     filter_mtx_data_filename = os.path.join(output_dir, "filtered_quants.h5ad")
     filter_mtx_data.write_h5ad(filter_mtx_data_filename, compression="gzip")
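
One way to check the reproducibility claim is to compare checksums of the files written by two separate runs. The following is a minimal sketch using only the standard library; `file_md5` and `outputs_match` are hypothetical helpers, not part of qcatch.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(path_a, path_b):
    """Return True when both files have the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)
```

For example, `outputs_match("run1/filtered_quants.h5ad", "run2/filtered_quants.h5ad")` would compare the filtered output of two runs, where `run1/` and `run2/` are assumed output directories.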
