Description
I've run into this issue while attempting to re-phase individual probands from the 1000 Genomes Project, using the remaining 1000 Genomes samples as the reference panel (i.e., each proband is phased individually against a panel with its trio removed). I've reproduced this with both release v5.1.1 and the current state of the git repository as of 7 November 2025.
The target VCF is generated so that the original AC and AN INFO fields are maintained, ensuring that the right variants are filtered when I run phase_common:
$shapeit_dir/phase_common/bin/phase_common \
--input target.bcf \
--map $shapeit_dir/resources/maps/b38/chr22.b38.gmap.gz \
--reference reference.bcf \
--region "chr22" \
--filter-maf 0.001 \
--thread 20 \
--output scaffold.bcf
I then try to run phase_rare with the target, scaffold, and chunk file as follows:
chunk_file="${shapeit_dir}/resources/chunks/b38/4cM/chunks_chr22.txt"
while read -r LINE; do
    # Columns 1, 3, and 4 of the chunk file: chunk ID, scaffold region, input region
    CHK=$(echo "$LINE" | awk '{ print $1; }')
    SRG=$(echo "$LINE" | awk '{ print $3; }')
    IRG=$(echo "$LINE" | awk '{ print $4; }')
    "$shapeit_dir"/phase_rare/bin/phase_rare \
        --input target.bcf \
        --scaffold scaffold.bcf \
        --map "$shapeit_dir"/resources/maps/b38/chr22.b38.gmap.gz \
        --input-region "${IRG}" \
        --scaffold-region "${SRG}" \
        --thread 8 \
        --output "chunk_${CHK}.bcf"
done < "${chunk_file}"
The result for each chunk is a core dump due to a segmentation fault; exploring this a bit further, the segfault occurs at a call to conditioning_set::select:
conditioning_set::select (this=0x7fffffffbf90, V=..., G=...) at src/containers/conditioning_set/conditioning_set_selection.cpp:107
107 while (indexes_pbwt_neighbour_serialized[e].first == h) {
(gdb) l 107
102 vrb.bullet("PBWT backward selection (" + stb.str(tac.rel_time()*1.0/1000, 2) + "s)");
103
104 stats1D statK;
105 for (long int h = 0, e = 0 ; h < n_haplotypes ; h ++) {
106 vector < unsigned int > buffer;
107 while (indexes_pbwt_neighbour_serialized[e].first == h) {
108 buffer.push_back(indexes_pbwt_neighbour_serialized[e].second);
109 e++;
110 }
111
At this point, indexes_pbwt_neighbour_serialized is a vector of length 0 and e is 0, so the loop condition on line 107 reads element 0 of an empty vector without any bounds check; running p indexes_pbwt_neighbour_serialized[e] in gdb returns Cannot access memory at address 0x0.
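For illustration, here is a minimal stand-alone sketch of the same grouping loop with a bounds check on e. It is not the SHAPEIT5 source and not a proposed patch: group_neighbours is a hypothetical helper, the pair type is a guess based on the .first/.second accesses in the listing, and I'm assuming the pairs are sorted by haplotype index. It would only avoid the crash at this line; it doesn't explain why the vector ends up empty in the first place.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of SHAPEIT5. Names mirror the gdb listing.
// Input: (haplotype index, neighbour index) pairs, assumed sorted by
// haplotype index. Output: one neighbour list per haplotype.
std::vector<std::vector<unsigned int>> group_neighbours(
    const std::vector<std::pair<long, unsigned int>>& serialized,
    long n_haplotypes) {
  assert(n_haplotypes >= 0);
  std::vector<std::vector<unsigned int>> buffers(
      static_cast<std::size_t>(n_haplotypes));
  std::size_t e = 0;
  for (long h = 0; h < n_haplotypes; ++h) {
    // Testing e against size() first keeps every read in bounds, including
    // the case observed here where the vector is empty.
    while (e < serialized.size() && serialized[e].first == h) {
      buffers[static_cast<std::size_t>(h)].push_back(serialized[e].second);
      ++e;
    }
  }
  return buffers;
}
```

With an empty input vector, this returns n_haplotypes empty lists instead of reading past the end of the vector.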