core dump during phase rare #127

@theandyb

I've run into this issue while attempting to re-phase individual probands from the 1000 Genomes Project, using the remaining 1000 Genomes samples as the reference panel (i.e., each proband is phased individually against a panel with its trio removed). I've replicated this using both release v5.1.1 and the current state of the git repository as of 7 November 2025.

The target VCF is generated with the original AC and AN values retained, so that the right variants are filtered when I run phase_common:

$shapeit_dir/phase_common/bin/phase_common \
    --input target.bcf \
    --map $shapeit_dir/resources/maps/b38/chr22.b38.gmap.gz \
    --reference reference.bcf \
    --region "chr22" \
    --filter-maf 0.001 \
    --thread 20 \
    --output scaffold.bcf

I then try to run phase_rare with the target, scaffold, and chunk file as follows:

chunk_file="${shapeit_dir}/resources/chunks/b38/4cM/chunks_chr22.txt"
while read LINE; do
    CHK=$(echo $LINE | awk '{ print $1; }')
    SRG=$(echo $LINE | awk '{ print $3; }')
    IRG=$(echo $LINE | awk '{ print $4; }')
    $shapeit_dir/phase_rare/bin/phase_rare \
      --input target.bcf \
      --scaffold scaffold.bcf \
      --map $shapeit_dir/resources/maps/b38/chr22.b38.gmap.gz \
      --input-region ${IRG} \
      --scaffold-region ${SRG} \
      --thread 8 \
      --output chunk_${CHK}.bcf
done < ${chunk_file}

Every chunk ends in a core dump caused by a segmentation fault. Looking further in gdb, the segfault occurs inside conditioning_set::select:

conditioning_set::select (this=0x7fffffffbf90, V=..., G=...) at src/containers/conditioning_set/conditioning_set_selection.cpp:107
107                     while (indexes_pbwt_neighbour_serialized[e].first == h) {
(gdb) l 107
102             vrb.bullet("PBWT backward selection (" + stb.str(tac.rel_time()*1.0/1000, 2) + "s)");
103
104             stats1D statK;
105             for (long int h = 0, e = 0 ; h < n_haplotypes ; h ++) {
106                     vector < unsigned int > buffer;
107                     while (indexes_pbwt_neighbour_serialized[e].first == h) {
108                             buffer.push_back(indexes_pbwt_neighbour_serialized[e].second);
109                             e++;
110                     }
111

At this point, indexes_pbwt_neighbour_serialized is a vector of length 0, and e is 0; running p indexes_pbwt_neighbour_serialized[e] in gdb returns Cannot access memory at address 0x0.
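
To make the failure mode concrete, here is a minimal sketch of the same loop shape with a bounds guard added. The function name `group_neighbours`, the alias `neighbour_pairs`, and the assumption that the container holds (haplotype, neighbour) pairs sorted by haplotype are mine and are not taken from the SHAPEIT5 source. It only shows how the out-of-bounds read could be avoided; it does not explain why the vector ends up empty in the first place.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the serialized (haplotype, neighbour) pairs,
// assumed here to be sorted by haplotype index.
using neighbour_pairs = std::vector<std::pair<unsigned int, unsigned int>>;

// Collects the neighbours of each haplotype into its own buffer.
// The `e < pairs.size()` check is the only change relative to the loop shape
// in the backtrace: it stops the scan before reading past the end, so an
// empty input yields empty buffers instead of an invalid memory access.
std::vector<std::vector<unsigned int>> group_neighbours(
        const neighbour_pairs& pairs, unsigned int n_haplotypes) {
    std::vector<std::vector<unsigned int>> grouped(n_haplotypes);
    std::size_t e = 0;
    for (unsigned int h = 0; h < n_haplotypes; h++) {
        while (e < pairs.size() && pairs[e].first == h) {
            grouped[h].push_back(pairs[e].second);
            e++;
        }
    }
    return grouped;
}
```

With an empty `pairs` vector, as observed in gdb, this returns `n_haplotypes` empty buffers rather than crashing. Whether empty conditioning sets are acceptable downstream is a separate question for the maintainers.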
