Large hyperpartition groups missing from groups.agp after scaffolding (polyploid CiFi dataset) #49

@DR-Omics

Hi,
Thanks for the great tool.
I ran the full cphasing pipeline on CiFi data from a highly polyploid plant genome. I mapped three CiFi datasets (three FASTQ files) from the same library separately to the contig-level assembly, merged them into a single porec.gz, and then ran the cphasing pipeline on the merged dataset:

cphasing mapper $REF $LIB2 --mm2-params "-x map-hifi" -t $THREADS -o $WRKDIR/CiFi_lib2_Mar2026
cphasing mapper $REF $LIB2_FULL --mm2-params "-x map-hifi" -t $THREADS -o $WRKDIR/CiFi_lib2_Full
cphasing mapper $REF $LIB2_PART --mm2-params "-x map-hifi" -t $THREADS -o $WRKDIR/CiFi_lib2_PART
# Merge porec.gz
cphasing porec-merge $WRKDIR/CiFi_lib2*.porec.gz -o $WRKDIR/CiFi_lib2.merged.porec.gz
# Cphasing pipeline
cphasing pipeline \
  -f "$REF" \
  -pct "$WRKDIR/CiFi_lib2.merged.porec.gz" \
  -t $THREADS \
  -n 10:0 \
  -o "$OUTDIR/CiFi_Lib2_mergedLibs_cphasing_n10_0" \
  --min-contacts 1 \
  --min-scaffold-length 10000 \
  --mm2-params "-x map-hifi" \
  -p CATG

Hyperpartition completed successfully and produced large subgroup partitions such as:

1g1   99,754,319
1g2   131,913,392
1g3   92,607,939
1g4   102,777,481
1g5   100,700,259
...
1g101 26,898
2g1 148,030,961
...
10g1  63,671,421
10g2  59,608,904
...
10g10 56,175

with a total partitioned size of:
Total 5,454,328,446
However, after scaffolding, groups.agp seems to contain only the smaller groups, starting from Chr01g8, while the large groups (Chr01g1–g7) are absent.
The same issue occurs across all 10 chromosomes. Example from the AGP stats:

Chr01g8   4   38742592
Chr01g11  2   127880
Chr01g12  1   116322
...
Chr08g8   1   272317
Chr08g9   1   71971
Chr08g10  1   45446
...
Chr09g4   1   22676471
Chr09g6   2   22266383
...
Chr10g8   1   100902
Chr10g10  1   56175
Total number of contigs:        3411
Total length of contigs (bp):   5533696479
Total number of anchored contigs:       170
Total length of anchored contigs (bp):  171330149
Total length of chromosome level assembly (bp): 171331049
Number of unanchored contigs:   3241
Length of unanchored contigs (bp):      5362366330
Anchor rate (%):        3.10
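
For anyone who wants to check which groups actually ended up in groups.agp, stats of this shape can be recomputed straight from the AGP file. The following is only a minimal sketch, assuming a standard tab-separated AGP (object name in column 1, component type in column 5, component start and end in columns 7 and 8). The helper names `agp_group_summary` and `anchor_rate` are hypothetical and are not part of cphasing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of cphasing.

# Print "group <TAB> contig count <TAB> summed contig length" for each
# object in an AGP file. Gap lines (component type N or U) are skipped.
agp_group_summary() {
    awk -F'\t' '
        /^#/ { next }                       # skip AGP header comments
        $5 != "N" && $5 != "U" {            # sequence components only
            n[$1]++
            len[$1] += $8 - $7 + 1          # component_end - component_beg + 1
        }
        END {
            for (g in n) printf "%s\t%d\t%d\n", g, n[g], len[g]
        }
    ' "$1" | LC_ALL=C sort -k1,1
}

# Anchor rate (%) = anchored contig length / total contig length * 100.
anchor_rate() {
    awk -v a="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * a / t }'
}
```

Usage would be `agp_group_summary groups.agp` to list every group present in the AGP, and `anchor_rate 171330149 5533696479` to recompute the anchor rate from the anchored and total contig lengths reported above.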

Question: Why are the major hyperpartitioned groups excluded during scaffolding? Also, in the plots, the chromosome subgroup labelling is collapsed into Chr01, Chr02, etc. Please see the attached plot. Any advice would be appreciated.

Thank you
Dhanu

Image
