Description
Hi @svm-zhang,
I tested MHCflow on some WES samples (150X, aligned to hg19) and noticed that the results differ from those generated by Polysolver. Specifically, MHCflow seems to favor calling homozygous loci more often. I pasted the result from one sample below as an example:
mhcflow result:
allele gene tot_scores sample
hla_a_30_01_01 hla_a 8022770.0758 P022_005
hla_a_30_01_01 hla_a 4011385.0379 P022_005
hla_b_40_02_11 hla_b 939190.0302 P022_005
hla_b_40_02_15 hla_b 1813238.3457 P022_005
hla_c_07_02_02 hla_c 191286.0893 P022_005
hla_c_07_17_02 hla_c 326942.8309 P022_005
polysolver result:
HLA-A hla_a_11_01_01 hla_a_30_01_01
HLA-B hla_b_40_02_01 hla_b_13_02_01
HLA-C hla_c_03_04_01_01 hla_c_06_02_01_01
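To make the comparison easier to read, here is a small Python sketch that collapses both call sets to two-field (4-digit) resolution and reports, per gene, whether the MHCflow call is homozygous at that resolution and which alleles the two tools share. The helper names (`two_field`, `by_gene`, `compare`) are my own and are not part of MHCflow or Polysolver; the allele strings are copied from the results above.

```python
from collections import defaultdict

# Calls copied from the two result listings above.
mhcflow = [
    "hla_a_30_01_01", "hla_a_30_01_01",
    "hla_b_40_02_11", "hla_b_40_02_15",
    "hla_c_07_02_02", "hla_c_07_17_02",
]
polysolver = [
    "hla_a_11_01_01", "hla_a_30_01_01",
    "hla_b_40_02_01", "hla_b_13_02_01",
    "hla_c_03_04_01_01", "hla_c_06_02_01_01",
]


def two_field(allele: str) -> str:
    """Truncate e.g. 'hla_a_30_01_01' to 'hla_a_30_01'."""
    return "_".join(allele.split("_")[:4])


def by_gene(alleles):
    """Group two-field alleles by gene, e.g. {'hla_a': [...], ...}."""
    genes = defaultdict(list)
    for allele in alleles:
        gene = "_".join(allele.split("_")[:2])
        genes[gene].append(two_field(allele))
    return {gene: sorted(calls) for gene, calls in genes.items()}


def compare(calls_a, calls_b):
    """Per gene: is calls_a homozygous, and which alleles are shared?"""
    a, b = by_gene(calls_a), by_gene(calls_b)
    report = {}
    for gene in sorted(a):
        report[gene] = {
            "homozygous_in_a": len(set(a[gene])) == 1,
            "shared": sorted(set(a[gene]) & set(b.get(gene, []))),
        }
    return report


report = compare(mhcflow, polysolver)
for gene, row in report.items():
    print(gene, row)
```

This only compares allele names; it says nothing about which tool is correct.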
The FASTQ file and k-mer files were retrieved from the Polysolver data, and I manually created the HLA class I BED file as follows:
chr6 29909037 29913661
chr6 31321649 31324965
chr6 31236526 31239907
I’m aware you've made improvements upon Polysolver, so I’m wondering: is this kind of discrepancy expected? I would be happy to provide any additional files or information if that would help.
I also have another question, which may be outside the scope of mhcflow, but I would greatly appreciate any insights you could offer:
I previously used OptiType for HLA typing. However, OptiType only provides 4-digit HLA types, while the original LOHHLA script requires 8-digit HLA types (because the corresponding FASTA files are categorized by 8-digit alleles). What would be the best way to adapt OptiType results for LOHHLA? I’ve seen some approaches that replace the 4-digit allele with the longest 8-digit form. Does this method make sense from your perspective?
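For concreteness, this is roughly the replacement I have in mind, as a Python sketch. Everything here is an assumption on my side: the function names are hypothetical, the reference names in `reference_names` are only illustrative stand-ins for the FASTA headers, and I interpret "longest" as "the candidate with the most fields, ties broken alphabetically".

```python
def optitype_to_prefix(allele: str) -> str:
    """Convert an OptiType-style name such as 'A*01:01' to 'hla_a_01_01'."""
    gene, fields = allele.split("*")
    return "hla_" + gene.lower() + "_" + fields.replace(":", "_")


def pick_reference(allele: str, reference_names):
    """Pick the reference allele that extends the 4-digit call.

    Assumption: prefer the candidate with the most fields and break
    ties alphabetically. Returns None if nothing matches.
    """
    prefix = optitype_to_prefix(allele)
    candidates = [
        name for name in reference_names
        # Require a field boundary so 'hla_a_01_01' does not match 'hla_a_01_010'.
        if name == prefix or name.startswith(prefix + "_")
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: (-name.count("_"), name))


# Illustrative reference names only; in practice these would be read
# from the headers of the HLA FASTA file.
reference_names = [
    "hla_a_01_01_01_01",
    "hla_a_01_01_01_02",
    "hla_a_01_01_02",
    "hla_a_11_01_01",
]

print(pick_reference("A*01:01", reference_names))
print(pick_reference("A*11:01", reference_names))
print(pick_reference("B*07:02", reference_names))
```

My concern is that the choice among the candidates is arbitrary, so two alleles that differ only in the third or fourth field would be treated as the same reference sequence.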
Thanks again for your amazing work and for any advice you can share!
Best regards,
Yang