
@likhitha-surapaneni
Contributor

nakib103 self-requested a review on October 15, 2025 at 10:40

logging.basicConfig(level=logging.INFO)

def find_variant_by_accession(xml_file, accession) -> dict|None:
Contributor


Loading the XML into memory once and then querying it would be much faster. The XML files are only a few MB, so memory usage should not be a concern.
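A minimal sketch of this suggestion, using Python's standard xml.etree.ElementTree: the file is parsed once and an accession-to-element dict is built, so each lookup becomes a dictionary access instead of a fresh scan of the file. The element name Variant, the accession attribute, and the helper names build_accession_index and lookup_variant are hypothetical; the real XML schema and the body of the PR's find_variant_by_accession are not shown in this thread.

```python
import xml.etree.ElementTree as ET
from typing import IO, Optional, Union


def build_accession_index(
    xml_file: Union[str, IO[str]], tag: str = "Variant", key: str = "accession"
) -> dict[str, ET.Element]:
    """Parse the XML once and map each accession to its element.

    Assumption (hypothetical schema): every variant is a <Variant> element
    carrying an "accession" attribute. If an accession repeats, the last
    occurrence wins.
    """
    tree = ET.parse(xml_file)  # whole document loaded into memory once
    index: dict[str, ET.Element] = {}
    for elem in tree.iter(tag):
        acc = elem.get(key)
        if acc is not None:
            index[acc] = elem
    return index


def lookup_variant(index: dict[str, ET.Element], accession: str) -> Optional[dict]:
    """Constant-time lookup; returns attributes plus child-element text, or None."""
    elem = index.get(accession)
    if elem is None:
        return None
    record = dict(elem.attrib)
    for child in elem:
        record[child.tag] = (child.text or "").strip()
    return record
```

The index would be built once (for example at startup) and reused for every accession, rather than re-reading xml_file on each call.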

header.info.add("ALLELE_TYPE", 1, "String", "Aggregated type of supporting calls")
header.info.add("CN", ".", "String", "Comma-separated list of copy numbers of supporting calls")
if "SVLEN" not in header.info:
    del header.info['SVLEN']
Contributor


Suggested change (remove this line):
del header.info['SVLEN']

This line is redundant.

if calls:
    new_rec.info["ALLELE_NAME"] = ",".join(call["ALLELE_NAME"] for call in calls)
    new_rec.info["ALLELE_TYPE"] = aggregate_sv_type(call["ALLELE_TYPE"] for call in calls)
    new_rec.alts = tuple([f"<{call['ALLELE_TYPE']}>" for call in calls])
Contributor


Is there a reason we are calculating final alts from SVTYPE?

new_rec.info["SVLEN"] = ",".join(call["SVLEN"][0] for call in calls if call["SVLEN"] is not None)
Contributor


  • If there is no SVLEN, we need to put a "." (missing value). Like any allele-specific field, SVLEN needs a one-to-one relationship with the alleles; otherwise we do not know which SVLEN belongs to which allele.
  • Is there a reason we are taking call["SVLEN"][0]? (I assume each call always has exactly one allele, so the list will always contain a single value?)

Contributor Author


Hi @nakib103, call["SVLEN"] returns a list containing a single value (rather than the value itself) because of how SVLEN is defined in the VCF header.
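A minimal sketch of the points discussed above, in plain Python: it keeps exactly one SVLEN entry per allele (None where a call has no SVLEN, rendered as ".") and unwraps the one-element list that a list-typed INFO field returns. The helper names and the dict shape of calls are hypothetical. If SVLEN is declared as Integer, as in the VCF specification, its values are ints, so passing them straight to ",".join(...) as in the diff would raise a TypeError; the rendering helper below converts each value with str().

```python
from typing import Any, Mapping, Optional, Sequence


def single_svlen(value: Optional[Sequence[int]]) -> Optional[int]:
    """Unwrap the one-element SVLEN list of a call; None if SVLEN is absent."""
    if value is None or len(value) == 0:
        return None  # placeholder, written as '.' in the VCF
    if len(value) != 1:
        # Assumption: each supporting call describes exactly one allele.
        raise ValueError(f"expected one SVLEN value per call, got {value!r}")
    return value[0]


def per_allele_svlen(calls: Sequence[Mapping[str, Any]]) -> tuple[Optional[int], ...]:
    """One entry per call (i.e. per ALT allele), so positions line up with ALT."""
    return tuple(single_svlen(call.get("SVLEN")) for call in calls)


def format_info_list(values: Sequence[Optional[int]]) -> str:
    """Render a list-valued INFO field as VCF text, '.' for missing entries."""
    return ",".join("." if v is None else str(v) for v in values)
```

With pysam, the tuple from per_allele_svlen(calls) could be assigned to new_rec.info["SVLEN"] directly instead of a pre-joined string. pysam generally uses None for missing values, but it is worth verifying how your pysam version writes None entries before relying on it.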



2 participants