Commit 386345b

Feed dbNSFP Orphanet and disease descriptions into LLM phenotype summary
1 parent 47f2b49 commit 386345b

1 file changed

Lines changed: 5 additions & 3 deletions

annotation_utils/add_phenotype_summary_using_AI.py

@@ -17,7 +17,7 @@
 if not args.output_path:
     args.output_path = args.combined_table.replace(".tsv", "_with_phenotype_summary.tsv")
 
-prompt_prefix1 = """You are a clinical geneticist. You have assembled known gene-disease associations from authoritative sources that include OMIM, GenCC, ClinGen, ClinVar, and PanelApp. Now, you need to condense the phenotypes described in these different sources into a single concise comma-separate list that covers the primary features or symptoms of the disease, as well as the main organ systems that are affected. For example, when the source phenotypes are:
+prompt_prefix1 = """You are a clinical geneticist. You have assembled known gene-disease associations from authoritative sources that include OMIM, GenCC, ClinGen, ClinVar, PanelApp, Orphanet, and dbNSFP. Now, you need to condense the phenotypes described in these different sources into a single concise comma-separate list that covers the primary features or symptoms of the disease, as well as the main organ systems that are affected. For example, when the source phenotypes are:
 
 OMIM: 'Congenital disorder of glycosylation, type Ie', CLINGEN: 'congenital disorder of glycosylation type 1E', PANEL APP UK: 'Congenital disorder of glycosylation, type Ie, OMIM:608799, GDP-Man:Dol-P mannosyltransferase deficiency (Disorders of m
 ultiple glycosylation and other glycosylation pathways); Congenital disorder of glycosylation, type Ie, OMIM:608799; Congenital disorder of glycosylation, type Ie, OMIM:608799; Congenital disorder of glycosylation, type Ie, OMIM:608799; Congenital
@@ -34,7 +34,7 @@
 """
 
 prompt_prefix2 = """
-You are a clinical geneticist. You have assembled known gene-disease associations from authoritative sources that include OMIM, GenCC, ClinGen, ClinVar, and PanelApp. Now, you need to select a single
+You are a clinical geneticist. You have assembled known gene-disease associations from authoritative sources that include OMIM, GenCC, ClinGen, ClinVar, PanelApp, Orphanet, and dbNSFP. Now, you need to select a single
 disease category that is the best match for the provided phenotypes. The possible disease categories are:
 
 'BIOCHEMICAL/METABOLIC',
@@ -85,8 +85,10 @@ def summarize_phenotypes(row, prompt_prefix=prompt_prefix1, blank_if_no_phenotyp
         ("PANEL_APP_AU", "PANEL_APP_AU_phenotypes"),
         ("CLINVAR", "CLINVAR_phenotypes"),
         ("FRIDMAN", "FRIDMAN_phenotype_category"),
+        ("ORPHANET", "DBNSFP_orphanet_disorder"),
+        ("DBNSFP_DISEASE", "DBNSFP_disease_description"),
     ]:
-        if not pd.isna(row[phenotype_column]):
+        if phenotype_column in row and not pd.isna(row[phenotype_column]):
             phenotypes.append(f"{label}: {row[phenotype_column]}")
 
     if not phenotypes:
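
Note on the last hunk: the added `phenotype_column in row` check appears intended to let the loop skip columns that are absent from the input table (for example, tables produced before the dbNSFP columns existed) instead of raising a `KeyError`. The following is a minimal sketch of that guard, not the script's actual code. It assumes a plain dict in place of a pandas row and uses a hypothetical `is_missing` helper in place of `pd.isna` so that it runs without dependencies; `collect_phenotypes`, `PHENOTYPE_COLUMNS`, and the Orphanet value are likewise illustrative.

```python
import math

# Subset of the (label, column) pairs visible in the diff.
PHENOTYPE_COLUMNS = [
    ("CLINVAR", "CLINVAR_phenotypes"),
    ("FRIDMAN", "FRIDMAN_phenotype_category"),
    ("ORPHANET", "DBNSFP_orphanet_disorder"),
    ("DBNSFP_DISEASE", "DBNSFP_disease_description"),
]


def is_missing(value):
    """Hypothetical stand-in for pd.isna: True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def collect_phenotypes(row):
    """Return 'LABEL: value' strings for columns that exist and are not missing."""
    phenotypes = []
    for label, phenotype_column in PHENOTYPE_COLUMNS:
        # Membership is tested first, so an absent column is skipped
        # rather than triggering a KeyError on row[phenotype_column].
        if phenotype_column in row and not is_missing(row[phenotype_column]):
            phenotypes.append(f"{label}: {row[phenotype_column]}")
    return phenotypes


# A row from a table that lacks the dbNSFP columns entirely.
old_row = {
    "CLINVAR_phenotypes": "Congenital disorder of glycosylation, type Ie",
    "FRIDMAN_phenotype_category": float("nan"),
}

# The same row with an illustrative (made-up) Orphanet value added.
new_row = {**old_row, "DBNSFP_orphanet_disorder": "example Orphanet disorder name"}

print(collect_phenotypes(old_row))
print(collect_phenotypes(new_row))
```

With a real pandas `Series`, `in` tests the index labels, so the same expression works on rows passed by `DataFrame.apply(..., axis=1)`.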
