Commit 0bdbc58

Add function to insert LOINC related names (#73)
## Description

This PR adds a function that randomly inserts a number of LOINC related names (between the min and max inserts) into a given string.

## Related Issues

Closes #65

## Additional Notes

I didn't add any of the functionality to parse the appropriate related names for a given LOINC `text`, figuring that would be a separate function. I also made some small changes to the script that pulls down LOINCs, which I needed to get it running.

<--------------------- REMOVE THE LINES BELOW BEFORE MERGING --------------------->

## Checklist

Please review and complete the following checklist before submitting your pull request:

- [ ] I have ensured that the pull request is of a manageable size, allowing it to be reviewed within a single session.
- [ ] I have reviewed my changes to ensure they are clear, concise, and well-documented.
- [ ] I have updated the documentation, if applicable.
- [ ] I have added or updated test cases to cover my changes, if applicable.
- [ ] I have minimized the number of reviewers to include only those essential for the review.

## Checklist for Reviewers

Please review and complete the following checklist during the review process:

- [ ] The code follows best practices and conventions.
- [ ] The changes implement the desired functionality or fix the reported issue.
- [ ] The tests cover the new changes and pass successfully.
- [ ] Any potential edge cases or error scenarios have been considered.
1 parent 6ad6250 commit 0bdbc58

3 files changed, +64 -0 lines changed

scripts/terminology_valueset_sync.py

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@

 load_dotenv()

+
 # Set Terminology URLS
 LOINC_BASE_URL = "https://loinc.regenstrief.org/searchapi/loincs?"
 LOINC_LAB_ORDER_SUFFIX = "query=orderobs:Order+OR+orderobs:Both&rows=500"
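
To make the two constants in this hunk easier to read, here is a minimal sketch that decodes the query suffix with the standard library. It assumes the script builds its request URL by plain concatenation of the base URL and the suffix; the diff does not show that step, and `lab_order_url` and `params` are names introduced only for illustration.

```python
from urllib.parse import parse_qs

# Constants copied from the hunk above.
LOINC_BASE_URL = "https://loinc.regenstrief.org/searchapi/loincs?"
LOINC_LAB_ORDER_SUFFIX = "query=orderobs:Order+OR+orderobs:Both&rows=500"

# Assumption: the script joins base URL and suffix by simple concatenation.
lab_order_url = LOINC_BASE_URL + LOINC_LAB_ORDER_SUFFIX

# parse_qs decodes "+" to a space, exposing the search expression and row limit.
params = parse_qs(LOINC_LAB_ORDER_SUFFIX)

print(lab_order_url)
print(params)
```

Decoded this way, the suffix asks for LOINCs whose `orderobs` field is `Order` or `Both`, capped at 500 rows.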

src/dibbs_text_to_code/augmentation.py

Lines changed: 32 additions & 0 deletions
@@ -31,3 +31,35 @@ def scramble_word_order(
         words.insert(new_pos, word)

     return " ".join(words)
+
+
+def insert_loinc_related_names(
+    text: str, loinc_names: list[str], max_inserts: int, min_inserts: int = 1
+) -> str:
+    """
+    Inserts 1 or more LOINC related names into the input text at random positions.
+
+    :param text: The input text to modify.
+    :param loinc_names: A list of LOINC related names to insert.
+    :param max_inserts: The maximum number of LOINC names to insert; min_inserts is the minimum.
+    :return: The text with LOINC related name(s) inserted.
+    """
+    words = text.split()
+    if not loinc_names or len(words) < 1:
+        return text
+
+    # Ensure num_inserts does not exceed the number of loinc_names
+    num_inserts = random.randint(min_inserts, min(len(loinc_names), max_inserts))
+
+    # Select indices to insert at (can repeat)
+    indices_to_insert = [random.randrange(len(words) + 1) for _ in range(num_inserts)]
+
+    # Select unique LOINC names to insert
+    loinc_names_to_insert = random.sample(loinc_names, num_inserts)
+
+    for _ in range(num_inserts):
+        name_to_insert = loinc_names_to_insert.pop()
+        idx_to_insert = indices_to_insert.pop()
+        words.insert(idx_to_insert, name_to_insert)
+
+    return " ".join(words)
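
One edge case worth noting: if `min_inserts` is larger than `min(len(loinc_names), max_inserts)`, `random.randint` receives an empty range and raises `ValueError`. The following is a minimal sketch of a guarded variant, not the committed implementation. The function name `insert_loinc_related_names_clamped` and the optional `rng` parameter are hypothetical additions for illustration.

```python
import random
from typing import Optional


def insert_loinc_related_names_clamped(
    text: str,
    loinc_names: list,
    max_inserts: int,
    min_inserts: int = 1,
    rng: Optional[random.Random] = None,
) -> str:
    """Hypothetical variant that clamps the insert range instead of raising."""
    if rng is None:
        rng = random.Random()

    words = text.split()
    if not loinc_names or not words:
        return text

    # Upper bound: cannot insert more unique names than are available.
    upper = min(len(loinc_names), max_inserts)
    if upper < 1:
        return text
    # Lower bound: at least 1, and never above the upper bound.
    lower = max(1, min(min_inserts, upper))
    num_inserts = rng.randint(lower, upper)

    # Same selection steps as the function in the diff: positions may repeat,
    # names are unique.
    indices = [rng.randrange(len(words) + 1) for _ in range(num_inserts)]
    names = rng.sample(loinc_names, num_inserts)
    for idx, name in zip(reversed(indices), reversed(names)):
        words.insert(idx, name)

    return " ".join(words)


# min_inserts=2 with a single available name would be an empty range
# for random.randint in the unclamped form; here it inserts one name.
print(
    insert_loinc_related_names_clamped(
        "Hematocrit of Blood",
        ["Erythrocytes"],
        max_inserts=3,
        min_inserts=2,
        rng=random.Random(0),
    )
)
```

Passing a `random.Random` instance is a design choice of this sketch: it keeps results reproducible for a given seed without touching the module-level random state.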

tests/unit/test_augmentation.py

Lines changed: 31 additions & 0 deletions
@@ -24,3 +24,34 @@ class TestScrambleWordOrder:
     def test_scramble_word_order(self, text, max_perms, expected):
         result = augmentation.scramble_word_order(text, max_perms=max_perms)
         assert result == expected
+
+
+@pytest.mark.parametrize(
+    "text, loinc_names, max_inserts, expected",
+    [
+        # Empty string
+        ("", ["Blood", "Erythrocytes", "Calculation", "CalcRBC", "Volume fraction"], 3, ""),
+        # Single word
+        (
+            "Blood",
+            ["Blood", "Erythrocytes", "Calculation", "CalcRBC", "Volume fraction"],
+            3,
+            "Erythrocytes Blood Volume fraction",
+        ),
+        # No LOINC names
+        ("Hematocrit of Blood", [], 3, "Hematocrit of Blood"),
+        # max_inserts equal to the number of LOINC names
+        (
+            "Hematocrit [Volume Fraction] of Blood by calculation",
+            ["Blood", "Erythrocytes", "Calculation", "CalcRBC", "Volume fraction"],
+            5,
+            "Erythrocytes Hematocrit [Volume Fraction] of Volume fraction Blood by calculation",
+        ),
+    ],
+)
+class TestInsertLoincRelatedNames:
+    def test_insert_loinc_related_names(self, text, loinc_names, max_inserts, expected):
+        result = augmentation.insert_loinc_related_names(
+            text, loinc_names, min_inserts=2, max_inserts=max_inserts
+        )
+        assert result == expected
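
The exact expected strings in these cases only hold under a fixed random state, which this hunk does not show being set. As a complement, here is a sketch of seed-independent checks, assuming the function behaves as in the diff. The function body is copied so the example runs standalone; `is_subsequence` and `check_invariants` are hypothetical helpers, not part of the repository.

```python
import random


def insert_loinc_related_names(text, loinc_names, max_inserts, min_inserts=1):
    # Copy of the committed function's logic so this sketch runs standalone.
    words = text.split()
    if not loinc_names or len(words) < 1:
        return text
    num_inserts = random.randint(min_inserts, min(len(loinc_names), max_inserts))
    indices_to_insert = [random.randrange(len(words) + 1) for _ in range(num_inserts)]
    loinc_names_to_insert = random.sample(loinc_names, num_inserts)
    for _ in range(num_inserts):
        words.insert(indices_to_insert.pop(), loinc_names_to_insert.pop())
    return " ".join(words)


def is_subsequence(needle, haystack):
    # True if every token of needle appears in haystack in the same order.
    it = iter(haystack)
    return all(token in it for token in needle)


def check_invariants(text, loinc_names, max_inserts, min_inserts, seed):
    # Same seed twice must give the same output.
    random.seed(seed)
    first = insert_loinc_related_names(text, loinc_names, max_inserts, min_inserts)
    random.seed(seed)
    second = insert_loinc_related_names(text, loinc_names, max_inserts, min_inserts)
    assert first == second

    # Original words keep their relative order.
    assert is_subsequence(text.split(), first.split())

    # At least min_inserts tokens were added, and no more than all names combined.
    added = len(first.split()) - len(text.split())
    assert min_inserts <= added <= sum(len(n.split()) for n in loinc_names)
    return first


names = ["Blood", "Erythrocytes", "Calculation", "CalcRBC", "Volume fraction"]
for seed in range(50):
    check_invariants("Hematocrit of Blood", names, max_inserts=3, min_inserts=2, seed=seed)
```

Checks like these hold for any seed, so they keep passing if the order of random calls inside the function changes, whereas exact-string expectations would need to be regenerated.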
