Commit 8d26fca

Simlomb authored and vsoch committed
refactor: Optimize field expression parsing with pre-compiled regex
Replace manual character iteration with regex-based splitting in _split_expander_expression(). Pre-compile the pattern at module level to avoid redundant compilation across millions of invocations when processing large DICOM datasets with extensive recipes.

- Add _EXPANDER_SPLIT_RE compiled pattern for splitting on the first colon outside quotes
- Handle private tags with colons in creator names (e.g., "Siemens: Thorax/...")
- Improve performance
1 parent 7221980 commit 8d26fca
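As a rough illustration of the splitting rule described in the commit message, the standalone sketch below reuses the committed pattern inside a hypothetical copy of the helper. Group 1 of the pattern is an "unrolled loop": runs of characters that are neither a colon nor a double quote, optionally followed by complete quoted spans, so the first colon it can reach is the first one outside quotes. The field strings, the expander names (contains, startswith) and the private-tag notation are made-up placeholders; the exact syntax deid accepts is not shown in this commit.

```python
import re

# Same pattern as _EXPANDER_SPLIT_RE in the diff: group 1 consumes non-colon,
# non-quote characters and whole "quoted" spans, so the colon that follows it
# is the first colon outside quotes.
EXPANDER_SPLIT_RE = re.compile(r'^([^:"]*(?:"[^"]*"[^:"]*)*):(.*)$')


def split_expander_expression(field):
    """Hypothetical copy of the helper: return [field] or [expander, expression]."""
    match = EXPANDER_SPLIT_RE.match(field)
    if match:
        return [match.group(1), match.group(2)]
    # No colon outside quotes: the whole string is a plain field
    return [field]


# Illustrative inputs only (not real deid recipe lines)
examples = [
    "contains:Thorax",                         # plain expander:expression
    '(0029,"ACME: Example",10)',               # colon only inside quotes: no split
    'startswith:(0029,"ACME: Example",10)',    # split at the first unquoted colon
    "PatientName",                             # no colon at all
]
for field in examples:
    print(repr(field), "->", split_expander_expression(field))
```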

1 file changed (+8 additions, -7 deletions)

deid/dicom/fields.py

Lines changed: 8 additions & 7 deletions
@@ -15,6 +15,9 @@

 from deid.logger import bot

+# Pre-compiled regex patterns for performance (called thousands of times in large DICOM datasets)
+_EXPANDER_SPLIT_RE = re.compile(r'^([^:"]*(?:"[^"]*"[^:"]*)*):(.*)$')
+

 class DicomField:
     """
@@ -229,13 +232,11 @@ def _split_expander_expression(field):

     Returns a list with 1 or 2 elements: [field] or [expander, expression]
     """
-    in_quotes = False
-    for i, char in enumerate(field):
-        if char == '"':
-            in_quotes = not in_quotes
-        elif char == ":" and not in_quotes:
-            # Found the first colon outside quotes
-            return [field[:i], field[i + 1 :]]
+    # Pattern r'^([^:"]*(?:"[^"]*"[^:"]*)*):(.*)$' splits on first colon outside quotes
+    # Captures: (expander before colon) : (expression after colon)
+    match = _EXPANDER_SPLIT_RE.match(field)
+    if match:
+        return [match.group(1), match.group(2)]

     # No colon found outside quotes
     return [field]
