Commit 52597db

refactor: optimize KEEP action performance by caching field contenders & fix SyntaxWarning for invalid escape sequences (#295)
* refactor: optimize KEEP action performance by caching field contenders

When multiple KEEP actions are present in a deid recipe, the `expand_field_expression` function was internally calling `get_fields_with_lookup(dicom)` for each KEEP action. This function iterates through all DICOM fields and builds lookup tables, which is an expensive operation.

This commit modifies the `keep` property to build the field contenders once on the first KEEP action and pass it explicitly to all subsequent `expand_field_expression` calls via the `contenders` parameter. This avoids redundant field enumeration and lookup table construction.

This optimization significantly reduces processing time for recipes with multiple KEEP actions, especially for DICOM files with many fields or nested sequences.

* fix: resolve SyntaxWarning for invalid escape sequences

Convert regex patterns to raw strings (r"") in config/utils.py to eliminate SyntaxWarning about invalid escape sequences. This follows Python best practices for regular expressions and avoids potential issues with escape sequences.

1 parent 0609cf2 commit 52597db

4 files changed: 13 additions, 4 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ and **Merged pull requests**. Critical items to know are:
 Referenced versions in headers are tagged on Github, in parentheses are for pypi.
 
 ## [vxx](https://github.com/pydicom/deid/tree/master) (master)
+- Optimize KEEP action performance by caching field contenders & fix SyntaxWarning for invalid escape sequences [#293](https://github.com/pydicom/deid/pull/295) (0.4.10)
 - Fix field removal and blanking to clean up child UID references [#293](https://github.com/pydicom/deid/pull/293) (0.4.9)
 - Fix UID lookup for nested sequence fields in DICOM datasets [#292](https://github.com/pydicom/deid/pull/292) (0.4.8)
 - Allow saving with a compressed transfer syntax [#290](https://github.com/pydicom/deid/pull/290) (0.4.7)

deid/config/utils.py

Lines changed: 2 additions & 2 deletions
@@ -130,7 +130,7 @@ def load_deid(path=None):
             parts = line.split(" ")
             if len(parts) > 1:
                 section_name = " ".join(parts[1:])
-            section = re.sub("[%]|(\s+)", "", parts[0]).lower()  # noqa
+            section = re.sub(r"[%]|(\s+)", "", parts[0]).lower()  # noqa
             if section not in sections:
                 bot.exit("%s is not a valid section." % section)
 
@@ -219,7 +219,7 @@ def parse_format(line):
     ==========
     line: the line that starts with format.
     """
-    fmt = re.sub("FORMAT|(\s+)", "", line).lower()  # noqa
+    fmt = re.sub(r"FORMAT|(\s+)", "", line).lower()  # noqa
     if fmt not in formats:
         bot.exit("%s is not a valid format." % fmt)
     bot.debug("FORMAT set to %s" % fmt)
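
The raw-string change in the two hunks above does not alter what the patterns match; it only stops Python from flagging `\s` as an invalid string escape when the file is compiled. The sketch below is an illustration, not code from the repository: it reuses the two patterns from the diff, while the helper names (`normalize_section`, `normalize_format`, `warnings_when_compiling`) are hypothetical and introduced only for this example.

```python
import re
import warnings

# The two patterns from the diff, written as raw strings.
SECTION_PATTERN = r"[%]|(\s+)"
FORMAT_PATTERN = r"FORMAT|(\s+)"


def normalize_section(token):
    """Strip '%' and whitespace from a section token, then lowercase it."""
    return re.sub(SECTION_PATTERN, "", token).lower()


def normalize_format(line):
    """Strip the FORMAT keyword and whitespace from a line, then lowercase it."""
    return re.sub(FORMAT_PATTERN, "", line).lower()


def warnings_when_compiling(source):
    """Compile source text and return the names of any warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category.__name__ for w in caught]


# Source text containing the old (non-raw) and new (raw) literal.
# The doubled backslash here yields a single backslash in the compiled source.
plain_source = 'pattern = "[%]|(\\s+)"\n'
raw_source = 'pattern = r"[%]|(\\s+)"\n'

plain_warnings = warnings_when_compiling(plain_source)
raw_warnings = warnings_when_compiling(raw_source)

print(normalize_section("%header"))
print(normalize_format("FORMAT dicom"))
print(plain_warnings, raw_warnings)
```

The warning category for the non-raw literal depends on the interpreter version (older releases report a DeprecationWarning, newer ones a SyntaxWarning), so the sketch only checks that some warning is raised for it and none for the raw form.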

deid/dicom/parser.py

Lines changed: 9 additions & 1 deletion
@@ -320,10 +320,18 @@ def keep(self):
         """
         keeps = []
         if self.recipe.deid is not None:
+            # Build field contenders ONCE and reuse for all KEEP actions
+            contenders = None
             for action in self.recipe.get_actions(action="KEEP"):
                 if action and action.get("field"):
+                    # Only build contenders on first iteration
+                    if contenders is None:
+                        contenders = get_fields_with_lookup(self.dicom)
+
                     fields = expand_field_expression(
-                        field=action.get("field"), dicom=self.dicom
+                        field=action.get("field"),
+                        dicom=self.dicom,
+                        contenders=contenders,  # Reuse the same contenders
                     )
                     # keys are in the format "(1234,5678)"
                     keeps.extend(fields.keys())
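
The build-once-and-reuse pattern in the hunk above can be sketched in isolation. This is a simplified stand-in under stated assumptions, not the deid implementation: `build_contenders` and `expand` are hypothetical substitutes for `get_fields_with_lookup` and `expand_field_expression`, the dataset is a plain dict rather than a DICOM object, and the counter exists only to show how many times the expensive build runs.

```python
def build_contenders(dataset, counter):
    """Hypothetical stand-in for the expensive field enumeration."""
    counter["builds"] += 1
    return dict(dataset)


def expand(field, dataset, counter, contenders=None):
    """Hypothetical stand-in for field expansion.

    Falls back to building the contenders itself when none are passed,
    which mirrors the per-call cost described in the commit message.
    """
    if contenders is None:
        contenders = build_contenders(dataset, counter)
    return {tag: name for tag, name in contenders.items() if name == field}


def collect_keeps(actions, dataset, counter):
    """Collect kept tags, building the contenders lazily and only once."""
    keeps = []
    contenders = None
    for action in actions:
        if action and action.get("field"):
            # Build on the first usable action, then reuse for the rest.
            if contenders is None:
                contenders = build_contenders(dataset, counter)
            fields = expand(action["field"], dataset, counter, contenders=contenders)
            keeps.extend(fields.keys())
    return keeps


dataset = {
    "(0010,0010)": "PatientName",
    "(0010,0020)": "PatientID",
    "(0008,0020)": "StudyDate",
}
actions = [{"field": "PatientName"}, {"field": "PatientID"}, {"field": "StudyDate"}]

# Cached: one build shared by all three actions.
cached_counter = {"builds": 0}
keeps = collect_keeps(actions, dataset, cached_counter)

# Uncached: each call builds its own contenders.
uncached_counter = {"builds": 0}
for action in actions:
    expand(action["field"], dataset, uncached_counter)

print(keeps, cached_counter, uncached_counter)
```

Initializing `contenders` to `None` and building inside the loop, as the commit does, means a recipe with no KEEP actions never pays for the enumeration at all.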

deid/version.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 __copyright__ = "Copyright 2016-2025, Vanessa Sochat"
 __license__ = "MIT"
 
-__version__ = "0.4.9"
+__version__ = "0.4.10"
 AUTHOR = "Vanessa Sochat"
 AUTHOR_EMAIL = "vsoch@users.noreply.github.com"
 NAME = "deid"
