@mhucka commented Sep 24, 2025


CodeQL scan [report
quantumlib#566](https://github.com/quantumlib/OpenFermion/security/code-scanning/566)
flagged the regex used on line 165 of
`src/openfermion/ops/operators/symbolic_operator.py` as potentially
subject to a regular-expression denial-of-service (ReDoS) attack. The
flagged code and the resulting warnings are:

```python
pattern = r'(.*?)\[(.*?)\]'  # regex for a term
for match in re.findall(pattern, long_string, flags=re.DOTALL):
```

```
This regular expression that depends on a user-provided value
 may run slow on strings with many repetitions of 'a'.
This regular expression that depends on a user-provided value
 may run slow on strings starting with '[' and with many repetitions of '[a'.
This regular expression that depends on a user-provided value
 may run slow on strings with many repetitions of 'a'.
This regular expression that depends on a user-provided value
 may run slow on strings starting with '[' and with many repetitions of '[a'.
```
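The following is a rough illustration, not code from this PR. It applies the quoted pattern to a hypothetical operator string in `coefficient [term]` form. It also applies the pattern to the kind of input the first warning describes. `split_terms` and the sample strings are assumptions made for illustration.

```python
import re

# The pattern quoted in the CodeQL report: a lazily matched prefix
# (group 1) followed by a bracketed term (group 2).
PATTERN = r'(.*?)\[(.*?)\]'


def split_terms(long_string):
    """Return (prefix, term) pairs, as re.findall does with the flagged pattern."""
    return re.findall(PATTERN, long_string, flags=re.DOTALL)


# A hypothetical operator string in "coefficient [term]" form.
print(split_terms('0.5 [X0 Y1] + 1.0 [Z2]'))
# [('0.5 ', 'X0 Y1'), (' + 1.0 ', 'Z2')]

# The kind of input the first warning describes: a long run of 'a' and no '['.
# At every starting position the lazy (.*?) walks to the end of the string
# looking for '[', so the total work grows roughly quadratically with length.
print(split_terms('a' * 2000))
# []
```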

This PR changes the regular expression to avoid `.*` while still
matching the same patterns as before. Additional tests in
`symbolic_operator_test.py` verify that strings are still parsed
correctly.
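The replacement regex is not shown here, so the following is only a hypothetical illustration of a different way to remove backtracking altogether. It uses a plain `str.find` scanner, on the assumption that the goal is to reproduce exactly the pairs that `re.findall` returns for the original pattern. `find_terms` is an illustrative name and is not part of OpenFermion. The sketch checks itself against the original regex on random strings.

```python
import random
import re

ORIGINAL_PATTERN = r'(.*?)\[(.*?)\]'


def find_terms(long_string):
    """Hypothetical linear-time equivalent of re.findall(ORIGINAL_PATTERN, ..., re.DOTALL)."""
    terms = []
    pos = 0
    while True:
        # Group 1 runs up to the first '[' at or after the current position.
        open_idx = long_string.find('[', pos)
        if open_idx == -1:
            break
        # Group 2 runs up to the first ']' after that '['.
        close_idx = long_string.find(']', open_idx + 1)
        if close_idx == -1:
            break
        terms.append((long_string[pos:open_idx],
                      long_string[open_idx + 1:close_idx]))
        # The next match starts right after the closing bracket, so the two
        # find() calls together examine each character at most once.
        pos = close_idx + 1
    return terms


# Compare against the original regex on many small random strings.
rng = random.Random(0)
for _ in range(2000):
    s = ''.join(rng.choice('ab[] \n') for _ in range(rng.randrange(12)))
    assert find_terms(s) == re.findall(ORIGINAL_PATTERN, s, flags=re.DOTALL), s

print(find_terms('0.5 [X0 Y1] + 1.0 [Z2]'))
# [('0.5 ', 'X0 Y1'), (' + 1.0 ', 'Z2')]
```

The running time is linear in the input length, even on the `'a' * n` and `'[' + '[a' * n` inputs from the warnings. The trade-off is that a one-line regex becomes a small loop.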
@mhucka mhucka marked this pull request as ready for review September 24, 2025 22:35
@mhucka mhucka changed the title Fix securing scan warning in symbolic_operator.py Fix securing scan warning about unsafe regex in symbolic_operator.py Sep 25, 2025
@mhucka mhucka changed the title Fix securing scan warning about unsafe regex in symbolic_operator.py Fix #1120: change unsafe regex in symbolic_operator.py Sep 25, 2025
@mhucka mhucka added the area/health Involves code and/or project health label Sep 26, 2025
@mhucka mhucka added this pull request to the merge queue Sep 26, 2025
Merged via the queue into quantumlib:master with commit 205c5d2 Sep 26, 2025
27 checks passed
@mhucka mhucka deleted the mh-fix-regex branch September 26, 2025 02:23