Description
When a test uses unittest.mock.patch.dict(os.environ, {...}, clear=True), the
generated trampoline reads MUTANT_UNDER_TEST from os.environ, finds it
empty, and forwards the call to the original function. Mutants reachable
only through such tests are then falsely reported as survived. This silently
lowers the mutation score of any project whose tests scrub the environment
(Docker / runtime detection, default-paths logic, env-driven feature flags,
etc.).
Root Cause
_mutmut_trampoline re-reads MUTANT_UNDER_TEST on every call:
mutation/trampoline_templates.py L121-L125
def _mutmut_trampoline(orig, mutants, call_args, call_kwargs, self_arg=None):
    """Forward call to original or mutated function, depending on the environment"""
    import os
    mutant_under_test = os.environ.get('MUTANT_UNDER_TEST', '')
    if not mutant_under_test:
        # No mutant being tested - call original function
        ...
mutmut sets MUTANT_UNDER_TEST once in the forked child
(__main__.py L1052-L1055)
before importing the test target, and the value never legitimately changes
during that child's lifetime. The per-call os.environ.get(...) therefore
serves no functional purpose given that lifecycle, but exposes the trampoline
to test code that scrubs os.environ.
Reproducer
# orchard/runtime.py
import os

def is_in_docker() -> bool:
    return os.environ.get("IN_DOCKER", "0") == "1"

# tests/test_runtime.py
import os
from unittest.mock import patch
from orchard.runtime import is_in_docker

def test_is_in_docker():
    with patch.dict(os.environ, {"IN_DOCKER": "1"}, clear=True):
        assert is_in_docker()
If this is the only test exercising is_in_docker, all of its mutants
("0" → "1", == → !=, etc.) survive, even though the assertion would
catch them in a normal run.
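The effect can be reproduced without mutmut using a toy model. The sketch below is not mutmut's generated code: `toy_trampoline`, the mutant name `is_in_docker__mutmut_1`, and the hand-written mutant are all illustrative assumptions. It only shows how a live env read combined with `clear=True` routes the call back to the original function.

```python
import os
from unittest.mock import patch


def toy_trampoline(orig, mutants):
    """Simplified stand-in for the generated trampoline (live env read)."""
    name = os.environ.get("MUTANT_UNDER_TEST", "")
    if not name:
        return orig()        # no mutant selected: run the original
    return mutants[name]()   # run the selected mutant


def is_in_docker_orig():
    return os.environ.get("IN_DOCKER", "0") == "1"


def is_in_docker_mutant_1():
    # Hypothetical mutation: == replaced by !=
    return os.environ.get("IN_DOCKER", "0") != "1"


MUTANTS = {"is_in_docker__mutmut_1": is_in_docker_mutant_1}


def is_in_docker():
    return toy_trampoline(is_in_docker_orig, MUTANTS)


def run_demo():
    # Simulate the forked child selecting a mutant, then a test wiping the env.
    with patch.dict(os.environ, {"MUTANT_UNDER_TEST": "is_in_docker__mutmut_1"}):
        with patch.dict(os.environ, {"IN_DOCKER": "1"}, clear=True):
            wiped = is_in_docker()   # env scrubbed: the original runs
        with patch.dict(os.environ, {"IN_DOCKER": "1"}):
            kept = is_in_docker()    # env intact: the mutant runs
    return wiped, kept
```

With the env scrubbed the call returns `True`, so the test's assertion passes and the mutant looks like a survivor. With `MUTANT_UNDER_TEST` intact the mutant returns `False` and the same assertion would kill it.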
Current Workaround
Projects work around this with a per-test helper that re-injects
MUTANT_UNDER_TEST into the patched env:
import os

def mutmut_safe_env(**extra: str) -> dict[str, str]:
    env: dict[str, str] = {}
    mut = os.environ.get("MUTANT_UNDER_TEST")
    if mut is not None:
        env["MUTANT_UNDER_TEST"] = mut
    env.update(extra)
    return env

# in tests:
with patch.dict(os.environ, mutmut_safe_env(IN_DOCKER="1"), clear=True):
    ...
It works, but bleeds mutmut implementation details into every test that
needs clear=True.
Proposed Fix: two options
Option A: sticky cache on the trampoline (slightly recommended)
Remember the last non-empty value seen, so a wiped env falls back to it:
def _mutmut_trampoline(orig, mutants, call_args, call_kwargs, self_arg=None):
    """Forward call to original or mutated function, depending on the environment"""
    import os
    mutant_under_test = os.environ.get('MUTANT_UNDER_TEST', '')
    if not mutant_under_test:
        mutant_under_test = getattr(_mutmut_trampoline, '_sticky', '')
    else:
        _mutmut_trampoline._sticky = mutant_under_test
    if not mutant_under_test:
        # No mutant being tested - call original function
        ...
A few added lines: one conditional that updates the sticky cache on non-empty
env and falls back to it on empty env. Each non-empty transition refreshes the
cache, so the parent's transitions through "fail", "stats",
"list_all_tests", "mutant_generation", and per-mutant names all keep
working.
Behavioral difference vs. current code. Today the parent process
intentionally writes MUTANT_UNDER_TEST="" as a "transition signal" between
phases (after run_forced_fail_test at __main__.py:640 and after
mutant_generation at __main__.py:994). With Option A, the sticky cache
ignores those resets and keeps reporting the previous mode. This is only
observable if the parent invokes user code through a trampoline between such
a reset and the start of the next phase. It does not affect the per-mutant
test runs themselves: those happen in os.fork()ed children whose env is set
once at fork time and never cleared, so the live read and the sticky cache
are equivalent there.
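The sticky behavior, including the ignored "" reset, can be traced with a small standalone model. This is a sketch under assumptions: `sticky_lookup` isolates only the lookup logic of Option A, and the value `mutant_1` is a placeholder rather than a real mutmut mutant name.

```python
import os
from unittest.mock import patch


def sticky_lookup():
    """Return the active mode, falling back to the last non-empty value seen."""
    current = os.environ.get("MUTANT_UNDER_TEST", "")
    if current:
        sticky_lookup._sticky = current   # refresh the cache on non-empty env
        return current
    return getattr(sticky_lookup, "_sticky", "")


def run_demo():
    sticky_lookup.__dict__.pop("_sticky", None)   # start from an empty cache
    seen = []
    with patch.dict(os.environ, {}, clear=True):
        seen.append(sticky_lookup())              # nothing set, nothing cached
        os.environ["MUTANT_UNDER_TEST"] = "stats"
        seen.append(sticky_lookup())              # live value, now cached
        os.environ["MUTANT_UNDER_TEST"] = ""
        seen.append(sticky_lookup())              # reset ignored: cached value
        os.environ["MUTANT_UNDER_TEST"] = "mutant_1"
        seen.append(sticky_lookup())              # non-empty value refreshes cache
        with patch.dict(os.environ, {"IN_DOCKER": "1"}, clear=True):
            seen.append(sticky_lookup())          # env scrubbed: cached value
    return seen
```

The third entry is the parent-side difference described above (the "" reset still reports "stats"). The last entry is the case the fix targets: a scrubbed env no longer hides the mutant.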
Option B: import-time cache + live fallback
Snapshot once at module-import time:
import os

_MUTANT_UNDER_TEST_CACHED = os.environ.get('MUTANT_UNDER_TEST', '')

def _mutmut_trampoline(orig, mutants, call_args, call_kwargs, self_arg=None):
    """Forward call to original or mutated function, depending on the environment"""
    mutant_under_test = (
        _MUTANT_UNDER_TEST_CACHED
        or os.environ.get('MUTANT_UNDER_TEST', '')
    )
    if not mutant_under_test:
        ...
Faster than Option A: once the cache is populated, the per-call
os.environ.get is skipped, which is measurable on suites with tens of
thousands of trampoline crossings. It relies on the trampoline implementation
being imported after MUTANT_UNDER_TEST is set, which is true today but is a
stricter assumption than Option A makes.
Verified in a real codebase
Reproduced and fixed in orchard-ml
on orchard/core/environment/hardware.py, where every test for
configure_system_libraries uses patch.dict(os.environ, ..., clear=True):
| Setup | Killed | Survived | Score |
| --- | --- | --- | --- |
| mutmut_safe_env workaround active | 129 / 133 | 4 | 97.0 % |
| Workaround disabled, trampoline unchanged | 114 / 133 | 19 | 85.7 % |
| Workaround disabled, Option A patched | 129 / 133 | 4 | 97.0 % |
15 mutants were falsely surviving without the workaround. Option A restores
the score exactly without it.
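The figures in the table can be cross-checked with a few lines of arithmetic. The sketch assumes the score is simply killed / total expressed as a percentage, which is what the table's columns imply. The helper name `mutation_score` is made up for this check.

```python
def mutation_score(killed: int, total: int) -> float:
    """Percentage of mutants killed, rounded to one decimal place."""
    return round(100 * killed / total, 1)


TOTAL = 133
with_workaround = mutation_score(129, TOTAL)      # workaround active / Option A
without_workaround = mutation_score(114, TOTAL)   # trampoline unchanged
falsely_surviving = 19 - 4                        # survivors gained by scrubbing
```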
Recommendation
Both options have tradeoffs.
Option A is minimal and works correctly in the fork-and-test path. Its
risk is parent-side: if user code is invoked through a trampoline between an
"" reset and the next phase, the sticky cache would still report the
previous mode. I have not been able to find a code path in mutmut that
actually triggers this today, but the assumption is worth pointing out.
Option B is faster but has a sharper edge case: if the parent imports
trampoline_impl while MUTANT_UNDER_TEST is set to a non-mutant value
("stats", "fail", etc.), the import-time cache freezes that value, and
forked children inherit it via copy-on-write. Their per-mutant env would then
be ignored. Option B is therefore only safe if trampoline_impl is
guaranteed to be imported after fork and after the env is set, which is not
the case today.
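The freeze described for Option B can be modeled in a single process. This is only a sketch: `make_option_b_lookup` stands in for importing the trampoline module, the fork itself is not modeled, and `mutant_1` is a placeholder name.

```python
import os
from unittest.mock import patch


def make_option_b_lookup():
    """Model Option B: snapshot the env once, as a module import would."""
    cached = os.environ.get("MUTANT_UNDER_TEST", "")   # "import-time" snapshot

    def lookup():
        return cached or os.environ.get("MUTANT_UNDER_TEST", "")

    return lookup


def run_demo():
    # Case 1: "imported" while the parent is in the stats phase.
    with patch.dict(os.environ, {"MUTANT_UNDER_TEST": "stats"}):
        lookup = make_option_b_lookup()
        os.environ["MUTANT_UNDER_TEST"] = "mutant_1"   # later per-mutant value
        frozen = lookup()                              # snapshot wins
    # Case 2: "imported" while the variable is unset.
    with patch.dict(os.environ, {}, clear=True):
        late = make_option_b_lookup()
        os.environ["MUTANT_UNDER_TEST"] = "mutant_1"
        live = late()                                  # falls through to live read
    return frozen, live
```

In the first case the lookup keeps returning "stats" even after the env names a mutant, which is the failure mode described above. In the second case the empty snapshot falls through to the live read, so the result depends entirely on when the import happens.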
On balance I'd lean toward Option A, but I'm happy to implement either
(or a different approach you prefer).