# Project-specific Semgrep rules.
#
# These rules encode invariants Bernstein learned the hard way --
# subprocess spawns that miss the env-filter step, eval/exec smuggled
# into production, etc. Severity contract for the PR job:
#
# ERROR -> fails CI (only the small set we're confident on)
# WARNING -> advisory annotation (won't block but surfaces in PR)
#
# When tightening a rule from WARNING to ERROR: first drive the
# WARNING count to zero in main, then promote.
#
# Path patterns are anchored with ``**/src/...`` for forward-compat with
# Semgrepignore v2. See https://semgrep.dev/docs/release-notes/2025/.
rules:
  # ----------------------------------------------------------------------
  # Hard bans (ERROR - fails PR)
  # ----------------------------------------------------------------------
  - id: no-exec-in-production
    message: >-
      exec() must not appear in src/. Replace with explicit dispatch maps
      or call into the bernstein.core.scripting sandbox.
    severity: ERROR
    languages: [python]
    paths:
      include:
        - "**/src/bernstein/**"
      exclude:
        - "**/src/bernstein/**/__pycache__/**"
        - "**/src/bernstein/**/test_*.py"
    pattern: exec(...)
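  # Illustrative sketch only (the handler names below are hypothetical, not
  # taken from the Bernstein codebase). The rule flags any call written as
  # exec(...); a dispatch map is the kind of replacement the message suggests.
  #
  #   exec(f"run_{action}()")                            # flagged
  #
  #   HANDLERS = {"build": run_build, "test": run_test}
  #   HANDLERS[action]()                                 # not flagged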
  # ----------------------------------------------------------------------
  # Strong-warning candidates - currently advisory because main has known
  # accepted usages. Promote to ERROR after the refactor PRs land:
  #   - cache_cmd.py / fingerprint.py / duration_predictor.py: pickle on
  #     HMAC-verified local files (trust-boundary documented inline)
  #   - formal_verification.py: eval() inside __builtins__={} sandbox for
  #     Z3 invariant DSL (design-level decision)
  # ----------------------------------------------------------------------
  - id: no-eval-in-production
    message: >-
      eval() in src/. Use ast.literal_eval (for pure literals) or the
      bernstein.core.scripting sandboxed evaluator unless you have a
      documented trust boundary (e.g. core/quality/formal_verification.py
      uses __builtins__={} sandbox for the Z3 invariant DSL).
    severity: WARNING
    languages: [python]
    paths:
      include:
        - "**/src/bernstein/**"
      exclude:
        - "**/src/bernstein/**/__pycache__/**"
        - "**/src/bernstein/**/test_*.py"
    pattern: eval(...)
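  # Illustrative sketch only (variable names are placeholders).
  # ast.literal_eval accepts only Python literals (strings, bytes, numbers,
  # tuples, lists, dicts, sets, booleans, None) and rejects everything else,
  # so it replaces eval() only when the input is a pure literal.
  #
  #   value = eval(raw)                # flagged
  #   value = ast.literal_eval(raw)    # not flagged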
  - id: no-pickle-loads-on-untrusted
    message: >-
      pickle.loads / pickle.load is unsafe on untrusted input. Bernstein
      uses JSON or msgpack with explicit schemas for cross-process state.
      Local-file pickle (replay cache, memo cache, duration predictor)
      is acceptable when the file is HMAC-verified or operator-managed -
      add a `# noqa: pickle-trust` comment with a justification.
    severity: WARNING
    languages: [python]
    paths:
      include:
        - "**/src/bernstein/**"
      exclude:
        - "**/src/bernstein/**/test_*.py"
    pattern-either:
      - pattern: pickle.loads(...)
      - pattern: pickle.load(...)
  # ----------------------------------------------------------------------
  # Hygiene (WARNING - advisory; promote to ERROR once main is clean)
  # ----------------------------------------------------------------------
  - id: subprocess-popen-without-env-in-spawn-helpers
    message: >-
      Functions named `_spawn_*` MUST pass env=... when invoking subprocess.
      Inheriting the parent environment leaks BERNSTEIN_AUDIT_KEY /
      OPENAI_API_KEY / cloud creds into agent child processes. Use
      core.security.subprocess_helpers.build_filtered_env() to construct
      the env dict.
    severity: WARNING
    languages: [python]
    paths:
      include:
        - "**/src/bernstein/core/agents/**"
        - "**/src/bernstein/core/sandbox/**"
    patterns:
      # A Semgrep metavariable binds a whole identifier, so the `_spawn_`
      # prefix is enforced with metavariable-regex rather than `_spawn_$NAME`.
      - pattern-inside: |
          def $FUNC(...):
              ...
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^_spawn_
      - pattern-either:
          - pattern: subprocess.Popen(...)
          - pattern: subprocess.run(...)
          - pattern: subprocess.check_output(...)
          - pattern: subprocess.check_call(...)
      - pattern-not: subprocess.Popen(..., env=$ENV, ...)
      - pattern-not: subprocess.run(..., env=$ENV, ...)
      - pattern-not: subprocess.check_output(..., env=$ENV, ...)
      - pattern-not: subprocess.check_call(..., env=$ENV, ...)
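  # Illustrative sketch of the intended behaviour. The function name and the
  # no-argument call to build_filtered_env() are assumptions; check
  # core.security.subprocess_helpers for the real signature.
  #
  #   def _spawn_worker(cmd):
  #       subprocess.run(cmd)                  # flagged: inherits parent env
  #
  #   def _spawn_worker(cmd):
  #       env = build_filtered_env()
  #       subprocess.run(cmd, env=env)         # not flagged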
  - id: os-environ-direct-in-spawn
    message: >-
      Functions whose name starts with `_spawn_` should not read os.environ
      directly. Route through build_filtered_env so the spawned agent only
      sees env vars on the explicit allow-list.
    severity: WARNING
    languages: [python]
    paths:
      include:
        - "**/src/bernstein/core/agents/**"
        - "**/src/bernstein/core/sandbox/**"
    patterns:
      # The `_spawn_` prefix is matched with metavariable-regex because a
      # metavariable cannot stand in for part of an identifier.
      - pattern-inside: |
          def $FUNC(...):
              ...
      - metavariable-regex:
          metavariable: $FUNC
          regex: ^_spawn_
      - pattern-either:
          - pattern: os.environ
          - pattern: os.getenv(...)
          - pattern: os.environ.get(...)
          - pattern: os.environ[$KEY]
  - id: prefer-defusedxml
    message: >-
      xml.etree.ElementTree is vulnerable to billion-laughs DoS. Use
      defusedxml.ElementTree (already in the project deps).
    severity: WARNING
    languages: [python]
    paths:
      include:
        - "**/src/bernstein/**"
      exclude:
        - "**/src/bernstein/**/test_*.py"
    pattern: import xml.etree.ElementTree
  - id: requests-without-timeout
    message: >-
      HTTP calls must pass an explicit timeout=. `requests` has no default
      timeout, so a call without one can hang indefinitely under network
      partition. `httpx` defaults to 5 seconds, but an explicit value keeps
      the limit visible at the call site.
    severity: WARNING
    languages: [python]
    paths:
      include:
        - "**/src/bernstein/**"
      exclude:
        - "**/src/bernstein/**/test_*.py"
    patterns:
      - pattern-either:
          - pattern: requests.get(...)
          - pattern: requests.post(...)
          - pattern: requests.put(...)
          - pattern: requests.delete(...)
          - pattern: httpx.get(...)
          - pattern: httpx.post(...)
      - pattern-not: requests.get(..., timeout=$T, ...)
      - pattern-not: requests.post(..., timeout=$T, ...)
      - pattern-not: requests.put(..., timeout=$T, ...)
      - pattern-not: requests.delete(..., timeout=$T, ...)
      - pattern-not: httpx.get(..., timeout=$T, ...)
      - pattern-not: httpx.post(..., timeout=$T, ...)
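  # Illustrative sketch only (the URL and the timeout value are placeholders):
  #
  #   requests.get("https://example.invalid/health")               # flagged
  #   requests.get("https://example.invalid/health", timeout=10)   # not flagged
  #
  # The patterns above only cover the module-level helpers listed; calls made
  # through a requests.Session or httpx.Client object are not matched.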