docs/semgrep-assistant/metrics.md
+9 -7
Original file line number
Diff line number
Diff line change
@@ -15,7 +15,7 @@ Metrics for evaluating Semgrep Assistant's performance are derived from two sour
15
15
- **User feedback** on Assistant recommendations within the product
16
16
- **Internal triage and benchmarking** conducted by Semgrep's security research team
17
17
18
-
This methodology ensures that Assistant is evaluated from both a user's and expert's perspective. This gives Semgrep's product and engineering teams a holistic view into Assistant's real-world performance.
18
+
This methodology ensures that Assistant is evaluated from both a user's and expert's perspective. This gives Semgrep's product and engineering teams a holistic view into Assistant's real-world performance.[^1]
19
19
20
20
## User feedback
21
21
@@ -35,7 +35,7 @@ Users are prompted in-line to "thumbs up" or "thumbs down" Assistant suggestions
35
35
<td><strong>250,000+</strong></td>
36
36
</tr>
37
37
<tr>
38
-
<td>Average reduction in findings[^1]</td>
38
+
<td>Average reduction in findings[^2]</td>
39
39
<td><strong>20%</strong></td>
40
40
</tr>
41
41
<tr>
@@ -64,17 +64,19 @@ Internal benchmarks for Assistant run on the same dataset used by Semgrep's secu
64
64
<td><strong>2000+</strong></td>
65
65
</tr>
66
66
<tr>
67
-
<td>False positive confidence rate[^2]</td>
67
+
<td>False positive confidence rate[^3]</td>
68
68
<td><strong>96%</strong></td>
69
69
</tr>
70
70
<tr>
71
-
<td>Remediation guidance confidence rate[^3]</td>
71
+
<td>Remediation guidance confidence rate[^4]</td>
72
72
<td><strong>80%</strong></td>
73
73
</tr>
74
74
</table>
75
75
76
-
[^1]:The average % of SAST findings that Assistant filters out as noise.
76
+
[^1]: Learn more about how Semgrep achieved these numbers in [How we built an AppSec AI that security researchers agree with 96% of the time](https://semgrep.dev/blog/2025/building-an-appsec-ai-that-security-researchers-agree-with-96-of-the-time/).
77
77
78
-
[^2]:False positive confidence rate measures how often Assistant is correct when it identifies a false positive. **A high confidence rate means users can trust when Assistant identifies a false positive - it does not mean that Assistant catches all false positives.**
78
+
[^2]:The average % of SAST findings that Assistant filters out as noise.
79
79
80
-
[^3]:Remediation guidance is rated on a binary scale of "helpful" / "not helpful".
80
+
[^3]:False positive confidence rate measures how often Assistant is correct when it identifies a false positive. **A high confidence rate means users can trust when Assistant identifies a false positive - it does not mean that Assistant catches all false positives.**
81
+
82
+
[^4]:Remediation guidance is rated on a binary scale of "helpful" / "not helpful".
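
The distinction drawn in the false positive confidence rate footnote (being right when a false positive is flagged versus catching every false positive) can be illustrated with a small sketch. This is not Semgrep's benchmarking code: the `TriagedFinding` type, its field names, both function names, and the sample counts are hypothetical, chosen only so that the first metric works out to 96% while the second stays much lower.

```python
from dataclasses import dataclass


@dataclass
class TriagedFinding:
    # Hypothetical record; field names are assumptions for illustration.
    flagged_false_positive: bool  # Assistant's verdict
    is_false_positive: bool       # reviewer's label, treated as ground truth


def confidence_rate(findings):
    """Share of Assistant's false-positive verdicts that the reviewer agrees with."""
    flagged = [f for f in findings if f.flagged_false_positive]
    if not flagged:
        return None  # undefined when nothing was flagged
    return sum(f.is_false_positive for f in flagged) / len(flagged)


def catch_rate(findings):
    """Share of all reviewer-confirmed false positives that Assistant flagged."""
    actual = [f for f in findings if f.is_false_positive]
    if not actual:
        return None  # undefined when there are no false positives
    return sum(f.flagged_false_positive for f in actual) / len(actual)


# Made-up sample of 100 findings, not real benchmark data.
sample = (
    [TriagedFinding(True, True)] * 24      # flagged, and reviewer agrees
    + [TriagedFinding(True, False)] * 1    # flagged, but actually a true positive
    + [TriagedFinding(False, True)] * 36   # false positive that was not flagged
    + [TriagedFinding(False, False)] * 39  # true positive, correctly left alone
)

print(confidence_rate(sample))  # 24 / 25
print(catch_rate(sample))       # 24 / 60
```

In this made-up sample, 24 of the 25 flagged findings are genuine false positives, yet only 24 of the 60 false positives are flagged, which is the situation the bolded sentence in the footnote describes.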