Commit 92166cd

move data from internal to user feedback dataset (#1924)
* move data from internal to user feedback dataset
* correct b to strong
1 parent ce8a8c4 commit 92166cd

File tree

1 file changed (+17 −17 lines)


docs/semgrep-assistant/metrics.md (+17 −17)
@@ -10,16 +10,16 @@ tags:
 
 # Semgrep Assistant metrics and methodology
 
-Semgrep's metrics for evaluating Semgrep Assistant's performance are derived from two sources:
+Metrics for evaluating Semgrep Assistant's performance are derived from two sources:
 
 - **User feedback** on Assistant recommendations within the product
 - **Internal triage and benchmarking** conducted by Semgreps security research team
 
 This methodology ensures that Assistant is evaluated from both a user's and expert's perspective. This gives Semgrep's product and engineering teams a holistic view into Assistant's real-world performance.
 
-## User feedback (real-world dataset)
+## User feedback
 
-User feedback shows the aggregated and anonymized performance of Assistant across **more than 1000 customers**, providing a comprehensive real-world dataset.
+User feedback shows the aggregated and anonymized performance of Assistant across **more than 1000 customers**, providing a comprehensive **real-world dataset**.
 
 Users are prompted in-line to "thumbs up" or "thumbs down" Assistant suggestions as they receive Assistant suggestions in their PR or MR. This ensures that sampling bias is reduced, as both developers and AppSec engineers can provide feedback.
 
@@ -28,23 +28,27 @@ Users are prompted in-line to "thumbs up" or "thumbs down" Assistant suggestions
 <table>
 <tr>
 <td>Customers in dataset</td>
-<td><b>1000+</b></td>
+<td><strong>1000+</strong></td>
 </tr>
 <tr>
 <td>Findings analyzed</td>
-<td><b>250,000+</b></td>
+<td><strong>250,000+</strong></td>
+</tr>
+<tr>
+<td>Average reduction in findings[^1]</td>
+<td><strong>20%</strong></td>
 </tr>
 <tr>
 <td>Human-agree rate</td>
-<td><b>92%</b></td>
+<td><strong>92%</strong></td>
 </tr>
 <tr>
 <td>Median time to resolution</td>
-<td><b>22% faster than baseline</b></td>
+<td><strong>22% faster than baseline</strong></td>
 </tr>
 <tr>
 <td>Average time saved per finding</td>
-<td><b>30 minutes</b></td>
+<td><strong>30 minutes</strong></td>
 </tr>
 </table>
 
@@ -57,24 +61,20 @@ Internal benchmarks for Assistant run on the same dataset used by Semgrep's secu
 <table>
 <tr>
 <td>Findings analyzed</td>
-<td><b>2000+</b></td>
-</tr>
-<tr>
-<td>Average reduction in findings[^1]</td>
-<td><b>20%</b></td>
+<td><strong>2000+</strong></td>
 </tr>
 <tr>
 <td>False positive confidence rate[^2]</td>
-<td><b>96%</b></td>
+<td><strong>96%</strong></td>
 </tr>
 <tr>
 <td>Remediation guidance confidence rate[^3]</td>
-<td><b>80%</b></td>
+<td><strong>80%</strong></td>
 </tr>
 </table>
 
-[^1]:The average % of SAST findings that Assistant filters out as noise.
+[^1]:The average % of SAST findings that Assistant filters out as noise.
 
 [^2]:False positive confidence rate measures how often Assistant is correct when it identifies a false positive. **A high confidence rate means users can trust when Assistant identifies a false positive - it does not mean that Assistant catches all false positives.**
 
-[^3]:Remediation guidance is rated on a binary scale of "helpful" / "not helpful".
+[^3]:Remediation guidance is rated on a binary scale of "helpful" / "not helpful".
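
Footnote 2 of the docs page defines the false positive confidence rate as how often Assistant is correct when it flags a finding as a false positive, which is a precision-style metric over the flagged subset only. A minimal sketch of that calculation, assuming a hypothetical list of (Assistant verdict, human verdict) pairs rather than Semgrep's actual data model:

```python
def confidence_rate(triaged):
    """Fraction of Assistant's false-positive calls a human agreed with.

    triaged: list of (assistant_says_fp, human_says_fp) boolean pairs.
    Findings Assistant did not flag are excluded, which is why a high
    rate does not imply Assistant catches all false positives.
    """
    human_verdicts = [human for assistant, human in triaged if assistant]
    if not human_verdicts:
        return None  # Assistant flagged nothing; the rate is undefined
    return sum(human_verdicts) / len(human_verdicts)

# Illustrative data: 25 findings flagged as false positives, a human
# reviewer agreed with 24 of them; 5 unflagged findings are ignored.
sample = [(True, True)] * 24 + [(True, False)] + [(False, True)] * 5
print(confidence_rate(sample))  # 0.96
```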

0 commit comments
