
Commit 347e3e3

Update docs to recommend subdir (#1602)
1 parent 8009b22 commit 347e3e3


1 file changed (+75, -34 lines changed)


docs/kb/semgrep-ci/scan-monorepo-in-parts.md

Lines changed: 75 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -19,61 +19,102 @@ As such, it can be helpful to scan a monorepo in parts for multiple reasons:
1919

2020
When scanning a repo with Semgrep in CI, the base command is `semgrep ci`. To understand this default setup for your source code manager (SCM) and CI provider, see [Getting started with Semgrep in continuous integration (CI)](/deployment/add-semgrep-to-ci).
2121

22-
To split up your monorepo, you need to make two changes. First, use the `--include` flag to determine *how* you want to logically split up the code. Second, update the `SEMGREP_REPO_DISPLAY_NAME` environment variable to assign findings to separate projects in Semgrep AppSec Platform.
22+
Semgrep provides two ways to split up a repo: the `--subdir` flag and the `--include` and `--exclude` flags. Consider a monorepo named `monorepo` with four main modules:
2323

24-
For example, if the monorepo has four main modules and their paths are:
25-
```
26-
src/moduleA
27-
src/moduleB
28-
src/moduleC
29-
src/moduleD
30-
```
24+
- `src/moduleA`
25+
- `src/moduleB`
26+
- `src/moduleC`
27+
- `src/moduleD`
3128

32-
Then splitting its scans into four separate scans, one for each module, would provide a logical separation for findings. In general, we recommend that modules not exceed ~100,000 lines of code in order to maintain optimal scan time and efficiency.
29+
The easiest way to split this monorepo is into four separate scans, one for each module. To do this, use the `--subdir` flag (see `semgrep ci --help`) with the module's path so that Semgrep scans only the files in that directory:
3330

34-
After choosing a logical split, use the `--include` flag ([see CLI reference](/docs/cli-reference)) with the relevant path to only scan files in that module's code path:
31+
`semgrep ci --subdir src/moduleA`
3532

36-
```
37-
semgrep ci --include=src/moduleA/**
38-
```
33+
In addition to limiting the scan to `src/moduleA`, this command sends the results to a project called `monorepo/src/moduleA`. To use a different project name, set the `SEMGREP_REPO_DISPLAY_NAME` environment variable, available in Semgrep 1.61.1 and later.
3934

40-
Now, Semgrep is only scanning files under that path and the CI run will take less time, since less code is being scanned.
35+
For example:
4136

42-
For the other modules, the commands look similar. For module B:
37+
`SEMGREP_REPO_DISPLAY_NAME=monorepo/moduleA semgrep ci --subdir src/moduleA`
4338

44-
```
45-
semgrep ci --include=src/moduleB/**
46-
```
47-
48-
You will then have the flexibility to trigger each one on appropriate events or frequencies.
39+
Scans of different modules must never share the same `SEMGREP_REPO_DISPLAY_NAME`. Distinct names keep finding statuses consistent and help developers and security engineers see which findings pertain to the modules they are responsible for.
4940

50-
Now that you understand how to configure your monorepo to be scanned in parts, you also have to understand how to configure the findings from each part or module to show up as their own project in Semgrep AppSec Platform.
51-
52-
To assign findings from the module to their own project in Semgrep AppSec Platform, you must explicitly set the `SEMGREP_REPO_DISPLAY_NAME` environment variable, which only works with Semgrep versions 1.61.1 and later ([see CI environment variables reference](/docs/semgrep-ci/ci-environment-variables#semgrep_repo_display_name)).
41+
To scan the entire monorepo, trigger one scan for each module.
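As a sketch of the one-scan-per-module approach, the hypothetical bash function `scan_modules` below (not part of Semgrep) runs `semgrep ci --subdir` once for each directory it receives. It assumes `SEMGREP_APP_TOKEN` and the other usual CI variables are already set, and it reads an optional `SEMGREP` variable, also an assumption of this sketch, so that another command such as `echo` can stand in for a dry run.

```shell
# Hypothetical helper, not part of Semgrep: run one `semgrep ci --subdir`
# scan per module directory passed as an argument.
# SEMGREP (an assumption of this sketch) can override the command, e.g. SEMGREP=echo.
scan_modules() {
  local module
  local status=0
  for module in "$@"; do
    # --subdir takes a single directory; results go to a project named after
    # the repository plus this path unless SEMGREP_REPO_DISPLAY_NAME is set.
    "${SEMGREP:-semgrep}" ci --subdir "$module" || status=1
  done
  # Report failure only after every module has been scanned.
  return "$status"
}
```

For example, `scan_modules src/moduleA src/moduleB src/moduleC src/moduleD`. A non-zero exit from one module's scan doesn't stop the remaining modules; the function returns failure at the end instead. Separate workflow files per module, as in the GitHub Actions examples, let you trigger each scan on its own events or schedule.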
5342

5443
:::info
55-
Ensure that `SEMGREP_REPO_NAME` is still properly set (either automatically if using a [supported SCM and CI provider](/docs/semgrep-ci/sample-ci-configs#feature-support) or [explicitly](/docs/semgrep-ci/ci-environment-variables#semgrep_repo_name)) as with any Semgrep scan, in order to retain hyperlink and PR/MR comment functionality.
44+
Change only `SEMGREP_REPO_DISPLAY_NAME`. Ensure that `SEMGREP_REPO_NAME` is still set correctly, either automatically if you use a [supported SCM and CI provider](/docs/semgrep-ci/sample-ci-configs#feature-support) or [explicitly](/docs/semgrep-ci/ci-environment-variables#semgrep_repo_name), as with any Semgrep scan, so that hyperlinks and PR/MR comments keep working.
5645
:::
5746

58-
For example, if your monorepo is located at `https://github.com/semgrep/monorepo` the `SEMGREP_REPO_DISPLAY_NAME` would default to the value of `SEMGREP_REPO_NAME`, which in this case is `semgrep/monorepo`. To split the monorepo into four projects corresponding to the logical modules, set `SEMGREP_REPO_NAME` as you normally would while setting `SEMGREP_REPO_DISPLAY_NAME` to a relevant name before running Semgrep:
47+
The `--subdir` flag accepts only a single folder. To scan multiple folders as part of one scan, use `--include` and `--exclude` ([see CLI reference](/docs/cli-reference)) to tell Semgrep which paths to scan. Semgrep still performs file targeting across the whole monorepo, but analyzes only the included files.
5948

60-
```
61-
export SEMGREP_REPO_DISPLAY_NAME="semgrep/monorepo/moduleA"
62-
```
63-
And then run Semgrep as demonstrated earlier:
49+
Unlike `--subdir`, `--include` and `--exclude` don't automatically direct results to a corresponding project, so you always have to set `SEMGREP_REPO_DISPLAY_NAME`.
6450

65-
```
66-
semgrep ci --include=src/moduleA/**
67-
```
51+
Here's an example that uses `--include` to scan modules A and B as a single project:
52+
53+
`SEMGREP_REPO_DISPLAY_NAME=monorepo/moduleAB semgrep ci --include='src/moduleA/**' --include='src/moduleB/**'`
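Because `--include` never chooses a project name, a wrapper that builds the flags should always set `SEMGREP_REPO_DISPLAY_NAME` alongside them. The hypothetical bash function `scan_group` below is a minimal sketch under that assumption: it takes a display name followed by module directories, turns each directory into an `--include=DIR/**` pattern, and keeps the patterns quoted so the shell passes them to Semgrep unexpanded. The optional `SEMGREP` override is likewise an assumption of this sketch, useful for dry runs.

```shell
# Hypothetical helper, not part of Semgrep: scan several module directories
# as ONE project using --include.
# Usage: scan_group <display-name> <dir>...
scan_group() {
  local display_name="$1"
  shift
  local -a args=()
  local dir
  for dir in "$@"; do
    # Strip any trailing slash, then match everything under the directory.
    # Double quotes keep the shell from expanding the ** glob itself.
    args+=("--include=${dir%/}/**")
  done
  # --include does not pick a project name, so always set the display name
  # (SEMGREP_REPO_DISPLAY_NAME requires Semgrep 1.61.1 or later).
  SEMGREP_REPO_DISPLAY_NAME="$display_name" "${SEMGREP:-semgrep}" ci "${args[@]}"
}
```

Calling `scan_group monorepo/moduleAB src/moduleA src/moduleB` is equivalent to running `SEMGREP_REPO_DISPLAY_NAME=monorepo/moduleAB semgrep ci --include='src/moduleA/**' --include='src/moduleB/**'`. Give each group its own display name so that different groups never share a project.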
6854

69-
Now, the findings from this CI run will show up in their own project in Semgrep AppSec Platform named `semgrep/monorepo/moduleA`. This is not only necessary to ensure findings have a consistent status, but also helpful so that developers and security engineers can have a clearer understanding of which findings pertain to the module that they are responsible for.
55+
:::info
56+
Warning: if you use `--include` or `--exclude` in a `semgrep ci` scan without setting `SEMGREP_REPO_DISPLAY_NAME`, `semgrep ci` might close findings that aren't detected in that scan.
57+
:::
7058

71-
### Example using GitHub Actions
59+
### Examples using GitHub Actions
7260

7361
Below are example GitHub Actions workflow files. Each is one of the four workflow files you would need for this specific example, all placed in the monorepo's `.github/workflows/` folder. Each workflow file corresponds to a module of the monorepo that you want to scan and treat as a separate project in Semgrep AppSec Platform.
7462

7563
You can name each workflow file whatever you like, but it may be helpful to name it after the module it corresponds to. In this example, something like `semgrep_moduleA.yml` would be ideal.
7664

65+
#### With `--subdir`
66+
67+
```yaml
68+
# Name of this GitHub Actions workflow.
69+
name: Semgrep - moduleA
70+
71+
on:
72+
  # Scan on-demand through GitHub Actions interface:
73+
  workflow_dispatch: {}
74+
  # Scan changed files in PRs (diff-aware scanning):
75+
  pull_request:
76+
    # Restrict the workflow to only run for files changed in a PR at the desired module path:
77+
    paths:
78+
      - 'src/moduleA/**'
79+
  # Run a full scan when the Semgrep workflow file is changed:
80+
  push:
81+
    paths:
82+
      - '.github/workflows/semgrep_moduleA.yml'
83+
  # Schedule a daily full scan CI job (this method uses cron syntax):
84+
  schedule:
85+
    - cron: '20 17 * * *' # Sets Semgrep to scan every day at 17:20 UTC.
86+
    # It is recommended to change the schedule to a random time.
87+
88+
jobs:
89+
  semgrep:
90+
    # User definable name of this GitHub Actions job.
91+
    name: semgrep/ci
92+
    # If you are self-hosting, change the following `runs-on` value:
93+
    runs-on: ubuntu-latest
94+
95+
    container:
96+
      # A Docker image with Semgrep installed. Do not change this.
97+
      image: semgrep/semgrep
98+
99+
    # Skip any PR created by dependabot to avoid permission issues:
100+
    if: (github.actor != 'dependabot[bot]')
101+
102+
    steps:
103+
      # Fetch project source with GitHub Actions Checkout. Use either v3 or v4.
104+
      - uses: actions/checkout@v4
105+
      # Run the "semgrep ci" command on the command line of the docker image.
106+
      - run: semgrep ci --subdir=src/moduleA/
107+
        env:
108+
          # Connect to Semgrep AppSec Platform through your SEMGREP_APP_TOKEN.
109+
          # Generate a token from Semgrep AppSec Platform > Settings
110+
          # and add it to your GitHub secrets.
111+
          SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
112+
```
113+
114+
115+
#### With `--include`
116+
117+
77118
```yaml
78119
# Name of this GitHub Actions workflow.
79120
name: Semgrep - moduleA
