[KB]: Add self hosted config options for Azure Pipelines + Ubuntu (#1887)

---------
Co-authored-by: s-santillan <[email protected]>
armchairlinguist authored Jan 17, 2025
1 parent 7b37c64 commit 82cd687
Showing 5 changed files with 165 additions and 33 deletions.
134 changes: 134 additions & 0 deletions docs/kb/semgrep-ci/azure-self-hosted-ubuntu.md
@@ -0,0 +1,134 @@
---
tags:
- Azure Pipelines
description: Run Semgrep on self-hosted Ubuntu runners in Azure DevOps.
---
import AzureVariables from "/src/components/procedure/_set-env-vars-azure.mdx"

# Semgrep with self-hosted Ubuntu runners in Azure Pipelines

Semgrep provides a [sample configuration for Azure-hosted runners](/docs/semgrep-ci/sample-ci-configs#azure-pipelines). If you use self-hosted Ubuntu Linux runners, you have significantly more control over their configuration, but as a result, they require additional preparation and configuration to run Semgrep.

This guide describes two approaches to configuring self-hosted runners that use Ubuntu (the default self-hosted option for Azure DevOps Linux runners):

* [Using pipx](#using-pipx)
* [Using pip with a virtual environment](#using-pip-with-a-virtual-environment)

## Using pipx

While the sample configuration uses `pip`, this approach uses `pipx`, which avoids issues with system-managed Python vs user-installed Python.

### Prepare your runner

Access the runner and execute the following commands:

```bash
$ sudo apt update
$ sudo apt install pipx
$ pipx ensurepath
```

After completing the commands:

1. Start a new shell session, so that the changes from `pipx ensurepath` are available.
2. Ensure the [Azure DevOps agent](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/linux-agent?view=azure-devops) is set up and running.
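
One way to confirm that the `pipx ensurepath` change is visible in the new shell session is to check `PATH` for pipx's binary directory. The following is a minimal sketch, assuming pipx uses its default binary directory, `~/.local/bin`. The helper name `path_has_dir` is hypothetical, and pipx can be configured to use a different directory.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry in the
# colon-separated, PATH-style string $2.
path_has_dir() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: pipx uses its default binary directory, ~/.local/bin.
if path_has_dir "$HOME/.local/bin" "$PATH"; then
  echo "pipx bin directory is on PATH"
else
  echo "pipx bin directory is missing from PATH; start a new shell session"
fi
```

Run the check as the same user that runs the Azure DevOps agent, because `pipx ensurepath` only updates that user's shell configuration.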

### Create your configuration

1. Follow the steps provided in the [sample configuration for Azure-hosted runners](/docs/semgrep-ci/sample-ci-configs#azure-pipelines).
2. Add the following snippet to the `azure-pipelines.yml` for the repository.

```yaml
variables:
- group: Semgrep_Variables

pool:
  name: Default

steps:
- checkout: self
  clean: true
  fetchDepth: 20
  persistCredentials: true
- script: |
    pipx install semgrep
    if [ $(Build.SourceBranchName) = "master" ]; then
        echo "Semgrep full scan"
        semgrep ci
    elif [ $(System.PullRequest.PullRequestId) -ge 0 ]; then
        echo "Semgrep diff scan"
        git fetch origin master:origin/master
        export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
        export SEMGREP_BASELINE_REF='origin/master'
        semgrep ci
    fi
  env:
    SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
```
:::info Customizing the configuration
* If your self-hosted runner [agent pool](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/pools-queues?view=azure-devops&tabs=yaml%2Cbrowser) has a different name, update the `name` key under `pool` to match the desired agent pool.
* If your default branch is not called `master`, update the references to `master` to match the name of your default branch.
:::
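
If the default branch name appears in several places, the branch logic can be kept in one spot. The following is an illustrative sketch rather than part of the documented configuration: the helper name `scan_mode` and the `DEFAULT_BRANCH` variable are hypothetical, and `main` is only an assumed branch name. In a pipeline, the first two arguments would come from `$(Build.SourceBranchName)` and `$(System.PullRequest.PullRequestId)`.

```shell
# Hypothetical helper: prints "full", "diff", or "skip".
#   $1 = source branch name
#   $2 = pull request ID (may be empty or non-numeric outside a PR build)
#   $3 = default branch name
scan_mode() {
  if [ "$1" = "$3" ]; then
    echo "full"
  else
    case "$2" in
      ''|*[!0-9]*) echo "skip" ;;  # no usable PR ID
      *) echo "diff" ;;            # numeric PR ID
    esac
  fi
}

# Assumption: the default branch is "main".
DEFAULT_BRANCH="main"
case "$(scan_mode "${1:-}" "${2:-}" "$DEFAULT_BRANCH")" in
  full) echo "Semgrep full scan" ;;
  diff) echo "Semgrep diff scan against origin/$DEFAULT_BRANCH" ;;
  skip) echo "Neither the default branch nor a pull request; skipping" ;;
esac
```

Checking that the pull request ID is numeric before comparing it avoids a shell error when the build is not triggered by a pull request.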

<AzureVariables />

## Using pip with a virtual environment

### Prepare your runner

This approach uses built-in Azure DevOps tasks, including `UsePythonVersion` and `Bash`, and installs Semgrep with `pip` inside a virtual environment, another approach that prevents issues with system-managed Python vs user-installed Python.

1. Ensure you have a pre-installed and configured compatible version of Python 3, following [the instructions for UsePythonVersion for self-hosted runners](https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/use-python-version-v0?view=azure-pipelines#how-can-i-configure-a-self-hosted-agent-to-use-this-task).
2. Ensure the [Azure DevOps agent](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/linux-agent?view=azure-devops) is set up and running.

### Create your configuration

Add the following snippet to the `azure-pipelines.yml` for the repository.


```yaml
variables:
- group: Semgrep_Variables
pool:
  name: Default
steps:
- checkout: self
  clean: true
  persistCredentials: true
- task: UsePythonVersion@0
  displayName: 'Use Python 3.12'
  inputs:
    versionSpec: 3.12
- task: Bash@3
  env:
    SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
  inputs:
    targetType: 'inline'
    script: |
      python3 -m venv .venv
      source .venv/bin/activate
      python3 -m pip install --upgrade pip
      pip install semgrep
      if [ $(Build.SourceBranchName) = "master" ]; then
          export SEMGREP_BRANCH=$(Build.SourceBranchName)
          echo "Semgrep full scan of master"
          semgrep ci
      elif [ $(System.PullRequest.PullRequestId) -ge 0 ]; then
          echo "Semgrep diff scan"
          git fetch origin master:origin/master
          export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
          export SEMGREP_BASELINE_REF='origin/master'
          semgrep ci
      fi
```

:::info Customizing the configuration
* If your self-hosted runner [agent pool](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/pools-queues?view=azure-devops&tabs=yaml%2Cbrowser) has a different name, update the `name` key under `pool` to match the desired agent pool.
* If your default branch is not called `master`, update the references to `master` to match the name of your default branch.
:::

<AzureVariables />
26 changes: 14 additions & 12 deletions docs/semgrep-ci/sample-ci-configs.md
@@ -46,6 +46,8 @@ import CircleCiSemgrepOssSast from "/src/components/code_snippets/_circleci-semg
<!-- Azure Pipelines -->
import AzureSemgrepAppSast from "/src/components/code_snippets/_azure-semgrep-app-sast.mdx"
import AzureSemgrepOssSast from "/src/components/code_snippets/_azure-semgrep-oss-sast.mdx"
+import AzureVariables from "/src/components/procedure/_set-env-vars-azure.mdx"
+

import ScmFeatureReference from "/src/components/reference/_scm-feature-reference.md"

@@ -88,7 +90,7 @@ If you are self-hosting your repository, you must [use a self-hosted runner](htt
<TabItem value='gha-semgrep'>

-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.

<GhaSemgrepAppSast />

@@ -152,7 +154,7 @@ To add a Semgrep configuration snippet in your GitLab CI/CD pipeline:

<TabItem value='glcicd-semgrep'>

-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.

<GlcicdSemgrepAppSast />

@@ -213,7 +215,7 @@ To add a Semgrep configuration snippet in your Jenkins pipeline:
For SCA scans (Semgrep Supply Chain): users of Jenkins UI with the Git plugin must also set up their branch information. See [Setting up Semgrep Supply Chain with Jenkins UI](/semgrep-supply-chain/setup-jenkins-ui) for more information.
:::

-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.

<JenkinsSemgrepAppSast />

@@ -271,7 +273,7 @@ These steps can also be performed through Bitbucket's UI wizard. This UI wizard

<TabItem value='bitbucket-semgrep'>

-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.

<BitbucketSemgrepAppSast />

@@ -404,7 +406,7 @@ For the default branch and tags, CircleCI always runs the Semgrep CI job on all

<TabItem value='circleci-semgrep'>

-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.

<CircleCiSemgrepAppSast />

@@ -432,15 +434,13 @@ Scanning a project with the `semgrep ci` command requires the project to be vers
To add Semgrep into Azure Pipelines:

1. Access the YAML pipeline editor within Azure Pipelines by following the [YAML pipeline editor](https://learn.microsoft.com/en-us/azure/devops/pipelines/get-started/yaml-pipeline-editor?view=azure-devops#edit-a-yaml-pipeline) guide.
-2. Copy the relevant code snippet provided in [Sample Azure Pipelines configuration snippet](#sample-azure-pipelines-configuration-snippet) into the Azure Pipelines YAML editor.
+2. Copy the code snippet provided in [Sample Azure Pipelines configuration snippet](#sample-azure-pipelines-configuration-snippet) into the Azure Pipelines YAML editor.
3. Save the code snippet.
-4. Set [environment variables](https://learn.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#secret-variables).
-5. Group the environment variables as a [variable group](https://learn.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=classic).
-6. Optional: Create a separate CI job for diff-aware scanning, which scans only changed files in PRs or MRs, by repeating steps 1-4 and adding `SEMGREP_BASELINE_REF` as an environment variable.
+4. Follow any additional instructions provided with the snippet.

### Sample Azure Pipelines configuration snippet

-This configuration snippet is tested with hosted Azure runners. If you are using self-hosted runners, you may need to make adjustments to ensure that the necessary software is available.
+This configuration snippet is tested with **hosted** Azure runners. If you are using self-hosted runners, you may need to make adjustments to ensure that the necessary software is available. Consult [Semgrep with self-hosted Ubuntu runners in Azure Pipelines](/docs/kb/semgrep-ci/azure-self-hosted-ubuntu) for two recommended options.

<Tabs
defaultValue="azure-semgrep"
@@ -452,12 +452,14 @@ This configuration snippet is tested with hosted Azure runners. If you are using

<TabItem value='azure-semgrep'>

-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.

<AzureSemgrepAppSast />

You can **run specific product scans** by passing an argument, such as `--supply-chain`. View the [list of arguments](/getting-started/cli/#scan-using-specific-semgrep-products).

+<AzureVariables />
+
</TabItem>

<TabItem value='azure-oss'>
@@ -475,7 +477,7 @@ You can customize the scan by entering custom rules or other rulesets to scan wi

To run Semgrep CI on any other provider, use the `semgrep/semgrep` image, and run the `semgrep ci` command with `SEMGREP_BASELINE_REF` set for diff-aware scanning.

-**Note**: If you need to use a different image than docker, install Semgrep CI by `pip install semgrep`.
+**Note**: If you need to use a different Docker image or are not running in Docker, install Semgrep CI by `pip install semgrep`.

By setting various [CI environment variables](/semgrep-ci/ci-environment-variables), you can run Semgrep in the following CI providers:

22 changes: 3 additions & 19 deletions src/components/code_snippets/_azure-semgrep-app-sast.mdx
@@ -18,24 +18,8 @@ steps:
export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
export SEMGREP_BASELINE_REF='origin/master'
git fetch origin master:origin/master
semgrep ci
semgrep ci
fi
env:
SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
```
### Setting environment variables in Azure Pipelines
Set these variables within Azure Pipelines UI following the steps in [Environment variables](https://learn.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#secret-variables):
* `SEMGREP_APP_TOKEN`

Set these environment variables to troubleshoot the links to the code that generated a finding or if you are not receiving PR or MR comments:

* `SEMGREP_JOB_URL`
* `SEMGREP_COMMIT`
* `SEMGREP_BRANCH`
* `SEMGREP_REPO_URL`
* `SEMGREP_REPO_NAME`

Set this environment variable for diff-aware scanning:

* `SEMGREP_BASELINE_REF`. Its value is typically your trunkline branch, such as `main` or `master`.
2 changes: 0 additions & 2 deletions src/components/code_snippets/_azure-semgrep-oss-sast.mdx
@@ -1,6 +1,4 @@
```yaml
-variables:
-- group: Semgrep_Variables

steps:
- checkout: self
14 changes: 14 additions & 0 deletions src/components/procedure/_set-env-vars-azure.mdx
@@ -0,0 +1,14 @@
### Set environment variables in Azure Pipelines

At a minimum, Semgrep requires the `SEMGREP_APP_TOKEN` variable to report results to Semgrep AppSec Platform. Other variables can also be helpful. To set these variables in Azure Pipelines:

1. Set up a [variable group](https://learn.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=classic) called `Semgrep_Variables`.
2. Set `SEMGREP_APP_TOKEN` in the variable group, following the steps for [secret variables](https://learn.microsoft.com/en-us/azure/devops/pipelines/process/set-secret-variables?view=azure-devops&tabs=yaml%2Cbash#set-a-secret-variable-in-a-variable-group). The variable is mapped into the `env` in the provided config.
3. Optional: Add the following environment variables to the group if you aren't seeing hyperlinks to the code that generated a finding, or if you are not receiving PR or MR comments. Review the use of these variables at [Environment variables for creating hyperlinks in Semgrep AppSec Platform](https://semgrep.dev/docs/semgrep-ci/ci-environment-variables#environment-variables-for-creating-hyperlinks-in-semgrep-appsec-platform). These variables are not sensitive and do not need to be secret variables.
   * `SEMGREP_REPO_NAME`
   * `SEMGREP_REPO_URL`
   * `SEMGREP_BRANCH`
   * `SEMGREP_COMMIT`
   * `SEMGREP_JOB_URL`
4. Set variables for diff-aware scanning. The provided config sets `SEMGREP_PR_ID` to the system variable `System.PullRequest.PullRequestId` and `SEMGREP_BASELINE_REF` to `origin/master` within the `script` section of the config. The value of `SEMGREP_BASELINE_REF` is typically your trunk or default branch, such as `main` or `master`, so if you use a different branch than `master`, update the name accordingly.
   * If you prefer not to implement diff-aware scanning, you can skip setting these variables and remove the `elif` section of the `script` step.
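
As an alternative to adding the optional variables to the variable group, they could be derived inside the `script` step from Azure Pipelines predefined variables. The following is an illustrative sketch under stated assumptions, not part of the provided config. It assumes the predefined variables are available as environment variables (for example, `Build.Repository.Name` as `BUILD_REPOSITORY_NAME`), the helper name `short_branch` is hypothetical, and the job URL layout is an assumption to verify against your Azure DevOps organization.

```shell
# Hypothetical helper: strip a leading "refs/heads/" from a Git ref.
short_branch() {
  echo "${1#refs/heads/}"
}

# Assumption: running inside an Azure Pipelines job. The ":-" defaults keep
# the sketch runnable outside a pipeline, where the values end up empty.
export SEMGREP_REPO_NAME="${BUILD_REPOSITORY_NAME:-}"
export SEMGREP_REPO_URL="${BUILD_REPOSITORY_URI:-}"
export SEMGREP_BRANCH="$(short_branch "${BUILD_SOURCEBRANCH:-}")"
export SEMGREP_COMMIT="${BUILD_SOURCEVERSION:-}"
# Assumption: this URL layout matches your Azure DevOps organization.
export SEMGREP_JOB_URL="${SYSTEM_TEAMFOUNDATIONCOLLECTIONURI:-}${SYSTEM_TEAMPROJECT:-}/_build/results?buildId=${BUILD_BUILDID:-}"
```

For pull request builds the source ref is not under `refs/heads/`, so `short_branch` leaves it unchanged. Check the resulting values in Semgrep AppSec Platform before relying on them.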
