diff --git a/docs/kb/semgrep-ci/azure-self-hosted-ubuntu.md b/docs/kb/semgrep-ci/azure-self-hosted-ubuntu.md
new file mode 100644
index 000000000..7e009568a
--- /dev/null
+++ b/docs/kb/semgrep-ci/azure-self-hosted-ubuntu.md
@@ -0,0 +1,134 @@
+---
+tags:
+ - Azure Pipelines
+description: Run Semgrep on self-hosted Ubuntu runners in Azure DevOps.
+---
+import AzureVariables from "/src/components/procedure/_set-env-vars-azure.mdx"
+
+# Semgrep with self-hosted Ubuntu runners in Azure Pipelines
+
+Semgrep provides a [sample configuration for Azure-hosted runners](/docs/semgrep-ci/sample-ci-configs#azure-pipelines). If you use self-hosted Ubuntu Linux runners, you have significantly more control over their configuration, but as a result, they require additional preparation and configuration to run Semgrep.
+
+This guide describes two approaches to configuring self-hosted runners that use Ubuntu (the default self-hosted option for Azure DevOps Linux runners):
+
+* [Using pipx](#using-pipx)
+* [Using pip with a virtual environment](#using-pip-with-a-virtual-environment)
+
+## Using pipx
+
+While the sample configuration uses `pip`, this approach uses `pipx`, which installs Semgrep in its own isolated environment and so avoids conflicts between system-managed and user-installed Python.
+
+### Prepare your runner
+
+Access the runner and execute the following commands:
+
+```bash
+$ sudo apt update
+$ sudo apt install pipx
+$ pipx ensurepath
+```
+
+After completing the commands:
+
+1. Start a new shell session, so that the changes from `pipx ensurepath` are available.
+2. Ensure the [Azure DevOps agent](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/linux-agent?view=azure-devops) is set up and running.
+
+### Create your configuration
+
+1. Follow the steps provided in the [sample configuration for Azure-hosted runners](/docs/semgrep-ci/sample-ci-configs#azure-pipelines).
+2. Add the following snippet to the `azure-pipelines.yml` for the repository.
+
+```yaml
+variables:
+- group: Semgrep_Variables
+
+pool:
+  name: Default
+
+steps:
+- checkout: self
+  clean: true
+  fetchDepth: 20
+  persistCredentials: true
+- script: |
+    pipx install semgrep
+    if [ $(Build.SourceBranchName) = "master" ]; then
+        echo "Semgrep full scan"
+        semgrep ci
+    elif [ $(System.PullRequest.PullRequestId) -ge 0 ]; then
+        echo "Semgrep diff scan"
+        git fetch origin master:origin/master
+        export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
+        export SEMGREP_BASELINE_REF='origin/master'
+        semgrep ci
+    fi
+  env:
+    SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
+```
+
+:::info Customizing the configuration
+* If your self-hosted runner [agent pool](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/pools-queues?view=azure-devops&tabs=yaml%2Cbrowser) has a different name, update the `name` key under `pool` to match the desired agent pool.
+* If your default branch is not called `master`, update the references to `master` to match the name of your default branch.
+:::
+
+
+
+## Using pip with a virtual environment
+
+### Prepare your runner
+
+This approach uses built-in Azure DevOps tasks, including `UsePythonVersion` and `Bash`, and installs Semgrep with `pip` inside a virtual environment, which also prevents conflicts between system-managed and user-installed Python.
+
+1. Ensure that a compatible version of Python 3 is installed and configured on the runner, following [the instructions for UsePythonVersion for self-hosted runners](https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/use-python-version-v0?view=azure-pipelines#how-can-i-configure-a-self-hosted-agent-to-use-this-task).
+2. Ensure the [Azure DevOps agent](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/linux-agent?view=azure-devops) is set up and running.
+
+### Create your configuration
+
+Add the following snippet to the `azure-pipelines.yml` for the repository.
+
+
+```yaml
+variables:
+- group: Semgrep_Variables
+
+pool:
+  name: Default
+
+steps:
+  - checkout: self
+    clean: true
+    persistCredentials: true
+  - task: UsePythonVersion@0
+    displayName: 'Use Python 3.12'
+    inputs:
+      versionSpec: 3.12
+  - task: Bash@3
+    env:
+      SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
+    inputs:
+      targetType: 'inline'
+      script: |
+        python3 -m venv .venv
+        source .venv/bin/activate
+        python3 -m pip install --upgrade pip
+        pip install semgrep
+
+        if [ $(Build.SourceBranchName) = "master" ]; then
+            export SEMGREP_BRANCH=$(Build.SourceBranchName)
+            echo "Semgrep full scan of master"
+            semgrep ci
+        elif [ $(System.PullRequest.PullRequestId) -ge 0 ]; then
+            echo "Semgrep diff scan"
+            git fetch origin master:origin/master
+            export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
+            export SEMGREP_BASELINE_REF='origin/master'
+            semgrep ci
+        fi
+```
+
+:::info Customizing the configuration
+* If your self-hosted runner [agent pool](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/pools-queues?view=azure-devops&tabs=yaml%2Cbrowser) has a different name, update the `name` key under `pool` to match the desired agent pool.
+* If your default branch is not called `master`, update the references to `master` to match the name of your default branch.
+:::
+
+
diff --git a/docs/semgrep-ci/sample-ci-configs.md b/docs/semgrep-ci/sample-ci-configs.md
index 3e6067fc0..6e4494abc 100644
--- a/docs/semgrep-ci/sample-ci-configs.md
+++ b/docs/semgrep-ci/sample-ci-configs.md
@@ -46,6 +46,8 @@ import CircleCiSemgrepOssSast from "/src/components/code_snippets/_circleci-semg
import AzureSemgrepAppSast from "/src/components/code_snippets/_azure-semgrep-app-sast.mdx"
import AzureSemgrepOssSast from "/src/components/code_snippets/_azure-semgrep-oss-sast.mdx"
+import AzureVariables from "/src/components/procedure/_set-env-vars-azure.mdx"
+
import ScmFeatureReference from "/src/components/reference/_scm-feature-reference.md"
@@ -88,7 +90,7 @@ If you are self-hosting your repository, you must [use a self-hosted runner](htt
-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.
@@ -152,7 +154,7 @@ To add a Semgrep configuration snippet in your GitLab CI/CD pipeline:
-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.
@@ -213,7 +215,7 @@ To add a Semgrep configuration snippet in your Jenkins pipeline:
For SCA scans (Semgrep Supply Chain): users of Jenkins UI with the Git plugin must also set up their branch information. See [Setting up Semgrep Supply Chain with Jenkins UI](/semgrep-supply-chain/setup-jenkins-ui) for more information.
:::
-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.
@@ -271,7 +273,7 @@ These steps can also be performed through Bitbucket's UI wizard. This UI wizard
-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.
@@ -404,7 +406,7 @@ For the default branch and tags, CircleCI always runs the Semgrep CI job on all
-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.
@@ -432,15 +434,13 @@ Scanning a project with the `semgrep ci` command requires the project to be vers
To add Semgrep into Azure Pipelines:
1. Access the YAML pipeline editor within Azure Pipelines by following the [YAML pipeline editor](https://learn.microsoft.com/en-us/azure/devops/pipelines/get-started/yaml-pipeline-editor?view=azure-devops#edit-a-yaml-pipeline) guide.
-2. Copy the relevant code snippet provided in [Sample Azure Pipelines configuration snippet](#sample-azure-pipelines-configuration-snippet) into the Azure Pipelines YAML editor.
+2. Copy the code snippet provided in [Sample Azure Pipelines configuration snippet](#sample-azure-pipelines-configuration-snippet) into the Azure Pipelines YAML editor.
3. Save the code snippet.
-4. Set [environment variables](https://learn.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#secret-variables).
-5. Group the environment variables as a [variable group](https://learn.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=classic).
-6. Optional: Create a separate CI job for diff-aware scanning, which scans only changed files in PRs or MRs, by repeating steps 1-4 and adding `SEMGREP_BASELINE_REF` as an environment variable.
+4. Follow any additional instructions provided with the snippet.
### Sample Azure Pipelines configuration snippet
-This configuration snippet is tested with hosted Azure runners. If you are using self-hosted runners, you may need to make adjustments to ensure that the necessary software is available.
+This configuration snippet is tested with **hosted** Azure runners. If you are using self-hosted runners, you may need to make adjustments to ensure that the necessary software is available. Consult [Semgrep with self-hosted Ubuntu runners in Azure Pipelines](/docs/kb/semgrep-ci/azure-self-hosted-ubuntu) for two recommended options.
-The following configuration creates a CI job that runs scans depending on what products you have enabled in Semgrep AppSec Platform.
+The following configuration creates a CI job that runs scans using the products and options you have enabled in Semgrep AppSec Platform.
You can **run specific product scans** by passing an argument, such as `--supply-chain`. View the [list of arguments](/getting-started/cli/#scan-using-specific-semgrep-products).
+
+
@@ -475,7 +477,7 @@ You can customize the scan by entering custom rules or other rulesets to scan wi
To run Semgrep CI on any other provider, use the `semgrep/semgrep` image, and run the `semgrep ci` command with `SEMGREP_BASELINE_REF` set for diff-aware scanning.
-**Note**: If you need to use a different image than docker, install Semgrep CI by `pip install semgrep`.
+**Note**: If you need to use a different Docker image or are not running in Docker, install Semgrep CI by running `pip install semgrep`.
By setting various [CI environment variables](/semgrep-ci/ci-environment-variables), you can run Semgrep in the following CI providers:
diff --git a/src/components/code_snippets/_azure-semgrep-app-sast.mdx b/src/components/code_snippets/_azure-semgrep-app-sast.mdx
index 05adf35a5..2bfe189bf 100644
--- a/src/components/code_snippets/_azure-semgrep-app-sast.mdx
+++ b/src/components/code_snippets/_azure-semgrep-app-sast.mdx
@@ -18,24 +18,8 @@ steps:
export SEMGREP_PR_ID=$(System.PullRequest.PullRequestId)
export SEMGREP_BASELINE_REF='origin/master'
git fetch origin master:origin/master
- semgrep ci
+ semgrep ci
fi
+ env:
+ SEMGREP_APP_TOKEN: $(SEMGREP_APP_TOKEN)
```
-
-### Setting environment variables in Azure Pipelines
-
-Set these variables within Azure Pipelines UI following the steps in [Environment variables](https://learn.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#secret-variables):
-
-* `SEMGREP_APP_TOKEN`
-
-Set these environment variables to troubleshoot the links to the code that generated a finding or if you are not receiving PR or MR comments:
-
-* `SEMGREP_JOB_URL`
-* `SEMGREP_COMMIT`
-* `SEMGREP_BRANCH`
-* `SEMGREP_REPO_URL`
-* `SEMGREP_REPO_NAME`
-
-Set this environment variable for diff-aware scanning:
-
-* `SEMGREP_BASELINE_REF`. Its value is typically your trunkline branch, such as `main` or `master`.
diff --git a/src/components/code_snippets/_azure-semgrep-oss-sast.mdx b/src/components/code_snippets/_azure-semgrep-oss-sast.mdx
index 150ef5c1e..648cc32e7 100644
--- a/src/components/code_snippets/_azure-semgrep-oss-sast.mdx
+++ b/src/components/code_snippets/_azure-semgrep-oss-sast.mdx
@@ -1,6 +1,4 @@
```yaml
-variables:
-- group: Semgrep_Variables
steps:
- checkout: self
diff --git a/src/components/procedure/_set-env-vars-azure.mdx b/src/components/procedure/_set-env-vars-azure.mdx
new file mode 100644
index 000000000..b9d1d3044
--- /dev/null
+++ b/src/components/procedure/_set-env-vars-azure.mdx
@@ -0,0 +1,14 @@
+### Set environment variables in Azure Pipelines
+
+At minimum, Semgrep requires the `SEMGREP_APP_TOKEN` variable to report results to Semgrep AppSec Platform. Other variables are optional but can be helpful. To set these variables in Azure Pipelines:
+
+1. Set up a [variable group](https://learn.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=classic) called `Semgrep_Variables`.
+2. Set `SEMGREP_APP_TOKEN` in the variable group, following the steps for [secret variables](https://learn.microsoft.com/en-us/azure/devops/pipelines/process/set-secret-variables?view=azure-devops&tabs=yaml%2Cbash#set-a-secret-variable-in-a-variable-group). The provided config maps this variable into the `env` section of the step that runs Semgrep.
+3. Optional: Add the following environment variables to the group if you are not seeing hyperlinks to the code that generated a finding, or if you are not receiving PR or MR comments. Review the use of these variables at [Environment variables for creating hyperlinks in Semgrep AppSec Platform](https://semgrep.dev/docs/semgrep-ci/ci-environment-variables#environment-variables-for-creating-hyperlinks-in-semgrep-appsec-platform). These variables are not sensitive and do not need to be secret variables.
+    * `SEMGREP_REPO_NAME`
+    * `SEMGREP_REPO_URL`
+    * `SEMGREP_BRANCH`
+    * `SEMGREP_COMMIT`
+    * `SEMGREP_JOB_URL`
+4. Set variables for diff-aware scanning. The provided config sets `SEMGREP_PR_ID` to the system variable `System.PullRequest.PullRequestId` and `SEMGREP_BASELINE_REF` to `origin/master` within the `script` section of the config. The value of `SEMGREP_BASELINE_REF` is typically your trunk or default branch, such as `main` or `master`, so if your default branch is not `master`, update the name accordingly.
+    * If you prefer not to implement diff-aware scanning, you can skip setting these variables and remove the `elif` section of the `script` step.