Skip to content

Commit

Permalink
docs: add a tutorial for finding malicious artifact uploads (#474)
Browse files Browse the repository at this point in the history
Signed-off-by: behnazh-w <[email protected]>
  • Loading branch information
behnazh-w authored Sep 14, 2023
1 parent 1f5ed10 commit 5e9dced
Show file tree
Hide file tree
Showing 8 changed files with 288 additions and 8 deletions.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
8 changes: 6 additions & 2 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@

.. References/links
.. _Witness: https://github.com/testifysec/witness
.. _SLSA: https://slsa.dev

=====================
Macaron documentation
Expand Down Expand Up @@ -40,6 +41,8 @@ To start with Macaron, see the :doc:`Installation </pages/installation>` and :do

For all services and technologies that Macaron supports, see the :doc:`Supported Technologies </pages/supported_technologies/index>` page.

.. _checks:

-------------------------
Current checks in Macaron
-------------------------
Expand All @@ -61,7 +64,7 @@ the requirements that are currently supported by Macaron.
- Identify and validate build script(s).
* - 1
- **Provenance available** - Provenances are available.
- Check for existence of provenances, which can be SLSA or `Witness`_ provenances. If there is no provenance, the repo can still be compliant to level 1 given the build script is available.
- Check for existence of provenances, which can be `SLSA`_ or `Witness`_ provenances. If there is no provenance, the repo can still be compliant to level 1 given the build script is available.
* - 1
- **Witness provenance** - One or more `Witness`_ provenances are discovered.
- Check for existence of `Witness`_ provenances, and whether artifact digests match those in the provenances.
Expand Down Expand Up @@ -106,7 +109,8 @@ intermediate representations as abstractions. Using such abstractions, Macaron i

pages/installation
pages/using
pages/output_files
pages/cli_usage/index
pages/tutorials/index
pages/output_files
pages/supported_technologies/index
pages/developers_guide/index
276 changes: 276 additions & 0 deletions docs/source/pages/tutorials/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,276 @@
.. Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
.. References/links
.. _Witness: https://github.com/testifysec/witness
.. _SLSA: https://slsa.dev


.. _tutorials:

=========
Tutorials
=========

On this page, you will find tutorials to get you started with Macaron. The tutorials show Macaron
in action, analyzing a software component and its dependencies that are built using GitHub Actions
or GitLab. Macaron supports artifacts published on GitHub release, `Maven Central <https://central.sonatype.com>`_,
or privately hosted registries, such as `JFrog <https://jfrog.com/>`_.

---------------------------------------------------------------------
Detect a malicious Java dependency uploaded manually to Maven Central
---------------------------------------------------------------------

In this tutorial we show how Macaron can determine whether the dependencies of a Java project are built
and published via transparent CI workflows or manually uploaded to Maven Central. When an artifact is
manually uploaded, the artifact distributor can modify the artifact and potentially include malicious
code without being detected.

The example project we analyze in this tutorial is `example-maven-app <https://github.com/behnazh-w/example-maven-app>`_,
which is hosted on GitHub. This example application uses Maven to build and manage dependencies, and has two
dependencies:


.. list-table::
:widths: 25 50
:header-rows: 1

* - Artifact name
- `Package URL (PURL) <https://github.com/package-url/purl-spec>`_
* - `guava <https://central.sonatype.com/artifact/com.google.guava/guava>`_
- ``pkg:maven/com.google.guava/[email protected]?type=jar``
* - `jackson-databind <https://central.sonatype.com/artifact/io.github.behnazh-w.demo/jackson-databind>`_
- ``pkg:maven/io.github.behnazh-w.demo/[email protected]?type=jar``

While the ``guava`` dependency follows best practices to publish artifacts automatically with minimal human
intervention, ``jackson-databind`` is a malicious dependency that pretends to provide data-binding functionalities
like `the official jackson-databind <https://github.com/FasterXML/jackson-databind>`_ library (note that
this artifact is created for demonstration purposes and is not actually malicious).

Now let's see how Macaron can help us with evaluating the supply chain security posture of
``example-maven-app`` and its dependencies.

************
Installation
************

Please follow the instructions :ref:`here <installation-guide>`. In summary, you need:

* Docker
* the ``run_macaron.sh`` script to run the Macaron image.

.. note:: At the moment, Docker alternatives (e.g. podman) are not supported.

*************
Prerequisites
*************

You need to provide Macaron with a GitHub token through the ``GITHUB_TOKEN`` environment variable.

To obtain a GitHub Token:

* Go to ``GitHub settings`` → ``Developer Settings`` (at the bottom of the left side pane) → ``Personal Access Tokens`` → ``Fine-grained personal access tokens`` → ``Generate new token``. Give your token a name and an expiry period.
* Under ``"Repository access"``, choosing ``"Public Repositories (read-only)"`` should be good enough in most cases.

Now you should be good to run Macaron. For more details, see the documentation :ref:`here <prepare-github-token>`.


***********************
Run ``analyze`` command
***********************

First, we need to run the ``analyze`` command of Macaron to run a number of :ref:`checks <checks>` and collect evidence for ``example-maven-app`` and its dependencies.

.. code-block:: shell
./run_macaron.sh analyze -rp https://github.com/behnazh-w/example-maven-app
.. note:: By default, Macaron clones the repositories and creates output files under the ``output`` directory. To understand the structure of this directory please see :ref:`Output Files Guide <output_files_guide>`.

By default, this command analyzes the the latest commit of the default branch of the repository. You can also analyze the repository
at a specific commit by providing the branch and commit digest. See the :ref:`CLI options<analyze-action-cli>` of the ``analyze`` command for more information.
After running the ``analyze`` command, we can view the data that Macaron has gathered about the ``example-maven-app`` repository in an HTML report.

.. code-block:: shell
open output/reports/github_com/behnazh-w/example-maven-app/example-maven-app.html
.. _fig_example-maven-app:

.. figure:: ../../_static/images/tutorial_example_maven_app_report.png
:alt: HTML report for ``example-maven-app``
:align: center

|
The image above shows the results of the checks for `example-maven-app <https://github.com/behnazh-w/example-maven-app>`_ repository itself.
As you can see, some of the checks are passing and some are failing. In summary, this project

* is not producing any `SLSA`_ or `Witness`_ provenances (``mcn_provenance_available_1``)
* is using GitHub Actions to build and test using ``mvnw`` (``mcn_build_service_1``)
* but it is not deploying any artifacts automatically (``mcn_build_as_code_1``)
* and no CI workflow runs are detected that automatically publish artifacts (``mcn_infer_artifact_pipeline_1``)

As you scroll down in the HTML report, you will see a section for the dependencies that were automatically identified:

.. _fig_example-maven-app-deps:

.. figure:: ../../_static/images/tutorial_example_maven_app_report_dependencies.png
:alt: HTML report for dependencies of ``example-maven-app``
:align: center

|
| Macaron has found the two dependencies as expected:
* ``io.github.behnazh-w.demo:jackson-databind:1.0``
* ``com.google.guava:guava:32.1.2-jre``

When we open the reports for each dependency, we see that ``mcn_infer_artifact_pipeline_1`` is passed for ``com.google.guava:guava:32.1.2-jre``
and a GitHub Actions workflow run is found for publishing version ``32.1.2-jre``. However, this check is failing for ``io.github.behnazh-w.demo:jackson-databind:1.0``.
This means that ``io.github.behnazh-w.demo:jackson-databind:1.0`` could have been built and published manually to Maven Central
and could potentially be malicious.

.. _fig_infer_artifact_pipeline_guava:

.. figure:: ../../_static/images/tutorial_guava_infer_pipeline.png
:alt: mcn_infer_artifact_pipeline_1 for com.google.guava:guava:32.1.2-jre
:align: center

``com.google.guava:guava:32.1.2-jre``

.. _fig_infer_artifact_pipeline_bh_jackson_databind:

.. figure:: ../../_static/images/tutorial_bh_jackson_databind_infer_pipeline.png
:alt: mcn_infer_artifact_pipeline_1 for io.github.behnazh-w.demo:jackson-databind:1.0
:align: center

``io.github.behnazh-w.demo:jackson-databind:1.0``

|
After running the ``analyze`` command, all the check results are stored in ``output/macaron.db``.
Next, we show how to use the policy engine to detect if the dependencies of ``example-maven-app``
are not published from a publicly available CI workflow run.

*****************************
Run ``verify-policy`` command
*****************************

While the ``analyze`` command shown in the previous section collects information,
it does not automatically confirm whether a repository satisfies **your** security requirements.
This is where the ``verify-policy`` command comes in. With Macaron, you can use `Soufflé Datalog <https://souffle-lang.github.io/index.html>`_
in order to express the security requirements and let Macaron automatically validate it against the collected data.
Datalog is very similar to SQL and allows writing declarative queries for the
results collected by the ``analyze`` command. We use such queries as policy rules as described next.

The security requirement in this tutorial is to mandate dependencies of our project to have a
transparent artifact publish CI workflows. To write a policy for this requirement, first we need to
revisit the checks shown in the HTML report in the previous :ref:`step <fig_example-maven-app>`.
The result of each of the checks can be queried by the check ID in the first column. For the policy in this tutorial,
we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks:

.. code-block:: c++

#include "prelude.dl"

Policy("detect-malicious-upload", component_id, "") :-
is_component(component_id, _),
!violating_dependencies(component_id).

.decl violating_dependencies(parent: number)
violating_dependencies(parent) :-
transitive_dependency(parent, dependency),
!check_passed(dependency, "mcn_infer_artifact_pipeline_1"),
!check_passed(dependency, "mcn_provenance_level_three_1").

apply_policy_to("detect-malicious-upload", component_id) :-
is_repo(_, "github.com/behnazh-w/example-maven-app", component_id).


This policy requires that all the dependencies
of repository ``github.com/behnazh-w/example-maven-app`` either pass the ``mcn_provenance_level_three_1`` (have non-forgeable
`SLSA`_ provenances) or ``mcn_infer_artifact_pipeline_1`` check. Note that if an artifact already has a non-forgeable provenance, it means it is produced
by a hosted build platform, such as GitHub Actions CI workflows. So, the ``mcn_infer_artifact_pipeline_1`` needs to pass
only if ``mcn_provenance_level_three_1`` fails.

Let's take a closer look at this policy to understand what each line means.

.. code-block:: c++

#include "prelude.dl"

This line imports the predefined Datalog relations into your Datalog specification. These relations
can be thought of as select statements specifically provided by Macaron to make it easier for you
to write policies. In our example policy, the following relations are pre-defined:

* ``Policy(policy_id: symbol, target_id: number, message: symbol)``
* ``is_component(component_id: number, purl: symbol)``
* ``transitive_dependency(parent: number, dependency: number)``
* ``check_passed(component_id: number, check_name: symbol)``
* ``apply_policy_to(policy_id: symbol, component_id: number)``
* ``is_repo(repo_id: number, repo_complete_name: symbol, component_id: number)``

And the following relation is declared in this policy:

* ``violating_dependencies(parent: number)``

Feel free to browse through the available
relations `here <https://github.com/oracle/macaron/blob/main/src/macaron/policy_engine/prelude/>`_
to see how they are constructed before moving on.

.. code-block:: c++

Policy("detect-malicious-upload", component_id, "") :-
is_component(component_id, _),
!violating_dependencies(component_id).

This rule populates the ``Policy`` relation if ``component_id`` exists in the database and
``violating_dependencies`` relation for this component is empty.

.. code-block:: c++

.decl violating_dependencies(parent: number)
violating_dependencies(parent) :-
transitive_dependency(parent, dependency),
!check_passed(dependency, "mcn_infer_artifact_pipeline_1"),
!check_passed(dependency, "mcn_provenance_level_three_1").

This is the rule that the user needs to design to detect dependencies that violate a security requirement.
Here we declare a relation called ``violating_dependencies`` and populate it if the dependencies in the
``transitive_dependency`` relation do not pass any of the ``mcn_infer_artifact_pipeline_1`` and
``mcn_provenance_level_three_1`` checks.

.. code-block:: c++

apply_policy_to("detect-malicious-upload", component_id) :-
is_repo(_, "github.com/behnazh-w/example-maven-app", component_id).

Finally, the ``apply_policy_to`` rule applies the policy ``detect-malicious-upload`` on the
repository ``github.com/behnazh-w/example-maven-app``. Note that each run of Macaron analyzes a repository at a specific
commit. So, the database can include more than one result for a repository and this policy will be
validated on all commits available in the database.

Let's name this policy ``example-maven-app.dl``. To verify this policy run:

.. code-block:: shell
./run_macaron.sh verify-policy --database ./output/macaron.db --file ./example-maven-app.dl
You can see the policy result both in the console and ``output/policy_report.json``. The results
printed to the console will look like the following:

.. code-block:: javascript
passed_policies
component_satisfies_policy
failed_policies
['detect-malicious-upload']
component_violates_policy
['1', 'pkg:github.com/behnazh-w/example-maven-app@34c06e8ae3811885c57f8bd42db61f37ac57eb6c', 'detect-malicious-upload']
As you can see, the policy has failed because the ``io.github.behnazh-w.demo:jackson-databind:1.0``
dependency is manually uploaded to Maven Central and does not meet the security requirement.

You can use this policy in your GitHub Actions to prevent a deployment or fail a CI test during the
development. Alternatively, you can treat the result as a warning and manually investigate the
dependencies to make sure they are secure and can be trusted.
10 changes: 5 additions & 5 deletions src/macaron/policy_engine/prelude/helper_rules.dl
Original file line number Diff line number Diff line change
Expand Up @@ -23,12 +23,12 @@ not_self_hosted_git(repo_id, message) :- repository(repo_id, _, _, _, _, _, _, r
match("^.*(github.com|gitlab.com).*$", remote), message=remote.

/**
* This fact exists iff a repository with id dependency is a dependency of repository with id repo.
* This fact exists iff a component with id dependency is a dependency of component with id parent.
*/
.decl transitive_dependency(repo_id: number, dependency: number)
transitive_dependency(repo_id, dependency) :- dependency(repo_id, dependency).
transitive_dependency(repo_id, dependency) :-
transitive_dependency(repo_id, a), transitive_dependency(a, dependency).
.decl transitive_dependency(parent: number, dependency: number)
transitive_dependency(parent, dependency) :- dependency(parent, dependency).
transitive_dependency(parent, dependency) :-
transitive_dependency(parent, a), transitive_dependency(a, dependency).

/**
* Extract the id and PURL from the component relation.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -490,7 +490,7 @@ def run_check(self, ctx: AnalyzeContext, check_result: CheckResult) -> CheckResu
]
return CheckResultType.PASSED

check_result["justification"].append("Could not find any SLSA provenances.")
check_result["justification"].append("Could not find any SLSA or Witness provenances.")
return CheckResultType.FAILED


Expand Down

0 comments on commit 5e9dced

Please sign in to comment.