
Commit 2226bc4

CandiedCode and swashko authored
build: ✨ update local setup (#140)
Co-authored-by: Sam Washko <[email protected]>
1 parent 341582a commit 2226bc4

6 files changed: +167 −36 lines changed

.gitignore

Lines changed: 4 additions & 1 deletion

@@ -132,4 +132,7 @@ cython_debug/
 
 # Notebook Model Downloads
 notebooks/PyTorchModels/
-pytorch-model-scan-results.json
+pytorch-model-scan-results.json
+
+# Code Coverage
+cov.xml
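
The new `cov.xml` entry pairs with the `test-cov` Makefile target introduced below, which writes its coverage report to that path. A quick sanity check, sketched under the assumption that the repo is cloned and the dev/test dependencies are installed:

```bash
# Generate the coverage report (writes cov.xml in the repo root) ...
make test-cov
# ... then confirm git ignores it:
git check-ignore -v cov.xml     # prints the .gitignore rule that matched
git status --porcelain cov.xml  # prints nothing when the file is ignored
```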

Makefile

Lines changed: 48 additions & 13 deletions

@@ -1,42 +1,77 @@
+.DEFAULT_GOAL := help
 VERSION ?= $(shell dunamai from git --style pep440 --format "{base}.dev{distance}+{commit}")
 
-install-dev:
+.PHONY: env
+env: ## Display information about the current environment.
+	poetry env info
+
+.PHONY: install-dev
+install-dev: ## Install all dependencies including dev and test dependencies, as well as pre-commit.
 	poetry install --with dev --with test --extras "tensorflow h5py"
 	pre-commit install
 
-install:
+.PHONY: install
+install: ## Install required dependencies.
 	poetry install
 
-install-prod:
+.PHONY: install-prod
+install-prod: ## Install prod dependencies.
 	poetry install --with prod
 
-install-test:
+.PHONY: install-test
+install-test: ## Install test dependencies.
 	poetry install --with test --extras "tensorflow h5py"
 
-clean:
-	pip uninstall modelscan
+.PHONY: clean
+clean: ## Uninstall modelscan.
+	python -m pip uninstall modelscan
+
+.PHONY: test
+test: ## Run pytests.
+	poetry run pytest tests/
 
-test:
-	poetry run pytest
+.PHONY: test-cov
+test-cov: ## Run pytests with code coverage.
+	poetry run pytest --cov=modelscan --cov-report xml:cov.xml tests/
 
-build:
+.PHONY: build
+build: ## Build the source and wheel archive.
 	poetry build
 
+.PHONY: build-prod
 build-prod: version
+build-prod: ## Update the version and build wheel archive.
 	poetry build
 
-version:
+.PHONY: version
+version: ## Bump the version of the project.
 	echo "__version__ = '$(VERSION)'" > modelscan/_version.py
 	poetry version $(VERSION)
 
+.PHONY: lint
 lint: bandit mypy
+lint: ## Run all the linters.
 
-bandit:
+.PHONY: bandit
+bandit: ## Run SAST scanning.
 	poetry run bandit -c pyproject.toml -r .
 
-mypy:
+.PHONY: mypy
+mypy: ## Run type checking.
 	poetry run mypy --ignore-missing-imports --strict --check-untyped-defs .
 
-format:
+.PHONY: format
+format: ## Run black to format the code.
 	black .
 
+
+.PHONY: help
+help: ## List all targets and help information.
+	@grep --no-filename -E '^([a-z.A-Z_%-/]+:.*?)##' $(MAKEFILE_LIST) | sort | \
+	awk 'BEGIN {FS = ":.*?(## ?)"}; { \
+		if (length($$1) > 0) { \
+			printf "  \033[36m%-30s\033[0m %s\n", $$1, $$2; \
+		} else { \
+			printf "%s\n", $$2; \
+		} \
+	}'
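
The new `help` target is the common self-documenting-Makefile pattern: `grep` pulls every target line that carries a `##` comment out of `$(MAKEFILE_LIST)`, and `awk` splits each one into a target name and its description. Because of `.DEFAULT_GOAL := help`, a bare `make` now prints the same listing. Output should look roughly like this (ANSI colors omitted, a few targets elided):

```bash
$ make help
  bandit                         Run SAST scanning.
  build                          Build the source and wheel archive.
  build-prod                     Update the version and build wheel archive.
  clean                          Uninstall modelscan.
  env                            Display information about the current environment.
  ...
  test-cov                       Run pytests with code coverage.
  version                        Bump the version of the project.
```

The `VERSION` variable, resolved via dunamai, yields a PEP 440 dev version such as `0.5.0.dev3+1a2b3c4` (illustrative values) when the checkout is ahead of the last tag.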

README.md

Lines changed: 15 additions & 9 deletions

@@ -7,7 +7,10 @@
 [![Supported Versions](https://img.shields.io/pypi/pyversions/modelscan.svg)](https://pypi.org/project/modelscan)
 [![pypi Version](https://img.shields.io/pypi/v/modelscan)](https://pypi.org/project/modelscan)
 [![License: Apache 2.0](https://img.shields.io/crates/l/apa)](https://opensource.org/license/apache-2-0/)
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
+
 # ModelScan: Protection Against Model Serialization Attacks
+
 Machine Learning (ML) models are shared publicly over the internet, within teams and across teams. The rise of Foundation Models have resulted in public ML models being increasingly consumed for further training/fine tuning. ML Models are increasingly used to make critical decisions and power mission-critical applications.
 Despite this, models are not yet scanned with the rigor of a PDF file in your inbox.

@@ -77,10 +80,10 @@ takes for your computer to process the total filesize from disk(seconds in most
 
 ModelScan ranks the unsafe code as:
 
-* CRITICAL
-* HIGH
-* MEDIUM
-* LOW
+- CRITICAL
+- HIGH
+- MEDIUM
+- LOW
 
 ![ModelScan Flow Chart](/imgs/model_scan_flow_chart.png)
 

@@ -104,6 +107,7 @@ At present, ModelScan supports any Pickle derived format and many others:
 | Classic ML Libraries (Sklearn, XGBoost etc.) | pickle.dump(), dill.dump(), joblib.dump(), cloudpickle.dump() | Pickle, Cloudpickle, Dill, Joblib | Yes |
 
 ### Installation
+
 ModelScan is installed on your systems as a Python package(Python 3.9 to 3.12 supported). As shown from above you can install
 it by running this in your terminal:
 

@@ -119,6 +123,7 @@ modelscan = ">=0.1.1"
 ```
 
 Scanners for Tensorflow or HD5 formatted models require installation with extras:
+
 ```bash
 pip install 'modelscan[ tensorflow, h5py ]'
 ```

@@ -129,20 +134,21 @@ ModelScan supports the following arguments via the CLI:
 
 | Usage | Argument | Explanation |
 |----------------------------------------------------------------------------------|------------------|---------------------------------------------------------|
-| ```modelscan -h ``` | -h or --help | View usage help |
-| ```modelscan -v ``` | -v or --version | View version information |
+| ```modelscan -h``` | -h or --help | View usage help |
+| ```modelscan -v``` | -v or --version | View version information |
 | ```modelscan -p /path/to/model_file``` | -p or --path | Scan a locally stored model |
 | ```modelscan -p /path/to/model_file --settings-file ./modelscan-settings.toml``` | --settings-file | Scan a locally stored model using custom configurations |
 | ```modelscan create-settings-file``` | -l or --location | Create a configurable settings file |
 | ```modelscan -r``` | -r or --reporting-format | Format of the output. Options are console, json, or custom (to be defined in settings-file). Default is console |
 | ```modelscan -r reporting-format -o file-name``` | -o or --output-file | Optional file name for output report |
 | ```modelscan --show-skipped``` | --show-skipped | Print a list of files that were skipped during the scan |
 
-
 Remember models are just like any other form of digital media, you should scan content from any untrusted source before use.
 
-##### CLI Exit Codes
+#### CLI Exit Codes
+
 The CLI exit status codes are:
+
 - `0`: Scan completed successfully, no vulnerabilities found
 - `1`: Scan completed successfully, vulnerabilities found
 - `2`: Scan failed, modelscan threw an error while scanning
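
These exit codes make it easy to gate a CI job on scan results. A minimal sketch (the model path and report name are placeholders):

```bash
#!/usr/bin/env bash
# Scan a model, emit a JSON report, and branch on modelscan's exit code.
modelscan -p ./models/model.pkl -r json -o scan-report.json
case $? in
  0) echo "Scan clean: no vulnerabilities found." ;;
  1) echo "Vulnerabilities found; see scan-report.json." >&2; exit 1 ;;
  *) echo "modelscan failed to complete the scan." >&2; exit 2 ;;
esac
```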

@@ -201,7 +207,7 @@ Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
 
-http://www.apache.org/licenses/LICENSE-2.0
+<http://www.apache.org/licenses/LICENSE-2.0>
 
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,

docs/model_serialization_attacks.md

Lines changed: 7 additions & 6 deletions

@@ -5,7 +5,7 @@ Machine Learning(ML) models are the foundational asset in ML powered application
 Models can be compromised in various ways, some are new like adversarial machine learning methods, others are common with traditional applications like denial of service attacks. While these can be a threat to safely operating an ML powered application, this document focuses on exposing the risk of Model Serialization Attacks.
 In a Model Serialization Attack malicious code is added to a model when it is saved, this is also called a code injection attack as well. When any user or system then loads the model for further training or inference the attack code is executed immediately, often with no visible change in behavior to users. This makes the attack a powerful vector and an easy point of entry for attacking broader machine learning components.
 
-To secure ML models, you need to understand what’s inside them and how they are stored on disk in a process called serialization. 
+To secure ML models, you need to understand what’s inside them and how they are stored on disk in a process called serialization.
 
 ML models are composed of:
 

@@ -30,7 +30,7 @@ Before digging into how a Model Serialization Attack works and how to scan for t
 
 ## 1. Pickle Variants
 
-**Pickle** and its variants (cloudpickle, dill, joblib) all store objects to disk in a general purpose way. These frameworks are completely ML agnostic and store Python objects as-is. 
+**Pickle** and its variants (cloudpickle, dill, joblib) all store objects to disk in a general purpose way. These frameworks are completely ML agnostic and store Python objects as-is.
 
 Pickle is the defacto library for serializing ML models for following ML frameworks:
 

@@ -47,15 +47,15 @@ Pickle is also used to store vectors/tensors only for following frameworks:
 Pickle allows for arbitrary code execution and is highly vulnerable to code injection attacks with very large attack surface. Pickle documentation makes it clear with the following warning:
 
 > **Warning:** The `pickle` module **is not secure**. Only unpickle data you trust.
-> 
-> 
+>
+>
 > It is possible to construct malicious pickle data which will **execute
 > arbitrary code during unpickling**. Never unpickle data that could have come
 > from an untrusted source, or that could have been tampered with.
-> 
+>
 > Consider signing data with [hmac](https://docs.python.org/3/library/hmac.html#module-hmac) if you need to ensure that it has not
 > been tampered with.
-> 
+>
 > Safer serialization formats such as [json](https://docs.python.org/3/library/json.html#module-json) may be more appropriate if
 > you are processing untrusted data.
 
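
The warning is easy to demonstrate end to end. A minimal sketch (bash driving an inline Python script; the harmless `echo` stands in for arbitrary attacker code):

```bash
python3 - <<'EOF'
import os
import pickle

class Malicious:
    # __reduce__ tells pickle how to rebuild the object; returning
    # (os.system, (command,)) makes unpickling call os.system(command).
    def __reduce__(self):
        return (os.system, ("echo attacker code ran at load time",))

payload = pickle.dumps(Malicious())
pickle.loads(payload)  # the command executes here, during deserialization
EOF
```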

@@ -129,6 +129,7 @@ With the exception of pickle, these formats cannot execute arbitrary code. Howev
 With an understanding of various approaches to model serialization, explore how many popular choices are vulnerable to this attack with an end to end explanation.
 
 # End to end Attack Scenario
+
 1. Internal attacker:
 The attack complexity will vary depending on the access trusted to an internal actor.
 2. External attacker:
docs/severity_levels.md

Lines changed: 8 additions & 7 deletions
@@ -1,15 +1,16 @@
 # modelscan Severity Levels
 
-modelscan classifies potentially malicious code injection attacks in the following four severity levels. 
+modelscan classifies potentially malicious code injection attacks in the following four severity levels.
 <br> </br>
+
 - **CRITICAL:** A model file that consists of unsafe operators/globals that can execute code is classified at critical severity. These operators are:
-  - exec, eval, runpy, sys, open, breakpoint, os, subprocess, socket, nt, posix 
+  - exec, eval, runpy, sys, open, breakpoint, os, subprocess, socket, nt, posix
 <br> </br>
 - **HIGH:** A model file that consists of unsafe operators/globals that can not execute code but can still be exploited is classified at high severity. These operators are:
-  - webbrowser, httplib, request.api, Tensorflow ReadFile, Tensorflow WriteFile 
+  - webbrowser, httplib, request.api, Tensorflow ReadFile, Tensorflow WriteFile
 <br> </br>
-- **MEDIUM:** A model file that consists of operators/globals that are neither supported by the parent ML library nor are known to modelscan are classified at medium severity. 
-  - Keras Lambda layer can also be used for arbitrary code execution. In general, it is not a best practise to add a Lambda layer to a ML model that can get exploited for code injection attacks. 
-  - Work in Progress: Custom operators will be classified at medium severity. 
+- **MEDIUM:** A model file that consists of operators/globals that are neither supported by the parent ML library nor are known to modelscan are classified at medium severity.
+  - Keras Lambda layer can also be used for arbitrary code execution. In general, it is not a best practise to add a Lambda layer to a ML model that can get exploited for code injection attacks.
+  - Work in Progress: Custom operators will be classified at medium severity.
 <br> </br>
-- **LOW:** At the moment no operators/globals are classified at low severity level. 
+- **LOW:** At the moment no operators/globals are classified at low severity level.
