# ModelScan: Protection Against Model Serialization Attacks
Machine Learning (ML) models are shared publicly over the internet, within teams, and across teams. The rise of Foundation Models has resulted in public ML models being increasingly consumed for further training and fine-tuning. ML models are increasingly used to make critical decisions and power mission-critical applications.
Despite this, models are not yet scanned with the rigor of a PDF file in your inbox.
| Usage | Argument | Explanation |
| --- | --- | --- |
|```modelscan -h```| -h or --help | View usage help |
|```modelscan -v```| -v or --version | View version information |
|```modelscan -p /path/to/model_file```| -p or --path | Scan a locally stored model |
|```modelscan -p /path/to/model_file --settings-file ./modelscan-settings.toml```| --settings-file | Scan a locally stored model using custom configurations |
|```modelscan create-settings-file```| -l or --location | Create a configurable settings file |
|```modelscan -r```| -r or --reporting-format | Format of the output. Options are console, json, or custom (to be defined in settings-file). Default is console |
|```modelscan -r reporting-format -o file-name```| -o or --output-file | Optional file name for output report |
|```modelscan --show-skipped```| --show-skipped | Print a list of files that were skipped during the scan |
Remember, models are just like any other form of digital media: you should scan content from any untrusted source before use.
#### CLI Exit Codes
The CLI exit status codes are:
- `0`: Scan completed successfully, no vulnerabilities found
- `1`: Scan completed successfully, vulnerabilities found
- `2`: Scan failed, modelscan threw an error while scanning
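In a CI pipeline, these exit codes can be used to gate a build. Below is a minimal sketch (assuming `modelscan` is installed and on the `PATH`; `model.pkl` and `scan-report.json` are placeholder file names):

```python
import subprocess
import sys

# Scan a locally stored model and write a JSON report.
result = subprocess.run(
    ["modelscan", "-p", "model.pkl", "-r", "json", "-o", "scan-report.json"],
    capture_output=True,
    text=True,
)

if result.returncode == 0:
    print("Scan completed successfully, no vulnerabilities found.")
elif result.returncode == 1:
    print("Scan completed successfully, vulnerabilities found -- see scan-report.json.")
else:
    print("Scan failed, modelscan threw an error:", result.stderr)

# Propagate the exit code so the CI job fails on findings or scan errors.
sys.exit(result.returncode)
```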
## License

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
<http://www.apache.org/licenses/LICENSE-2.0>
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# Model Serialization Attacks
Machine Learning (ML) models are the foundational asset in ML-powered applications.
Models can be compromised in various ways: some attacks are new, like adversarial machine learning methods, while others are familiar from traditional applications, like denial-of-service attacks. Although these can threaten the safe operation of an ML-powered application, this document focuses on exposing the risk of Model Serialization Attacks.
In a Model Serialization Attack, malicious code is added to a model when it is saved; this is also known as a code injection attack. When any user or system later loads the model for further training or inference, the attack code executes immediately, often with no visible change in behavior. This makes the attack a powerful vector and an easy point of entry for compromising broader machine learning components.
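As an illustration of the mechanism (a minimal sketch, not taken from ModelScan itself; `EvilModel` and the payload are hypothetical), here is how a pickle-serialized "model" can carry code that runs at load time:

```python
import os
import pickle


class EvilModel:
    """Stands in for a model object whose pickle payload runs code on load."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object when it is loaded.
        # Returning (os.system, (command,)) makes unpickling run that command.
        return (os.system, ("echo 'attack code runs at load time'",))


# The attacker saves the malicious object as if it were a legitimate model file.
with open("model.pkl", "wb") as f:
    pickle.dump(EvilModel(), f)

# A victim simply loading the "model" triggers the payload immediately.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
```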
To secure ML models, you need to understand what’s inside them and how they are stored on disk in a process called serialization.
ML models are composed of:
## 1. Pickle Variants
**Pickle** and its variants (cloudpickle, dill, joblib) all store objects to disk in a general purpose way. These frameworks are completely ML agnostic and store Python objects as-is.
Pickle is the de facto library for serializing ML models for the following ML frameworks:
Pickle is also used to store vectors/tensors only for the following frameworks:
Pickle allows arbitrary code execution and is highly vulnerable to code injection attacks, with a very large attack surface. The pickle documentation makes this clear with the following warning:
> **Warning:** The `pickle` module **is not secure**. Only unpickle data you trust.
>
> It is possible to construct malicious pickle data which will **execute
> arbitrary code during unpickling**. Never unpickle data that could have come
> from an untrusted source, or that could have been tampered with.
>
> Consider signing data with [hmac](https://docs.python.org/3/library/hmac.html#module-hmac) if you need to ensure that it has not
> been tampered with.
>
> Safer serialization formats such as [json](https://docs.python.org/3/library/json.html#module-json) may be more appropriate if
> you are processing untrusted data.
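In line with that warning, untrusted pickle files can be inspected statically instead of being loaded. The following is a minimal sketch using the standard library's `pickletools` (an illustration only, not ModelScan's implementation; `model.pkl` is a placeholder path):

```python
import pickletools

# Opcodes that import or invoke objects during unpickling. A payload such as
# os.system or builtins.exec surfaces through these opcodes.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

with open("model.pkl", "rb") as f:
    data = f.read()

# pickletools.genops walks the pickle byte stream without executing it.
for opcode, arg, pos in pickletools.genops(data):
    if opcode.name in SUSPICIOUS_OPCODES:
        print(f"offset {pos}: {opcode.name} {arg!r}")
```

Flagged opcodes are not proof of an attack on their own; a scanner such as modelscan reports on the unsafe operators/globals it recognizes rather than on raw opcodes.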
With an understanding of the various approaches to model serialization, we can now walk through an end-to-end explanation of how many popular choices are vulnerable to this attack.
# End-to-End Attack Scenario
1. Internal attacker:
The attack complexity will vary depending on the access entrusted to an internal actor.
ModelScan classifies the unsafe operators/globals it finds at the following severity levels:

- **HIGH:** A model file that consists of unsafe operators/globals that cannot execute code but can still be exploited is classified at high severity. These operators are:
- **MEDIUM:** A model file that consists of operators/globals that are neither supported by the parent ML library nor known to modelscan is classified at medium severity.
  - The Keras Lambda layer can also be used for arbitrary code execution. In general, it is not best practice to add a Lambda layer to an ML model, as it can be exploited for code injection attacks (see the sketch after this list).
  - Work in Progress: Custom operators will be classified at medium severity.
- **LOW:** At the moment, no operators/globals are classified at low severity.
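To make the Lambda-layer risk concrete, here is a hedged sketch (assuming a recent TensorFlow/Keras where the `.keras` format and the `safe_mode` flag are available; the wrapped function here is benign, but it could be any Python code):

```python
import tensorflow as tf

# A Lambda layer embeds an arbitrary Python callable in the model. The callable
# is serialized together with the model, so whoever loads the model also loads
# (and later runs) that code -- which is why scanners flag Lambda layers.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Lambda(lambda x: x * 2.0),  # benign here, but could hide anything
    tf.keras.layers.Dense(1),
])

model.save("lambda_model.keras")

# Recent Keras versions refuse to deserialize Lambda layers unless the caller
# explicitly opts out of safe mode, precisely because of this risk.
reloaded = tf.keras.models.load_model("lambda_model.keras", safe_mode=False)
```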