PolyFile has an interactive debugger both for its file matching and parsing. It can be used to debug a libmagic pattern
+definition, determine why a specific file fails to be classified as the expected MIME type, or step through a parser.
+You can run PolyFile with the debugger enabled using the `-db` option.
+
 ### File Support
 
 PolyFile has a cleanroom, [pure Python implementation of the libmagic file classifier](#libmagic-implementation), and
@@ -102,6 +119,12 @@ TrID matching code is still shipped with PolyFile and can be invoked programmati
 
 PolyFile outputs its mapping in an extension of the [SBuD](https://github.com/corkami/sbud) JSON format described [in the documentation](docs/json_format.md).
 
+PolyFile can also emit a standalone HTML document that contains an interactive hex viewer as well as syntax trees for
+the discovered file formats. Simply pass the `--html` argument to PolyFile with an output path:
+```console
+$ polyfile input_file --html output.html
+```
+
 ### libMagic Implementation
 
 PolyFile has a cleanroom implementation of [libmagic (used in the `file` command)](https://github.com/file/file).
@@ -125,6 +148,32 @@ with open("file_to_test", "rb") as f:
 ...
 ```
 
+### Debugging the libmagic DSL
+`libmagic` has an esoteric, poorly documented domain-specific language (DSL) for specifying its matching signatures.
+You can read the minimal and, as we have discovered in our cleanroom implementation, _incomplete_ documentation by running
+`man 5 magic`. PolyFile implements an interactive debugger for stepping through the DSL specifications, modeled after
+GDB. You can enter this debugger by passing the `--debugger` or `-db` argument to PolyFile. It is useful both for
+implementing new `libmagic` DSL definitions and for figuring out why an existing definition fails to match a given file.
A utility to recursively map the structure of a file.
+A utility to merge the JSON output of `polyfile`
+with a polytracker.json file from PolyTracker.
+
+https://github.com/trailofbits/polyfile/
+https://github.com/trailofbits/polytracker/
 
 positional arguments:
-  FILE                  the file to analyze; pass '-' or omit to read from
-                        STDIN
+  FILES                 Path to the PolyFile JSON output and/or the PolyTracker JSON output. Merging will only occur if both files are provided. The `--cfg` and `--type-hierarchy` options can be used if only a single file is provided, but no merging will occur.
 
 optional arguments:
   -h, --help            show this help message and exit
-  --filetype FILETYPE, -f FILETYPE
-                        explicitly match against the given filetype or
-                        filetype wildcard (default is to match against all
-                        filetypes)
-  --list, -l            list the supported filetypes (for the `--filetype`
-                        argument) and exit
-  --html HTML, -t HTML  path to write an interactive HTML file for exploring
-                        the PDF
-  --only-match-mime, -I
-                        just print out the matching MIME types for the file,
-                        one on each line
-  --only-match, -m      do not attempt to parse known filetypes; only match
-                        against file magic
-  --require-match       if no matches are found, exit with code 127
-  --max-matches MAX_MATCHES
-                        stop scanning after having found this many matches
-  --debug, -d           print debug information
-  --trace, -dd          print extra verbose debug information
-  --quiet, -q           suppress all log output (overrides --debug)
-  --version, -v         print PolyFile's version information to STDERR
-  -dumpversion          print PolyFile's raw version information to STDOUT and
-                        exit
+  --cfg CFG, -c CFG     Optional path to output a Graphviz .dot file representing the control flow graph of the program trace
+  --cfg-pdf CFG_PDF, -p CFG_PDF
+                        Similar to --cfg, but renders the graph to a PDF instead of outputting the .dot source
+  --dataflow [DATAFLOW ...]
+                        For the CFG generation options, only render functions that participated in dataflow. `--dataflow 10` means that only functions in the dataflow related to byte 10 should be included. `--dataflow 10:30` means that only functions operating on bytes 10 through 29 should be included. The beginning or end of a range can be omitted and will default to the beginning and end of the file, respectively. Multiple `--dataflow` ranges can be specified. `--dataflow :` will render the CFG only with functions that operated on tainted bytes. Omitting `--dataflow` will produce a CFG containing all functions.
+  --no-intermediate-functions
+                        To be used in conjunction with `--dataflow`. If enabled, only include functions in the dataflow graph if they operated on the tainted bytes. This can result in a disjoint dataflow graph.
+  --demangle            Demangle C++ function names in the CFG (requires that PolyFile was installed with the `demangle` option, or that the `cxxfilt` Python module is installed.)
Similar to --type-hierarchy, but renders the graph to a PDF instead of outputting the .dot source
+  --diff [DIFF ...]     Diff an arbitrary number of input polytracker.json files, all treated as the same class, against one or more polytracker.json provided after `--diff` arguments
+  --debug, -d           Print debug information
+  --quiet, -q           Suppress all log output (overrides --debug)
+  --version, -v         Print PolyMerge's version information and exit
+  -dumpversion          Print PolyMerge's raw version information and exit
 ```
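The range syntax described in the `--dataflow` help text above can be illustrated with a short Python sketch. This is not PolyMerge's actual parser: the function name `parse_dataflow_range` and the `file_size` parameter are hypothetical, and reading a bare `:` as "every byte in the file" is an assumption derived from the stated defaults for an omitted start or end.

```python
def parse_dataflow_range(spec: str, file_size: int) -> range:
    """Turn a `--dataflow`-style spec into a half-open range of byte offsets.

    Hypothetical helper for illustration only; it mirrors the help text,
    not PolyMerge's real implementation.
    """
    if ":" not in spec:
        # A single offset such as "10" selects exactly that byte.
        start = int(spec)
        return range(start, start + 1)
    start_text, end_text = spec.split(":", 1)
    # An omitted start defaults to the beginning of the file,
    # an omitted end defaults to the end of the file.
    start = int(start_text) if start_text else 0
    end = int(end_text) if end_text else file_size
    if start < 0 or end < start:
        raise ValueError(f"invalid dataflow range: {spec!r}")
    # "10:30" covers bytes 10 through 29, so the end is exclusive.
    return range(start, end)
```

Under these assumptions, `parse_dataflow_range("10:30", 100)` covers offsets 10 through 29, and `parse_dataflow_range(":", 100)` covers the whole 100-byte file. Multiple `--dataflow` arguments would simply yield one such range each.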
 
 The output of `polymerge` is the same as [PolyFile’s output format](docs/json_format.md), augmented with the following:
@@ -202,5 +250,4 @@ This research was developed by [Trail of
 Bits](https://www.trailofbits.com/) with funding from the Defense
 Advanced Research Projects Agency (DARPA) under the SafeDocs program
 as a subcontractor to [Galois](https://galois.com). It is licensed under the [Apache 2.0 license](LICENSE).
-The [PDF parser](polyfile/pdfparser.py) is modified from the parser developed by Didier Stevens and released into the