Commit ec0216f

Merge pull request #3384 from trailofbits/release-0-5-0
Release v0.5.0
2 parents 8ff4e62 + 625889f commit ec0216f

3 files changed: 68 additions & 64 deletions

README.md

Lines changed: 54 additions & 43 deletions
Original file line number | Diff line number | Diff line change
@@ -25,7 +25,7 @@ pip3 install polyfile
2525

2626
To install PolyFile from source, in the same directory as this README, run:
2727
```
28-
pip3 install -e .
28+
pip3 install .
2929
```
3030

3131
Important: Before installing from source, make sure Java is installed. Java is used to
@@ -35,11 +35,33 @@ This will automatically install the `polyfile` and `polymerge` executables in yo
3535

3636
## Usage
3737

38+
Running `polyfile` on a file with no arguments will mimic the behavior of `file --keep-going`:
39+
```console
40+
$ polyfile png-polyglot.png
41+
PNG image data, 256 x 144, 8-bit/color RGB, non-interlaced
42+
Brainfu** Program
43+
Malformed PDF
44+
PDF document, version 1.3, 1 pages
45+
ZIP end of central directory record Java JAR archive
46+
```
47+
To generate an interactive hex viewer for the file, use the `--html` option:
48+
```console
49+
$ polyfile --html output.html png-polyglot.png
50+
Found a file of type application/pdf at byte offset 0
51+
Found a file of type application/x-brainfuck at byte offset 0
52+
Found a file of type image/png at byte offset 0
53+
Found a file of type application/zip at byte offset 0
54+
Found a file of type application/java-archive at byte offset 0
55+
Saved HTML output to output.html
3856
```
39-
usage: polyfile [-h] [--format {mime,html,json,sbud}] [--output OUTPUT]
40-
[--filetype FILETYPE] [--list] [--html HTML]
57+
58+
Full usage instructions follow:
59+
```
60+
usage: polyfile [-h] [--format {file,mime,html,json,sbud}] [--output OUTPUT]
61+
[--filetype FILETYPE] [--list] [--html HTML] [--explain]
4162
[--only-match-mime] [--only-match] [--require-match]
42-
[--max-matches MAX_MATCHES] [--debugger] [--no-debug-python]
63+
[--max-matches MAX_MATCHES] [--debugger]
64+
[--eval-command EVAL_COMMAND] [--no-debug-python]
4365
[--quiet | --debug | --trace] [--version] [-dumpversion]
4466
[FILE]
4567
@@ -48,43 +70,46 @@ A utility to recursively map the structure of a file.
4870
positional arguments:
4971
FILE the file to analyze; pass '-' or omit to read from STDIN
5072
51-
optional arguments:
73+
options:
5274
-h, --help show this help message and exit
53-
--format {mime,html,json,sbud}, -r {mime,html,json,sbud}
75+
--format {file,mime,html,json,sbud}, -r {file,mime,html,json,sbud}
5476
PolyFile's output format
55-
77+
5678
Output formats are:
57-
mime ... the detected MIME types associated with the file,
58-
like the output of the `file` command
59-
html ... an interactive HTML-based hex viewer
60-
json ... a modified version of the SBUD format in JSON syntax
61-
sbud ... equivalent to 'json'
62-
79+
file ...... the detected formats associated with the file,
80+
like the output of the `file` command
81+
mime ...... the detected MIME types associated with the file,
82+
like the output of the `file --mime-type` command
83+
explain ... like 'mime', but adds a human-readable explanation
84+
for why each MIME type matched
85+
html ...... an interactive HTML-based hex viewer
86+
json ...... a modified version of the SBUD format in JSON syntax
87+
sbud ...... equivalent to 'json'
88+
6389
Multiple formats can be output at once:
64-
90+
6591
polyfile INPUT_FILE -f mime -f json
66-
92+
6793
Their output will be concatenated to STDOUT in the order that
6894
they occur in the arguments.
69-
95+
7096
To save each format to a separate file, see the `--output` argument.
71-
72-
If no format is specified, PolyFile defaults to `--format sbud`,
73-
but this will change to `--format mime` in v0.5.0
97+
98+
If no format is specified, PolyFile defaults to `--format file`
7499
--output OUTPUT, -o OUTPUT
75100
an optional output path for `--format`
76-
101+
77102
Each instance of `--output` applies to the previous instance
78103
of the `--format` option.
79-
104+
80105
For example:
81-
106+
82107
polyfile INPUT_FILE --format html --output output.html \
83108
--format sbud --output output.json
84-
109+
85110
will save HTML to to `output.html` and SBUD to `output.json`.
86111
No two outputs can be directed at the same file path.
87-
112+
88113
The path can be '-' for STDOUT.
89114
If an `--output` is omitted for a format,
90115
then it will implicitly be printed to STDOUT.
@@ -93,6 +118,7 @@ optional arguments:
93118
--list, -l list the supported filetypes for the `--filetype` argument and exit
94119
--html HTML, -t HTML path to write an interactive HTML file for exploring the PDF;
95120
equivalent to `--format html --output HTML`
121+
--explain equivalent to `--format explain
96122
--only-match-mime, -I
97123
"just print out the matching MIME types for the file, one on each line;
98124
equivalent to `--format mime`
@@ -101,6 +127,8 @@ optional arguments:
101127
--max-matches MAX_MATCHES
102128
stop scanning after having found this many matches
103129
--debugger, -db drop into an interactive debugger for libmagic file definition matching and PolyFile parsing
130+
--eval-command EVAL_COMMAND, -ex EVAL_COMMAND
131+
execute the given debugger command
104132
--no-debug-python by default, the `--debugger` option will break on custom matchers and prompt to debug using PDB. This option will suppress those prompts.
105133
--quiet, -q suppress all log output
106134
--debug, -d print debug information
@@ -109,17 +137,6 @@ optional arguments:
109137
-dumpversion print PolyFile's raw version information to STDOUT and exit
110138
```
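
The usage text above says that each `--output` applies to the previous `--format`, that no two outputs may share a path, and that the default format is `file`. A minimal sketch of how that pairing could be expressed with `argparse` follows. The names `FormatSpec`, `AppendFormat`, `AttachOutput`, and `requested_formats` are hypothetical and chosen for illustration; PolyFile's own parser uses a `ValidateOutput` action whose internals are not shown in this diff, so this is not its implementation.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FormatSpec:
    """One requested output format and its destination (hypothetical helper)."""
    name: str
    path: Optional[str] = None  # None means "print to STDOUT"


class AppendFormat(argparse.Action):
    """Record each --format occurrence in command-line order."""

    def __call__(self, parser, namespace, values, option_string=None):
        formats = getattr(namespace, self.dest, None) or []
        formats.append(FormatSpec(values))
        setattr(namespace, self.dest, formats)


class AttachOutput(argparse.Action):
    """Bind an --output path to the most recent --format occurrence."""

    def __call__(self, parser, namespace, values, option_string=None):
        formats = getattr(namespace, "format", None) or []
        if not formats or formats[-1].path is not None:
            parser.error("each --output must follow its own --format")
        # '-' (STDOUT) may be shared; real file paths may not
        if values != "-" and any(spec.path == values for spec in formats):
            parser.error("no two outputs can be directed at the same file path")
        formats[-1].path = values


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="format-demo")
    parser.add_argument("file", nargs="?", default="-")
    parser.add_argument("--format", "-r", dest="format", action=AppendFormat,
                        choices=("file", "mime", "html", "json", "sbud"))
    parser.add_argument("--output", "-o", dest="output", action=AttachOutput)
    return parser


def requested_formats(argv: List[str]) -> List[FormatSpec]:
    """Parse argv and fall back to the `file` format when none is given."""
    args = build_parser().parse_args(argv)
    return args.format or [FormatSpec("file")]


if __name__ == "__main__":
    for spec in requested_formats(["input.bin", "--format", "html", "--output", "out.html",
                                   "--format", "sbud", "--output", "out.json"]):
        print(spec.name, spec.path)
```

Keeping the format and its path in one record means the order of the `--format` options is also the order in which the outputs are emitted, which matches the concatenation behavior the help text describes.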
111139

112-
To generate a JSON mapping of a file, run:
113-
114-
```
115-
polyfile INPUT_FILE > output.json
116-
```
117-
118-
You can optionally have PolyFile output an interactive HTML page containing a labeled, interactive hexdump of the file:
119-
```
120-
polyfile INPUT_FILE --html output.html > output.json
121-
```
122-
123140
### Interactive Debugger
124141

125142
PolyFile has an interactive debugger both for its file matching and parsing. It can be used to debug a libmagic pattern
@@ -140,7 +157,7 @@ It currently has support for parsing and semantically mapping the following form
140157

141158
For an example that exercises all of these file formats, run:
142159
```bash
143-
curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html - > ESultanikResume.json
160+
curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html -
144161
```
145162

146163
Prior to PolyFile version 0.3.0, it used the [TrID database](http://mark0.net/soft-trid-deflist.html) for file
@@ -150,13 +167,7 @@ TrID matching code is still shipped with PolyFile and can be invoked programmati
150167

151168
### Output Format
152169

153-
PolyFile outputs its mapping in an extension of the [SBuD](https://github.com/corkami/sbud) JSON format described [in the documentation](docs/json_format.md).
154-
155-
PolyFile can also emit a standalone HTML document that contains an interactive hex viewer as well as syntax trees for
156-
the discovered file formats. Simply pass the `--html` argument to PolyFile with an output path:
157-
```console
158-
$ polyfile input_file --html output.html
159-
```
170+
PolyFile has several options for outputting its results, specified by its `--format` option. For computer-readable output, PolyFile has an extension of the [SBuD](https://github.com/corkami/sbud) JSON format described [in the documentation](docs/json_format.md). Prior to version 0.5.0 this was the default output format of PolyFile. However, now the default output format is to mimic the behavior of the `file` command. To maintain the original behavior, use the `--format sbud` option.
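
For scripts that relied on the old default, one way to pin the SBUD/JSON output is to pass `--format sbud` explicitly. The sketch below assumes `polyfile` is on the `PATH` and that, with `--quiet`, STDOUT carries only the JSON document; the helper names `sbud_command` and `map_file` are hypothetical.

```python
import json
import subprocess
from typing import Any, List


def sbud_command(path: str) -> List[str]:
    """Build a PolyFile command line that requests SBUD/JSON output explicitly."""
    # --quiet suppresses log output; --format sbud selects the JSON mapping
    return ["polyfile", "--quiet", "--format", "sbud", path]


def map_file(path: str) -> Any:
    """Run PolyFile on `path` and parse the JSON it prints to STDOUT.

    Assumes the `polyfile` executable is installed; raises CalledProcessError
    if it exits with a non-zero status.
    """
    result = subprocess.run(sbud_command(path), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(" ".join(sbud_command("INPUT_FILE")))
```

Because `--format sbud` also appears in the pre-0.5.0 usage text, a command built this way should not depend on which default the installed version uses.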
160171

161172
### libMagic Implementation
162173

polyfile/__main__.py

Lines changed: 13 additions & 20 deletions
@@ -58,8 +58,7 @@ def __exit__(self, exc_type, exc_val, exc_tb):
5858

5959
class FormatOutput:
6060
valid_formats = ("mime", "html", "json", "sbud", "explain")
61-
# TODO: Change this from "sbud" to "mime" in v0.5.0:
62-
default_format = "sbud"
61+
default_format = "file"
6362

6463
def __init__(self, output_format: Optional[str] = None, output_path: Optional[str] = None):
6564
if output_format is None:
@@ -144,8 +143,7 @@ def main(argv=None):
144143
145144
To save each format to a separate file, see the `--output` argument.
146145
147-
If no format is specified, PolyFile defaults to `--format sbud`,
148-
but this will change to `--format file` in v0.5.0"""))
146+
If no format is specified, PolyFile defaults to `--format file`"""))
149147

150148
parser.add_argument('--output', '-o', action=ValidateOutput, type=str, # nargs=2,
151149
# metavar=(f"{{{','.join(ValidateOutput.valid_outputs)}}}", "PATH"),
@@ -304,16 +302,7 @@ def main(argv=None):
304302
stack.enter_context(debugger)
305303
elif args.no_debug_python:
306304
log.warning("Ignoring `--no-debug-python`; it can only be used with the --debugger option.")
307-
if not sys.stdout.isatty() or not sys.stdin.isatty():
308-
log.warning("""WARNING
309-
!!!!!!!
310-
The default output format for PolyFile will be changing in forthcoming release v0.5.0!
311-
Currently, the default output format is SBUD/JSON.
312-
In release v0.5.0, it will switch to the equivalent of the current `--format file` option.
313-
To preserve the original behavior, add the `--format sbud` command line option.
314-
Please update your scripts!
315-
316-
""")
305+
317306
analyzer = Analyzer(file_path, parse=not args.only_match, magic_matcher=magic_matcher)
318307

319308
needs_sbud = any(output_format.output_format in {"html", "json", "sbud"} for output_format in args.format)
@@ -339,14 +328,18 @@ def main(argv=None):
339328
with output_format.output_stream as output:
340329
if output_format.output_format == "file":
341330
istty = sys.stderr.isatty() and output.isatty() and logging.root.level <= logging.INFO
331+
lines = set()
342332
with KeyboardInterruptHandler():
343333
for match in analyzer.magic_matches():
344-
if istty:
345-
log.clear_status()
346-
output.write(f"{match!s}\n")
347-
output.flush()
348-
else:
349-
output.write(f"{match!s}\n")
334+
line = str(match)
335+
if line not in lines:
336+
lines.add(line)
337+
if istty:
338+
log.clear_status()
339+
output.write(f"{line}\n")
340+
output.flush()
341+
else:
342+
output.write(f"{line}\n")
350343
if istty:
351344
log.clear_status()
352345
elif output_format.output_format in ("mime", "explain"):
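
The hunk above adds a `lines` set so that identical match descriptions are printed only once, in first-seen order. The same technique in isolation might look like the sketch below; `write_unique_lines` is a hypothetical helper, and plain strings stand in for the match objects that `analyzer.magic_matches()` yields.

```python
import io
from typing import Iterable, TextIO


def write_unique_lines(matches: Iterable[object], output: TextIO) -> int:
    """Write str(match) for each match, skipping text that was already written.

    Returns the number of lines actually written.
    """
    seen = set()
    written = 0
    for match in matches:
        line = str(match)
        if line in seen:
            continue  # duplicate description: already emitted once
        seen.add(line)
        output.write(f"{line}\n")
        written += 1
    return written


if __name__ == "__main__":
    buffer = io.StringIO()
    write_unique_lines(["PNG image data", "Malformed PDF", "PNG image data"], buffer)
    print(buffer.getvalue(), end="")
```

A set gives constant-time membership checks, and writing each line as soon as it is first seen keeps the output streaming instead of waiting for all matches to be collected.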

setup.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
2020
long_description_content_type="text/markdown",
2121
url='https://github.com/trailofbits/polyfile',
2222
author='Trail of Bits',
23-
version="0.4.2",
23+
version="0.5.0",
2424
packages=find_packages(exclude=("tests",)),
2525
python_requires='>=3.7',
2626
install_requires=[
