Thank you for this work!
I discovered some poorly formed content that triggered an exception.
Example:
import sniffpy
buf = (
b'\n\xef\xbb\xbf<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict'
b'//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html'
b' xmlns="http://www.w3.org/1999/xhtml">\n<html xmlns:fb="http://ww'
b'w.facebook.com/2008/fbml">\n[...]\n</html>\r\n<!-- Performance op'
b'timized by W3 Total Cache. Learn more: https://www.w3-edge.com/pr'
b'oducts/\r\n\r\nPage Caching using disk: enhanced (SSL caching dis'
b'abled)\r\n\r\n Served from: [...] @ 2017-03-29 16:04:56 by W3 Tot'
b'al Cache -->'
)
sniffpy.sniff(buf)
produces:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/sniff.py", line 205, in sniff
    return sniff_unknown(resource, sniff_scriptable=not no_sniff)
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/sniff.py", line 106, in sniff_unknown
    mime_type = match.match_video_audio_type_pattern(resource)
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/match.py", line 93, in match_video_audio_type_pattern
    if is_mp3_pattern(resource):
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/match.py", line 132, in is_mp3_pattern
    if not match_mp3_header(resource, offset, parsed_values):
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/utils.py", line 49, in match_mp3_header
    parsed_values['layer'] = layer[0] >> 1
TypeError: 'int' object is not subscriptable
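The TypeError means that `layer` is already an `int` by the time it is subscripted. I have not traced how `utils.py` computes it, so the following is only a guess at the mechanism: in Python 3, indexing a `bytes` object yields an `int`, and masking that value yields another `int`, so a second `[0]` fails. The header bytes and variable names below are illustrative and are not taken from sniffpy.

```python
# Illustrative 4-byte MP3 frame header; not taken from the crashing sample.
header = b"\xff\xfb\x90\x00"

byte = header[1]           # indexing bytes yields an int in Python 3
layer_bits = byte & 0x06   # masking an int yields another int

try:
    layer_bits[0] >> 1     # same shape as the failing line in the traceback
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Dropping the subscript works, because the value is already an int.
layer = layer_bits >> 1

print(error_message)
print(layer)
```

If this is what happens, removing the subscript (or slicing with `header[1:2]` where a `bytes` value is really wanted) would avoid the exception, though whether the resulting value is the intended one depends on code not shown in this report.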
A few things are interesting to note about this content:
- Byte order mark follows a newline
- Line endings switch from \n to \r\n midway
- Multiple <html> tags
- Mismatched open/close tags
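These observations can be checked directly on the buffer. The sketch below uses a shortened copy of the sample (the trailing comment is cut down, and "[...]" still marks elided content), so the counts apply to this abbreviated buffer only, not to the original page.

```python
import codecs

# Shortened copy of the sample from this report; "[...]" marks elided content.
buf = (
    b'\n\xef\xbb\xbf<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict'
    b'//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html'
    b' xmlns="http://www.w3.org/1999/xhtml">\n<html xmlns:fb="http://ww'
    b'w.facebook.com/2008/fbml">\n[...]\n</html>\r\n<!-- Performance op'
    b'timized -->'
)

# Offset of the UTF-8 byte order mark; 0 would mean it starts the buffer.
bom_offset = buf.find(codecs.BOM_UTF8)

# Count CRLF line endings and LF line endings not preceded by CR.
crlf_count = buf.count(b'\r\n')
bare_lf_count = buf.count(b'\n') - crlf_count

# Count opening and closing <html> tags by simple substring search.
open_tags = buf.count(b'<html')
close_tags = buf.count(b'</html>')

print(bom_offset, crlf_count, bare_lf_count, open_tags, close_tags)
```

A byte order mark at a nonzero offset, both kinds of line ending present, and more opening than closing `<html>` tags match the points listed above.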
This sample was found on the web (by commoncrawl.org in 2017). I have replaced large sections that were not significant to the crash with "[...]".