Crash on malformed input #42

Open
@tlby

Thank you for this work!

I found some malformed content that makes sniffpy.sniff() raise an exception.

Example:

import sniffpy
buf = (
    b'\n\xef\xbb\xbf<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict'
    b'//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html'
    b' xmlns="http://www.w3.org/1999/xhtml">\n<html xmlns:fb="http://ww'
    b'w.facebook.com/2008/fbml">\n[...]\n</html>\r\n<!-- Performance op'
    b'timized by W3 Total Cache. Learn more: https://www.w3-edge.com/pr'
    b'oducts/\r\n\r\nPage Caching using disk: enhanced (SSL caching dis'
    b'abled)\r\n\r\n Served from: [...] @ 2017-03-29 16:04:56 by W3 Tot'
    b'al Cache -->'
)
sniffpy.sniff(buf)

produces:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/sniff.py", line 205, in sniff
    return sniff_unknown(resource, sniff_scriptable=not no_sniff)
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/sniff.py", line 106, in sniff_unknown
    mime_type = match.match_video_audio_type_pattern(resource)
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/match.py", line 93, in match_video_audio_type_pattern
    if is_mp3_pattern(resource):
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/match.py", line 132, in is_mp3_pattern
    if not match_mp3_header(resource, offset, parsed_values):
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/utils.py", line 49, in match_mp3_header
    parsed_values['layer'] = layer[0] >> 1
TypeError: 'int' object is not subscriptable
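For context, here is a minimal sketch of one way this TypeError can arise in Python 3: indexing a bytes object yields an int, so a second `[0]` on the result fails. This is not the sniffpy source. The `header` bytes and the `layer_bits` helper are hypothetical, and whether `layer` in `match_mp3_header` is actually produced this way is an assumption.

```python
# Hypothetical 4-byte MPEG audio frame header, for illustration only.
header = b'\xff\xfb\x90\x00'

byte = header[1]      # indexing bytes yields an int in Python 3
chunk = header[1:2]   # slicing yields a bytes object of length 1

try:
    byte[0] >> 1      # same shape as the failing expression in the traceback
    error = None
except TypeError as exc:
    error = str(exc)

def layer_bits(header: bytes) -> int:
    """Hypothetical helper: read the 2-bit layer field of byte 1 from an int."""
    return (header[1] & 0x06) >> 1
```

If the cause is along these lines, masking and shifting the int directly, as `layer_bits` does, would avoid the subscript entirely.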

A few things are interesting to note about this content:

  • Byte order mark follows a newline
  • Line endings switch from \n to \r\n midway
  • Multiple <html> tags
  • Mismatched open/close tags

This sample was found on the web (crawled by commoncrawl.org in 2017). I have replaced large sections that were not significant to the crash with "[...]".
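The properties listed above can be checked mechanically. The following is a rough sketch using an abbreviated stand-in for the buffer: the DOCTYPE and the trailing comment are shortened here and "[...]" still marks elided content, so the counts apply to this stand-in only, not to the original page.

```python
import codecs

# Abbreviated stand-in for the reported buffer; "[...]" marks elided content.
sample = (
    b'\n\xef\xbb\xbf<!DOCTYPE html>\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml">\n'
    b'<html xmlns:fb="http://www.facebook.com/2008/fbml">\n'
    b'[...]\n</html>\r\n<!-- [...] -->'
)

bom_offset = sample.find(codecs.BOM_UTF8)   # 1: the BOM follows a newline
crlf_count = sample.count(b'\r\n')          # CRLF line endings
lf_only_count = sample.count(b'\n') - crlf_count  # bare LF line endings
html_open = sample.count(b'<html')          # opening <html> tags
html_close = sample.count(b'</html>')       # closing </html> tags

print(bom_offset, lf_only_count, crlf_count, html_open, html_close)
```

A BOM offset greater than 0, a mix of LF and CRLF line endings, and unequal open/close counts correspond to the four observations in the list.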
