argparse's FileType argument causes UnicodeDecodeError on CLI parsing of .msg files #304

@jeremybmerrill

Describe the bug
When parsing .msg files in Python 3.9, I get UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte.

To Reproduce
Steps to reproduce the behavior:

  1. In Python 3.9, with msg_parser installed
  2. msg_parser -i path_to_my_msgfile.msg -e .
  3. Observe error

Expected behavior
No error!

Desktop (please complete the following information):

  • OS: Mac OS X
  • Python: 3.9.13
  • Version: msg_parser @ d16260d

Additional context

I was able to fix the problem by removing https://github.com/vikramarsid/msg_parser/blob/master/msg_parser/cli.py#L40; after that, everything worked fine. Evidently, the argparse FileType argument opens the file in text mode and tries to decode it as UTF-8, which a .msg file is not. The problem can also be fixed by changing that line to open the file in binary mode: type=FileType(mode="rb").
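
Since I can't share a real .msg file, here is a minimal sketch of the difference between the two modes. It is not the msg_parser code: `build_parser`, `read_input`, and `demo` are hypothetical helpers, and the test file contains only the 8-byte OLE compound file signature that .msg files start with. The text-mode parser pins `encoding="utf-8"` so the result does not depend on the machine's locale; in this sketch the error surfaces when the handle is read.

```python
import argparse
import os
import tempfile

# First eight bytes of an OLE compound file, the container format used by .msg.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def build_parser(mode):
    """Hypothetical stand-in for the CLI: only an -i/--input option."""
    parser = argparse.ArgumentParser()
    # An encoding may only be passed for text mode; binary mode takes none.
    kwargs = {} if "b" in mode else {"encoding": "utf-8"}
    parser.add_argument("-i", "--input", type=argparse.FileType(mode, **kwargs))
    return parser


def read_input(mode, path):
    """Parse '-i path' with the given FileType mode and read the whole file."""
    args = build_parser(mode).parse_args(["-i", path])
    with args.input as handle:
        return handle.read()


def demo():
    """Return (error raised in text mode, bytes read in binary mode)."""
    with tempfile.NamedTemporaryFile(suffix=".msg", delete=False) as tmp:
        tmp.write(OLE_MAGIC)
    try:
        try:
            read_input("r", tmp.name)
            text_mode_error = None
        except UnicodeDecodeError as exc:
            text_mode_error = exc
        binary_data = read_input("rb", tmp.name)
    finally:
        os.unlink(tmp.name)
    return text_mode_error, binary_data


if __name__ == "__main__":
    error, data = demo()
    print("text mode:", error)
    print("binary mode:", data)
```

With mode "r" the first byte, 0xd0, is treated as the start of a two-byte UTF-8 sequence and the following byte is not a valid continuation, which matches the error above. With mode "rb" the bytes come back untouched.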

Happy to submit a PR, but I can't tell whether the type=FileType() line is expected to do something in particular. As with #303, I cannot submit any test files because all of my .msg files are confidential.
