
Can't process django.po file "invalid byte sequence in UTF-8" #285

@anentropic


Twine version 1.0.6

$ twine consume-all-localization-files twine.txt locale/ --consume-all --consume-comments --format=django
Traceback (most recent call last):
	13: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/ruby_executable_hooks:24:in `<main>'
	12: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/ruby_executable_hooks:24:in `eval'
	11: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/twine:23:in `<main>'
	10: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/twine:23:in `load'
	 9: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/bin/twine:4:in `<top (required)>'
	 8: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:33:in `run'
	 7: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:190:in `consume_all_localization_files'
	 6: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:190:in `glob'
	 5: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:193:in `block in consume_all_localization_files'
	 4: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:323:in `read_localization_file'
	 3: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:323:in `open'
	 2: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:325:in `block in read_localization_file'
	 1: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/formatters/django.rb:22:in `read'
/Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/formatters/django.rb:22:in `match': invalid byte sequence in UTF-8 (ArgumentError)

Unfortunately, the unhandled exception gives no indication of where the invalid byte is located within the file.
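Because the exception does not say where the bad bytes are, a small standalone Ruby script can read the file as raw bytes and report each line that is not valid UTF-8. This is a minimal sketch that is not part of Twine. The method name `find_invalid_utf8_lines` and the script name in the usage comment are hypothetical.

```ruby
# Hypothetical helper (not part of Twine): list the lines of a file that are
# not valid UTF-8, together with the first offending byte on each line.
def find_invalid_utf8_lines(path)
  problems = []
  # "rb" reads raw bytes, so nothing is decoded or validated on the way in
  File.open(path, "rb") do |io|
    io.each_line.with_index(1) do |raw, lineno|
      line = raw.dup.force_encoding(Encoding::UTF_8)
      next if line.valid_encoding?

      # each_char yields each invalid byte as a single-byte broken character
      bad = line.each_char.find { |ch| !ch.valid_encoding? }
      problems << [lineno, format("0x%02X", bad.bytes.first)]
    end
  end
  problems
end

# Usage (hypothetical script name): ruby find_invalid_utf8.rb locale/*/LC_MESSAGES/django.po
if $PROGRAM_NAME == __FILE__
  ARGV.each do |path|
    find_invalid_utf8_lines(path).each do |lineno, byte|
      puts "#{path}:#{lineno}: invalid UTF-8 starting at byte #{byte}"
    end
  end
end
```

An empty result means every line decodes cleanly as UTF-8. In that case the problem may lie in how the file is opened rather than in its contents.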

We use these .po files without any problems in our Django project, so I'm not sure they really contain incorrectly encoded data.

At the top of the file there's a header entry declaring UTF-8:

msgid ""
msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

If I use --encoding=ASCII-8BIT:

twine consume-all-localization-files twine.txt garage/locale/ --consume-all --consume-comments --format=django --encoding=ASCII-8BIT

then it logs "Adding new definition <msg id>" for every message in the .po file, but fails when writing the result to twine.txt with this error:

Traceback (most recent call last):
	19: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/ruby_executable_hooks:24:in `<main>'
	18: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/ruby_executable_hooks:24:in `eval'
	17: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/twine:23:in `<main>'
	16: from /Users/anentropic/.rvm/gems/ruby-2.6.3/bin/twine:23:in `load'
	15: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/bin/twine:4:in `<top (required)>'
	14: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:33:in `run'
	13: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:201:in `consume_all_localization_files'
	12: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/runner.rb:55:in `write_twine_data'
	11: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:180:in `write'
	10: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:180:in `open'
	 9: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:181:in `block in write'
	 8: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:181:in `each'
	 7: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:188:in `block (2 levels) in write'
	 6: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:188:in `each'
	 5: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:206:in `block (3 levels) in write'
	 4: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:206:in `each'
	 3: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:207:in `block (4 levels) in write'
	 2: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:224:in `write_value'
	 1: from /Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:224:in `puts'
/Users/anentropic/.rvm/gems/ruby-2.6.3/gems/twine-1.0.6/lib/twine/twine_file.rb:224:in `write': "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
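This second error can be reproduced without Twine. Assume `--encoding=ASCII-8BIT` makes Twine read the .po file as binary. The strings are then tagged ASCII-8BIT, and writing them out as UTF-8 asks Ruby to transcode them. That fails on the first byte above 0x7F: `\xC3` is the lead byte of a UTF-8 "é". This is a minimal sketch with a made-up sample string. It also shows that `force_encoding` relabels the bytes without converting them, which works when the bytes really are UTF-8, as the .po header declares.

```ruby
# A UTF-8 "é" (bytes C3 A9), but tagged as binary (ASCII-8BIT), which is
# assumed to be roughly what reading with --encoding=ASCII-8BIT produces.
binary = "caf\xC3\xA9".b

begin
  # Transcoding binary -> UTF-8 fails: there is no mapping for bytes >= 0x80
  binary.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts "encode failed: #{e.message}"
end

# Relabelling keeps the same bytes and only changes the encoding tag
relabelled = binary.dup.force_encoding(Encoding::UTF_8)
puts relabelled.valid_encoding?
puts relabelled
```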

If I modify django.rb in Twine like this:

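        while line = io.gets
          # Replace invalid UTF-8 byte sequences before the regexes run;
          # no nil check is needed because the while condition excludes nil
          line = line.scrub("BADCHAR")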

...then I'm able to get complete output in my twine.txt file with no errors.

Curiously, the replacement string BADCHAR does not appear anywhere in the output.
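To see which lines `scrub` actually changes, the block form of `String#scrub` can record each invalid byte sequence it replaces. This is a minimal sketch, not Twine's code. It uses a `StringIO` with a made-up two-line sample in place of the real .po file, and the helper name `scrub_lines` is hypothetical.

```ruby
require "stringio"

# Hypothetical helper: yield each line with invalid UTF-8 replaced, and
# return [line_number, hex_bytes] for every replacement that was made.
def scrub_lines(io, replacement = "BADCHAR")
  replaced = []
  while (line = io.gets)
    # The block receives each invalid byte sequence and returns its replacement
    line = line.scrub do |bytes|
      replaced << [io.lineno, bytes.unpack1("H*")]
      replacement
    end
    yield line
  end
  replaced
end

# Sample input: the second line contains a stray 0xC3 byte
io = StringIO.new("msgid \"ok\"\nmsgstr \"caf\xC3\"\n")
out = []
log = scrub_lines(io) { |line| out << line }
log.each { |lineno, hex| warn "line #{lineno}: replaced bytes #{hex}" }
```

The log records line numbers. Comparing those lines against the .po file could show whether the replaced text sits in lines that Twine's Django formatter never copies into twine.txt, which would explain why BADCHAR is missing from the output.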
