
Format strings should be unicode when value can be unicode #2133

Open
@michamos

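A typical cause of unicode errors is formatting a unicode value with a non-unicode format string. On Python 2, `str.format` implicitly encodes unicode arguments with the ASCII codec, so any non-ASCII character in the value raises a `UnicodeEncodeError`: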

In [1]: def broken_format(value):
   ...:     return 'Value: {}'.format(value)
   ...: 

In [2]: broken_format('hello')
Out[2]: 'Value: hello'

In [3]: broken_format(u'καλημέρα')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-3-f592da2d7e68> in <module>()
----> 1 broken_format(u'καλημέρα')

<ipython-input-1-6aee3f447981> in broken_format(value)
      1 def broken_format(value):
----> 2     return 'Value: {}'.format(value)

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)

In [4]: def good_format(value):
   ...:     return u'Value: {}'.format(value)
   ...: 

In [5]: good_format(u'καλημέρα')
Out[5]: u'Value: \u03ba\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1'

Instead of fixing these one by one as they pop up on Sentry (see #2026, #2030, #2107, #2110, #2116, #2123, #2127), it would be better to run a systematic check over the codebase and fix all of them at once.
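
One possible shape for such a check, as a rough sketch rather than a finished tool: scan the source with the standard `tokenize` module and report every string literal without a `u` prefix that is immediately followed by `.format`. The function name `find_bytes_format_calls` and the `SAMPLE` snippet are made up for illustration. The sketch assumes the checker itself runs under Python 3 (it only tokenizes the code under inspection and never executes it), and it treats every unprefixed literal as a byte string, so files that use `from __future__ import unicode_literals` would produce false positives.

```python
import io
import tokenize


def find_bytes_format_calls(source):
    """Return (line_number, literal) pairs for string literals without a
    ``u`` prefix that are directly followed by ``.format``.

    Hypothetical helper: purely lexical, so format strings held in
    variables or built by concatenation are not detected.
    """
    # Drop comments and non-logical newlines so that a ``.format`` placed
    # on a continuation line inside parentheses is still seen as adjacent.
    tokens = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in (tokenize.COMMENT, tokenize.NL)
    ]

    hits = []
    for literal, dot, name in zip(tokens, tokens[1:], tokens[2:]):
        if literal.type != tokenize.STRING:
            continue
        if dot.string != "." or name.string != "format":
            continue
        # The prefix is whatever precedes the opening quote (u, b, r, ...).
        body = literal.string.lstrip("bBuUrRfF")
        prefix = literal.string[: len(literal.string) - len(body)].lower()
        if "u" not in prefix:
            hits.append((literal.start[0], literal.string))
    return hits


# The two functions from the session above, as source text.
SAMPLE = (
    "def broken_format(value):\n"
    "    return 'Value: {}'.format(value)\n"
    "\n"
    "def good_format(value):\n"
    "    return u'Value: {}'.format(value)\n"
)
```

Calling `find_bytes_format_calls(SAMPLE)` reports only the literal on line 2, the one in `broken_format`. Each hit would still need a manual look, since a byte-string format is fine when the value can never be unicode.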
