Commit eb5e142

Merge branch 'merge-combining-browser'
2 parents d5bd1ac + fb28009 commit eb5e142

6 files changed: +216 -111 lines changed

README.rst

Lines changed: 84 additions & 49 deletions
@@ -19,34 +19,40 @@ Introduction
1919
============
2020

2121
This API is mainly for Terminal Emulator implementors -- any python program
22-
that attempts to determine the printable width of a string on a Terminal.
22+
that attempts to determine the printable width of a string on a Terminal. It
23+
is implemented in python (no C library calls) and has no 3rd-party dependencies.
2324

24-
It is certainly possible to use your Operating System's ``wcwidth()`` and
25-
``wcswidth()`` calls if it is POSIX-conforming, but this would not be possible
25+
It is certainly possible to use your Operating System's ``wcwidth(3)`` and
26+
``wcswidth(3)`` calls if it is POSIX-conforming, but this would not be possible
2627
on non-POSIX platforms, such as Windows, or for alternative Python
27-
implementations, such as jython.
28-
29-
Furthermore, testing (`wcwidth-libc-comparator.py`_) has shown that libc
30-
wcwidth() is particularly out of date on most operating systems, reporting -1
31-
for a great many characters that are actually a displayable width of 1 or 2.
28+
implementations, such as jython. The libc implementation is also commonly many releases older
29+
than the most current Unicode Standard release files, which this project
30+
aims to track.
3231

3332
The most current release of this API is based on Unicode Standard release
34-
_7.0.0_, dated 2014-02-28, 23:15:00 GMT [KW, LI]
33+
*7.0.0*, dated *2014-02-28, 23:15:00 GMT [KW, LI]* for table generated by
34+
file ``EastAsianWidth-7.0.0.txt`` and *2014-02-07, 18:42:08 GMT [MD]* for
35+
``DerivedCombiningClass-7.0.0.txt``.
36+
37+
Installation
38+
------------
39+
40+
The stable version of this package is maintained on pypi, install using pip::
41+
42+
pip install wcwidth
3543

3644
Problem
3745
-------
3846

3947
You may have noticed some characters especially Chinese, Japanese, and
4048
Korean (collectively known as the *CJK Unified Ideographs*) consume more
41-
than 1 terminal cell.
42-
43-
In python, if you ask for the length of the string, ``u'コンニチハ'``
44-
(Japanese: Hello), it is correctly determined to be a length of **5**.
49+
than 1 terminal cell. If you ask for the length of the string, ``u'コンニチハ'``
50+
(Japanese: Hello), it is correctly determined to be a length of **5** using
51+
the ``len()`` built-in.
4552

4653
However, if you were to print this to a Terminal Emulator, such as xterm,
47-
urxvt, Terminal.app, or PuTTY, it would consume **10** *cells* (columns) --
48-
two for each symbol.
49-
54+
urxvt, Terminal.app, PuTTY, or iTerm2, it would consume **10** *cells* (columns).
55+
This causes problems for many of the text-alignment functions, such as ``rjust()``.
5056
On an 80-wide terminal, the following would wrap along the margin, instead
5157
of displaying it right-aligned as desired::
5258

@@ -65,17 +71,10 @@ that the length of ``wcwidth(u'コ')`` is reported as ``2``, and
6571
This allows one to determine the printable effects of displaying *CJK*
6672
characters on a terminal emulator.
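
The difference between code-point length and terminal cells can be illustrated with the standard library alone. The following is a minimal sketch, not this library's implementation: the helper names ``approx_cell_width`` and ``rjust_cells`` are hypothetical, and the sketch assumes every East Asian Wide or Fullwidth character takes two cells and everything else takes one, ignoring control and combining characters.

```python
import unicodedata


def approx_cell_width(text):
    """Approximate terminal cells: 2 for East Asian Wide/Fullwidth, else 1.

    Hypothetical helper for illustration only; control and combining
    characters are not handled.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)


def rjust_cells(text, columns=80):
    """Right-justify by terminal cells rather than by code points."""
    return ' ' * max(0, columns - approx_cell_width(text)) + text


greeting = u'コンニチハ'
print(len(greeting))                # 5 code points
print(approx_cell_width(greeting))  # 10 terminal cells
print(len(rjust_cells(greeting)))   # 75 code points, filling 80 cells
```

Padding by cells instead of by ``len()`` is what keeps the text from wrapping at the margin.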
6773

68-
Installation
69-
------------
70-
71-
The stable version of this package is maintained on pypi, install using pip::
72-
73-
pip install wcwidth
74-
7574
wcwidth, wcswidth
7675
-----------------
77-
Use ``wcwidth`` to determine the length of a single character,
78-
and ``wcswidth`` to determine the length of a string of characters.
76+
Use ``wcwidth`` to determine the length of a *single character*,
77+
and ``wcswidth`` to determine the length of a *string of characters*.
7978

8079
To Display ``u'コンニチハ'`` right-adjusted on screen of 80 columns::
8180

@@ -88,9 +87,9 @@ To Display ``u'コンニチハ'`` right-adjusted on screen of 80 columns::
8887
Values
8988
------
9089

91-
See the docstring for ``wcwidth()``, general overview of return values:
90+
A general overview of return values:
9291

93-
- ``-1``: indeterminate, such as combining_ characters.
92+
- ``-1``: indeterminate (see Todo_).
9493

9594
- ``0``: do not advance the cursor, such as NULL.
9695

@@ -99,12 +98,37 @@ See the docstring for ``wcwidth()``, general overview of return values:
9998
- ``1``: all others.
10099

101100
``wcswidth()`` simply returns the sum of all values along a string, or
102-
``-1`` if it has occurred for any value returned by ``wcwidth()``.
101+
``-1`` if it has occurred for any value returned by ``wcwidth()``. A more
102+
complete list of conditions and return values may be found in the docstring
103+
for ``wcwidth()``.
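
The relationship between the two functions can be sketched as follows. This is an illustration under stated assumptions, not the library's code: ``sketch_wcwidth`` and ``sketch_wcswidth`` are hypothetical names, and the per-character rules (control characters as ``-1``, combining characters as ``0``) are simplifying assumptions built on the standard ``unicodedata`` module rather than the project's generated tables.

```python
import unicodedata


def sketch_wcwidth(ch):
    """Hypothetical stand-in for wcwidth(): -1, 0, 1 or 2 for one character."""
    if ch == u'\x00':
        return 0    # NULL does not advance the cursor
    if unicodedata.category(ch) == 'Cc':
        return -1   # assumption: other control characters are indeterminate
    if unicodedata.combining(ch):
        return 0    # assumption: non-zero combining class is zero-width
    if unicodedata.east_asian_width(ch) in ('W', 'F'):
        return 2    # East Asian Wide or Fullwidth
    return 1        # all others


def sketch_wcswidth(text):
    """Sum the per-character widths, or return -1 if any width is -1."""
    total = 0
    for ch in text:
        width = sketch_wcwidth(ch)
        if width < 0:
            return -1
        total += width
    return total


print(sketch_wcswidth(u'コンニチハ'))
print(sketch_wcswidth(u'a\x1bb'))
```

Returning ``-1`` as soon as one character is indeterminate mirrors the rule described above: a single unknown width makes the width of the whole string unknown.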
104+
105+
Discrepancies
106+
-------------
107+
108+
There may be discrepancies with the determined printable width of characters
109+
by *wcwidth* and the results of any given terminal emulator -- most commonly,
110+
emulators are using your Operating System's ``wcwidth(3)`` implementation which
111+
is often based on tables much older than the most current Unicode Specification.
112+
Python's determination of non-zero combining_ characters may also be based on an
113+
older specification.
114+
115+
You may determine an exact list of these discrepancies using files
116+
`wcwidth-libc-comparator.py`_ and `wcwidth-combining-comparator.py`_.
117+
118+
.. _`wcwidth-libc-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-libc-comparator.py
119+
.. _`wcwidth-combining-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-combining-comparator.py
120+
103121

104122
==========
105123
Developing
106124
==========
107125

126+
Execute the command ``python setup.py develop`` to prepare an environment
127+
for running tests (``python setup.py test``), updating tables (
128+
``python setup.py update``) or using any of the scripts in the ``bin/``
129+
sub-folder. These files are only made available in the source repository.
130+
131+
108132
Updating Tables
109133
---------------
110134

@@ -113,7 +137,10 @@ The command ``python setup.py update`` will fetch the following resources:
113137
- http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
114138
- http://www.unicode.org/Public/UNIDATA/extracted/DerivedCombiningClass.txt
115139

116-
Generating the table files `wcwidth/table_wide.py`_ and `wcwidth/table_comb.py`_.
140+
And generate the table files `wcwidth/table_wide.py`_ and `wcwidth/table_comb.py`_.
141+
142+
.. _`wcwidth/table_wide.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_wide.py
143+
.. _`wcwidth/table_comb.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_comb.py
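
To give a rough idea of what table generation involves, the sketch below parses a few lines in the ``EastAsianWidth.txt`` format into ``(start, end)`` code point ranges for Wide and Fullwidth characters. It is a sketch only, not the project's ``update`` command: ``parse_wide_ranges`` and ``SAMPLE`` are hypothetical, the sample data is abbreviated by hand, and the real table files may be laid out differently.

```python
# Abbreviated, hand-written sample in the EastAsianWidth.txt line format:
# "<codepoint or range>;<property>  # comment"
SAMPLE = u"""\
# EastAsianWidth (illustrative sample)
0020;Na        # SPACE
1100..115F;W   # HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER
3000;F         # IDEOGRAPHIC SPACE
"""


def parse_wide_ranges(text):
    """Return inclusive (start, end) ranges whose property is W or F."""
    ranges = []
    for line in text.splitlines():
        data = line.split('#', 1)[0].strip()  # drop trailing comments
        if not data:
            continue                          # blank or comment-only line
        codepoints, prop = [field.strip() for field in data.split(';')]
        if prop not in ('W', 'F'):
            continue
        start, _, end = codepoints.partition('..')
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges


print(parse_wide_ranges(SAMPLE))
```

A single code point is stored as a range whose start and end are equal, so a lookup only ever has to deal with ranges.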
117144

118145
wcwidth.c
119146
---------
@@ -122,9 +149,8 @@ This code was originally derived directly from C code of the same name,
122149
whose latest version is available at `wcwidth.c`_, and is authored by
123150
Markus Kuhn -- 2007-05-26 (Unicode 5.0)
124151

125-
Any subsequent changes were done by directly testing against the various libc
126-
implementations of POSIX-compliant Operating Systems, such as Mac OSX, Linux,
127-
and OpenSolaris.
152+
.. _`wcwidth.c`: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
153+
128154

129155
Examples
130156
--------
@@ -133,17 +159,24 @@ This library is used in:
133159

134160
- `jquast/blessed`_, a simplified wrapper around curses.
135161

136-
- `jonathanslenders/python-prompt-toolkit`_, a Library for building powerful interactive command lines in Python.
162+
- `jonathanslenders/python-prompt-toolkit`_, a Library for building powerful
163+
interactive command lines in Python.
137164

138165
Additional tools for displaying and testing wcwidth are found in the ``bin/``
139166
folder of this project (github link: `wcwidth/bin`_). They are not distributed
140167
as a script or part of the module.
141168

169+
.. _`jquast/blessed`: https://github.com/jquast/blessed
170+
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
171+
.. _`wcwidth/bin`: https://github.com/jquast/wcwidth/tree/master/bin
172+
142173
Todo
143174
----
144175

145-
It is my wish that `combining`_ characters are understood. Currently,
146-
any string containing combining characters will always return ``-1``.
176+
Though some of the most common ("zero-width") `combining`_ characters
177+
are understood by wcswidth, there are still many edge cases that need
178+
to be covered, especially certain kinds of sequences such as those
179+
containing Control-Sequence-Inducer (CSI).
147180

148181

149182
License
@@ -181,31 +214,33 @@ an OSI-approved license that appears most-alike has been chosen, the MIT license
181214
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
182215
THE SOFTWARE.
183216

184-
.. _`jquast/blessed`: https://github.com/jquast/blessed
185-
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
186-
.. _`wcwidth/bin`: https://github.com/jquast/wcwidth/tree/master/bin
187-
.. _`wcwidth-libc-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-libc-comparator.py
188-
.. _`wcwidth/table_wide.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_wide.py
189-
.. _`wcwidth/table_comb.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_comb.py
190-
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
191-
.. _`wcwidth.c`: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
192-
193217
Changes
194218
-------
195219

196220
0.1.4
221+
* **Feature**: ``wcswidth()`` now determines printable length
222+
for (most) combining characters. The developer's tool
223+
`bin/wcwidth-browser.py`_ is improved to display combining_
224+
characters when provided the ``--combining`` option
225+
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
197226
* added static analysis (prospector_) to testing framework.
198227

199228
0.1.3
200-
* *Bugfix*: 2nd parameter of wcswidth was not honored.
201-
(`thomasballinger`_ PR #4).
229+
* **Bugfix**: 2nd parameter of wcswidth was not honored.
230+
(`Thomas Ballinger`_, `PR #4`_).
202231

203232
0.1.2
204-
* Updated tables to Unicode Specification 7.0.0
205-
(`thomasballinger`_ PR #3).
233+
* **Updated** tables to Unicode Specification 7.0.0.
234+
(`Thomas Ballinger`_, `PR #3`_).
206235

207236
0.1.1
208237
* Initial release to pypi, based on Unicode Specification 6.3.0
209238

210-
.. _`thomasballinger`: https://github.com/thomasballinger
211239
.. _`prospector`: https://github.com/landscapeio/prospector
240+
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
241+
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
242+
.. _`Thomas Ballinger`: https://github.com/thomasballinger
243+
.. _`Leta Montopoli`: https://github.com/lmontopo
244+
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
245+
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
246+
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
