@@ -19,34 +19,40 @@ Introduction
1919============
2020
2121This API is mainly for Terminal Emulator implementors -- any python program
22- that attempts to determine the printable width of a string on a Terminal.
22+ that attempts to determine the printable width of a string on a Terminal. It
23+ is implemented in python (no C library calls) and has no 3rd-party dependencies.
2324
24- It is certainly possible to use your Operating System's ``wcwidth()`` and
25- ``wcswidth()`` calls if it is POSIX-conforming, but this would not be possible
25+ It is certainly possible to use your Operating System's ``wcwidth(3)`` and
26+ ``wcswidth(3)`` calls if it is POSIX-conforming, but this would not be possible
2627on non-POSIX platforms, such as Windows, or for alternative Python
27- implementations, such as jython.
28-
29- Furthermore, testing (`wcwidth-libc-comparator.py`_) has shown that libc
30- wcwidth() is particularly out of date on most operating systems, reporting -1
31- for a great many characters that are actually a displayable width of 1 or 2.
28+ implementations, such as jython. It is also commonly many releases older
29+ than the most current Unicode Standard release files, which this project
30+ aims to track.
3231
3332The most current release of this API is based from Unicode Standard release
34- _7.0.0_, dated 2014-02-28, 23:15:00 GMT [KW, LI]
33+ *7.0.0*, dated *2014-02-28, 23:15:00 GMT [KW, LI]* for table generated by
34+ file ``EastAsianWidth-7.0.0.txt`` and *2014-02-07, 18:42:08 GMT [MD]* for
35+ ``DerivedCombiningClass-7.0.0.txt``.
36+
37+ Installation
38+ ------------
39+
40+ The stable version of this package is maintained on pypi, install using pip::
41+
42+ pip install wcwidth
3543
3644Problem
3745-------
3846
3947You may have noticed some characters especially Chinese, Japanese, and
4048Korean (collectively known as the *CJK Unified Ideographs*) consume more
41- than 1 terminal cell.
42-
43- In python, if you ask for the length of the string, ``u'コンニチハ'``
44- (Japanese: Hello), it is correctly determined to be a length of **5**.
49+ than 1 terminal cell. If you ask for the length of the string, ``u'コンニチハ'``
50+ (Japanese: Hello), it is correctly determined to be a length of **5** using
51+ the ``len()`` built-in.
4552
4653However, if you were to print this to a Terminal Emulator, such as xterm,
47- urxvt, Terminal.app, or PuTTY, it would consume **10** *cells* (columns) --
48- two for each symbol.
49-
54+ urxvt, Terminal.app, PuTTY, or iTerm2, it would consume **10** *cells* (columns).
55+ This causes problems for many of the text-alignment functions, such as ``rjust()``.
5056On an 80-wide terminal, the following would wrap along the margin, instead
5157of displaying it right-aligned as desired::
5258
@@ -65,17 +71,10 @@ that the length of ``wcwidth(u'コ')`` is reported as ``2``, and
6571This allows one to determine the printable effects of displaying *CJK *
6672characters on a terminal emulator.
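As a rough illustration of the idea only, the sketch below approximates cell widths using the standard library's ``unicodedata`` tables instead of this package's own tables. The helper names ``approx_width``, ``approx_swidth`` and ``rjust_cells`` are hypothetical and not part of wcwidth, and the results may differ from ``wcwidth()`` for many characters:

```python
# -*- coding: utf-8 -*-
import unicodedata


def approx_width(char):
    """Approximate the cell width of one character (hypothetical helper)."""
    # Assumption: any character with a non-zero combining class takes no cell.
    if unicodedata.combining(char):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters occupy two cells.
    if unicodedata.east_asian_width(char) in ('W', 'F'):
        return 2
    return 1


def approx_swidth(text):
    """Sum the approximate cell widths over a whole string."""
    return sum(approx_width(char) for char in text)


def rjust_cells(text, columns=80):
    """Right-justify by terminal cells rather than by len()."""
    padding = max(0, columns - approx_swidth(text))
    return u' ' * padding + text


text = u'コンニチハ'
print(len(text))            # number of characters
print(approx_swidth(text))  # number of terminal cells
print(rjust_cells(text))
```

Padding by cells rather than by ``len()`` is what keeps the right edge on the 80th column; ``text.rjust(80)`` would pad as though each symbol were one cell wide.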
6773
68- Installation
69- ------------
70-
71- The stable version of this package is maintained on pypi, install using pip::
72-
73- pip install wcwidth
74-
7574wcwidth, wcswidth
7675-----------------
77- Use ``wcwidth`` to determine the length of a single character,
78- and ``wcswidth`` to determine the length of a string of characters.
76+ Use ``wcwidth`` to determine the length of a *single character*,
77+ and ``wcswidth`` to determine the length of a *string of characters*.
7978
8079To Display ``u'コンニチハ'`` right-adjusted on screen of 80 columns::
8180
@@ -88,9 +87,9 @@ To Display ``u'コンニチハ'`` right-adjusted on screen of 80 columns::
8887Values
8988------
9089
91- See the docstring for ``wcwidth()``, general overview of return values:
90+ A general overview of return values:
9291
93- - ``-1``: indeterminate, such as combining_ characters.
92+ - ``-1``: indeterminate (see Todo_).
9493
9594 - ``0``: do not advance the cursor, such as NULL.
9695
@@ -99,12 +98,37 @@ See the docstring for ``wcwidth()``, general overview of return values:
9998 - ``1``: all others.
10099
101100``wcswidth()`` simply returns the sum of all values along a string, or
102- ``-1`` if it has occurred for any value returned by ``wcwidth()``.
101+ ``-1`` if it has occurred for any value returned by ``wcwidth()``. A more
102+ exacting list of conditions and return values may be found in the docstring
103+ for ``wcwidth()``.
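The summing rule described above can be sketched as follows. This is a minimal illustration and not the package's implementation: ``sum_widths`` and ``toy_width`` are hypothetical names, and ``toy_width`` is a deliberately tiny stand-in that treats only ESC as indeterminate and only NULL as zero-width:

```python
def toy_width(char):
    """Toy per-character width function (an assumption for this sketch)."""
    if char == u'\x1b':
        return -1   # stand-in for an indeterminate character
    if char == u'\x00':
        return 0    # NULL does not advance the cursor
    return 1        # everything else is treated as one cell here


def sum_widths(text, width_of):
    """Sum per-character widths, or return -1 if any character yields -1."""
    total = 0
    for char in text:
        width = width_of(char)
        if width < 0:
            # A single indeterminate character makes the whole result -1.
            return -1
        total += width
    return total


print(sum_widths(u'abc', toy_width))
print(sum_widths(u'a\x00b', toy_width))
print(sum_widths(u'a\x1bb', toy_width))
```

Callers that right-align or truncate text should check for ``-1`` before using the result as a column count.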
104+
105+ Discrepancies
106+ -------------
107+
108+ There may be discrepancies with the determined printable width of characters
109+ by *wcwidth* and the results of any given terminal emulator -- most commonly,
110+ emulators are using your Operating System's ``wcwidth(3)`` implementation which
111+ is often based on tables much older than the most current Unicode Specification.
112+ Python's determination of non-zero combining_ characters may also be based on an
113+ older specification.
114+
115+ You may determine an exacting list of these discrepancies using files
116+ `wcwidth-libc-comparator.py`_ and `wcwidth-combining-comparator.py`_
117+
118+ .. _`wcwidth-libc-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-libc-comparator.py
119+ .. _`wcwidth-combining-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-combining-comparator.py
120+
103121
104122==========
105123Developing
106124==========
107125
126+ Execute the command ``python setup.py develop`` to prepare an environment
127+ for running tests (``python setup.py test``), updating tables (
128+ ``python setup.py update``) or using any of the scripts in the ``bin/``
129+ sub-folder. These files are only made available in the source repository.
130+
131+
108132Updating Tables
109133---------------
110134
@@ -113,7 +137,10 @@ The command ``python setup.py update`` will fetch the following resources:
113137 - http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
114138 - http://www.unicode.org/Public/UNIDATA/extracted/DerivedCombiningClass.txt
115139
116- Generating the table files `wcwidth/table_wide.py`_ and `wcwidth/table_comb.py`_.
140+ And generate the table files `wcwidth/table_wide.py`_ and `wcwidth/table_comb.py`_.
141+
142+ .. _`wcwidth/table_wide.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_wide.py
143+ .. _`wcwidth/table_comb.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_comb.py
117144
118145wcwidth.c
119146---------
@@ -122,9 +149,8 @@ This code was originally derived directly from C code of the same name,
122149whose latest version is available at: `wcwidth.c`_ And is authored by
123150Markus Kuhn -- 2007-05-26 (Unicode 5.0)
124151
125- Any subsequent changes were done by directly testing against the various libc
126- implementations of POSIX-compliant Operating Systems, such as Mac OSX, Linux,
127- and OpenSolaris.
152+ .. _`wcwidth.c`: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
153+
128154
129155Examples
130156--------
@@ -133,17 +159,24 @@ This library is used in:
133159
134160- `jquast/blessed`_, a simplified wrapper around curses.
135161
136- - `jonathanslenders/python-prompt-toolkit`_, a Library for building powerful interactive command lines in Python.
162+ - `jonathanslenders/python-prompt-toolkit`_, a Library for building powerful
163+ interactive command lines in Python.
137164
138165Additional tools for displaying and testing wcwidth is found in the ``bin/``
139166folder of this project (github link: `wcwidth/bin`_). They are not distributed
140167as a script or part of the module.
141168
169+ .. _`jquast/blessed`: https://github.com/jquast/blessed
170+ .. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
171+ .. _`wcwidth/bin`: https://github.com/jquast/wcwidth/tree/master/bin
172+
142173Todo
143174----
144175
145- It is my wish that `combining`_ characters are understood. Currently,
146- any string containing combining characters will always return ``-1``.
176+ Though some of the most common ("zero-width") `combining`_ characters
177+ are understood by wcswidth, there are still many edge cases that need
178+ to be covered, especially certain kinds of sequences such as those
179+ containing Control-Sequence-Inducer (CSI).
147180
148181
149182License
@@ -181,31 +214,33 @@ an OSI-approved license that appears most-alike has been chosen, the MIT license
181214 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
182215 THE SOFTWARE.
183216
184- .. _`jquast/blessed`: https://github.com/jquast/blessed
185- .. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
186- .. _`wcwidth/bin`: https://github.com/jquast/wcwidth/tree/master/bin
187- .. _`wcwidth-libc-comparator.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-libc-comparator.py
188- .. _`wcwidth/table_wide.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_wide.py
189- .. _`wcwidth/table_comb.py`: https://github.com/jquast/wcwidth/tree/master/wcwidth/table_comb.py
190- .. _`combining`: https://en.wikipedia.org/wiki/Combining_character
191- .. _`wcwidth.c`: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
192-
193217Changes
194218-------
195219
1962200.1.4
221+ * **Feature**: ``wcswidth()`` now determines printable length
222+ for (most) combining characters. The developer's tool
223+ `bin/wcwidth-browser.py`_ is improved to display combining_
224+ characters when provided the ``--combining`` option
225+ (`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
197226 * added static analysis (prospector_) to testing framework.
198227
1992280.1.3
200- * *Bugfix*: 2nd parameter of wcswidth was not honored.
201- (`thomasballinger`_ PR #4).
229+ * **Bugfix**: 2nd parameter of wcswidth was not honored.
230+ (`Thomas Ballinger`_, `PR #4`).
202231
2032320.1.2
204- * Updated tables to Unicode Specification 7.0.0
205- (`thomasballinger`_ PR #3).
233+ * **Updated** tables to Unicode Specification 7.0.0.
234+ (`Thomas Ballinger`_, `PR #3`).
206235
2072360.1.1
208237 * Initial release to pypi, Based on Unicode Specification 6.3.0
209238
210- .. _`thomasballinger`: https://github.com/thomasballinger
211239.. _`prospector`: https://github.com/landscapeio/prospector
240+ .. _`combining`: https://en.wikipedia.org/wiki/Combining_character
241+ .. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
242+ .. _`Thomas Ballinger`: https://github.com/thomasballinger
243+ .. _`Leta Montopoli`: https://github.com/lmontopo
244+ .. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
245+ .. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
246+ .. _`PR #5`: https://github.com/jquast/wcwidth/pull/5