gh-69619: Add whitespace term to glossary and reference in stdtypes.rst #132568

Open: wants to merge 5 commits into base: main

26 changes: 26 additions & 0 deletions Doc/glossary.rst
@@ -1443,6 +1443,32 @@ Glossary
A computer defined entirely in software. Python's virtual machine
executes the :term:`bytecode` emitted by the bytecode compiler.

whitespace
If we decide to keep this glossary entry (see other comments), it should mention Unicode first, and reduce the table to an in-line description (see the entry for bytes.isspace()) to take up less space.

Oh, I suggested the table. I didn't realize there was precedent for the inline format.

I find the table significantly easier to read, though.

The glossary page is very long; we should avoid making it longer. Perhaps split up the characters though, e.g. "' ' (space), '\t' (horizontal tab), ...".

Why is it bad for the glossary to be long? I don't think people read it in order, they just click on terms elsewhere and get redirected. I would think that users prefer more information on individual terms rather than the overall glossary page being short.

@AA-Turner AA-Turner Apr 16, 2025

It's not bad for it to be long, but rather longer than it needs to be. A full table here isn't needed to describe six characters, and as mentioned it takes the focus away from Unicode whitespace, which is the default set of whitespace operated on, unless using bytes/buffer functions, or re.ASCII. The more common thing (Unicode) should be the focus, and we should avoid giving readers the expectation that whitespace is limited to the ASCII set.
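
As a rough illustration of the distinction drawn in this comment, the sketch below compares how ``str`` methods, bytes methods, and ``re`` (with and without ``re.ASCII``) treat a single non-ASCII whitespace character. The sample character, U+2003 EM SPACE, is an arbitrary choice made for this example and does not come from the PR.

```python
import re

em_space = "\u2003"  # EM SPACE: Unicode whitespace, but not ASCII whitespace

# str methods use the Unicode definition of whitespace by default.
str_is_space = em_space.isspace()                      # True
str_split = f"a{em_space}b".split()                    # ['a', 'b']

# bytes methods only recognise the ASCII whitespace bytes.
encoded = f"a{em_space}b".encode("utf-8")
bytes_is_space = em_space.encode("utf-8").isspace()    # False
bytes_split = encoded.split()                          # a single, unsplit item

# For str patterns, \s is Unicode-aware unless re.ASCII is given.
unicode_match = re.fullmatch(r"\s", em_space) is not None
ascii_match = re.fullmatch(r"\s", em_space, re.ASCII) is not None

print(str_is_space, bytes_is_space, unicode_match, ascii_match)
# True False True False
```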

Hmm, fair point. Maybe there's a better way to emphasize Unicode here? I'm really not a fan of the inline version based on bytes.isspace.

Characters that represent horizontal or vertical space.
In an ASCII context, Python recognizes these characters as whitespace:

+-----------+-----------------+
| ``' '``   | space           |
+-----------+-----------------+
| ``'\t'``  | tab             |
+-----------+-----------------+
| ``'\n'``  | newline         |
+-----------+-----------------+
| ``'\v'``  | vertical tab    |
+-----------+-----------------+
| ``'\f'``  | form feed       |
+-----------+-----------------+
| ``'\r'``  | carriage return |
+-----------+-----------------+

In a Unicode context, whitespace characters are those whose general
category in the `Unicode Character Database
<https://www.unicode.org/reports/tr44/>`_ is ``Zs`` ("Separator, space"),
or whose bidirectional class is one of ``WS``, ``B``, or ``S``.

For example, :meth:`~str.split` and :meth:`~str.strip` operate on
whitespace by default.
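
A minimal sketch of how the Unicode rule could be checked with the standard :mod:`unicodedata` module. It assumes the rule as documented for ``str.isspace()`` (general category ``Zs``, or bidirectional class ``WS``, ``B``, or ``S``); the helper name ``is_unicode_whitespace`` and the sample characters are hypothetical and chosen only for illustration.

```python
import unicodedata


def is_unicode_whitespace(ch: str) -> bool:
    """Hypothetical helper mirroring the rule documented for str.isspace()."""
    return (
        unicodedata.category(ch) == "Zs"
        or unicodedata.bidirectional(ch) in {"WS", "B", "S"}
    )


# A few ASCII and non-ASCII characters, plus two that are not whitespace
# (U+200B ZERO WIDTH SPACE and the letter "a").
samples = [" ", "\t", "\n", "\x1f", "\x85", "\xa0", "\u2003", "\u2028", "\u200b", "a"]

for ch in samples:
    # Columns: code point, result of the helper, result of str.isspace()
    print(f"U+{ord(ch):04X}", is_unicode_whitespace(ch), ch.isspace())
```

For these samples the helper and ``str.isspace()`` agree; note that U+200B ZERO WIDTH SPACE is not whitespace under this rule despite its name.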

Zen of Python
Listing of Python design principles and philosophies that are helpful in
understanding and using the language. The listing can be found by typing
50 changes: 26 additions & 24 deletions Doc/library/stdtypes.rst
@@ -2092,8 +2092,9 @@ expression support in the :mod:`re` module).

Return a copy of the string with leading characters removed. The *chars*
argument is a string specifying the set of characters to be removed. If omitted
or ``None``, the *chars* argument defaults to removing whitespace. The *chars*
argument is not a prefix; rather, all combinations of its values are stripped::
or ``None``, the *chars* argument defaults to removing :term:`whitespace`.
I'll yield, but I think a glossary term is the right way to go here. isspace() methods should document quirks with the methods themselves, not necessarily provide the definition for whitespace.

The *chars* argument is not a prefix; rather, all combinations of its values
are stripped::

>>> ' spacious '.lstrip()
'spacious '
@@ -2211,8 +2212,9 @@ expression support in the :mod:`re` module).

Return a copy of the string with trailing characters removed. The *chars*
argument is a string specifying the set of characters to be removed. If omitted
or ``None``, the *chars* argument defaults to removing whitespace. The *chars*
argument is not a suffix; rather, all combinations of its values are stripped::
or ``None``, the *chars* argument defaults to removing :term:`whitespace`.
The *chars* argument is not a suffix; rather, all combinations of its values
are stripped::

>>> ' spacious '.rstrip()
' spacious'
@@ -2348,9 +2350,9 @@ expression support in the :mod:`re` module).

Return a copy of the string with the leading and trailing characters removed.
The *chars* argument is a string specifying the set of characters to be removed.
If omitted or ``None``, the *chars* argument defaults to removing whitespace.
The *chars* argument is not a prefix or suffix; rather, all combinations of its
values are stripped::
If omitted or ``None``, the *chars* argument defaults to removing
:term:`whitespace`. The *chars* argument is not a prefix or suffix; rather,
all combinations of its values are stripped::

>>> ' spacious '.strip()
'spacious'
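
The two behaviours described for ``str.strip()`` can be sketched as follows. The padded sample string is made up for illustration; it mixes ASCII and non-ASCII whitespace to show that the default removes both.

```python
# With no argument, str.strip() removes Unicode whitespace, not only ASCII:
# here an EM SPACE, a tab, a NO-BREAK SPACE, and a newline.
padded = "\u2003\tspacious\xa0\n"
stripped = padded.strip()
print(repr(stripped))  # 'spacious'

# *chars* is treated as a set of characters, not as a literal prefix or
# suffix: stripping stops at the first character that is not in the set.
trimmed = "www.example.com".strip("cmowz.")
print(repr(trimmed))  # 'example'
```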
@@ -2735,7 +2737,7 @@ data and are closely related to string objects in a variety of other ways.

This :class:`bytes` class method returns a bytes object, decoding the
given string object. The string must contain two hexadecimal digits per
byte, with ASCII whitespace being ignored.
byte, with :term:`ASCII whitespace <whitespace>` being ignored.

>>> bytes.fromhex('2Ef0 F1f2 ')
b'.\xf0\xf1\xf2'
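
A short sketch of what "ASCII whitespace being ignored" means in practice for ``bytes.fromhex()``, assuming Python 3.7 or later (earlier versions skipped only spaces). The input strings are invented for this example.

```python
# Any mix of ASCII whitespace between byte pairs is skipped.
data = bytes.fromhex("2E f0\tF1\nf2\r\n")
print(data)  # b'.\xf0\xf1\xf2'

# Non-ASCII whitespace, such as U+00A0 NO-BREAK SPACE, is not skipped and
# makes the call fail with ValueError.
try:
    bytes.fromhex("2E\xa0f0")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```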
@@ -2824,7 +2826,7 @@ objects.

This :class:`bytearray` class method returns a bytearray object, decoding
the given string object. The string must contain two hexadecimal digits
per byte, with ASCII whitespace being ignored.
per byte, with :term:`ASCII whitespace <whitespace>` being ignored.

>>> bytearray.fromhex('2Ef0 F1f2 ')
bytearray(b'.\xf0\xf1\xf2')
@@ -3243,8 +3245,8 @@ produce new objects.
*chars* argument is a binary sequence specifying the set of byte values to
be removed - the name refers to the fact this method is usually used with
ASCII characters. If omitted or ``None``, the *chars* argument defaults
to removing ASCII whitespace. The *chars* argument is not a prefix;
rather, all combinations of its values are stripped::
to removing :term:`ASCII whitespace <whitespace>`. The *chars* argument is
not a prefix; rather, all combinations of its values are stripped::

>>> b' spacious '.lstrip()
b'spacious '
@@ -3287,9 +3289,9 @@ produce new objects.
Split the binary sequence into subsequences of the same type, using *sep*
as the delimiter string. If *maxsplit* is given, at most *maxsplit* splits
are done, the *rightmost* ones. If *sep* is not specified or ``None``,
any subsequence consisting solely of ASCII whitespace is a separator.
Except for splitting from the right, :meth:`rsplit` behaves like
:meth:`split` which is described in detail below.
any subsequence consisting solely of :term:`ASCII whitespace <whitespace>`
is a separator. Except for splitting from the right, :meth:`rsplit` behaves
like :meth:`split` which is described in detail below.
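
As an informal illustration of ``rsplit()`` without a separator, the sketch below uses made-up sample data. It shows runs of ASCII whitespace acting as one delimiter, *maxsplit* counting from the right, and a non-ASCII byte (``0xA0``) not being treated as a separator.

```python
data = b"  alpha beta\tgamma\n"

# With no separator, runs of ASCII whitespace delimit the fields and
# leading/trailing whitespace produces no empty items.
all_fields = data.rsplit()
print(all_fields)  # [b'alpha', b'beta', b'gamma']

# maxsplit counts splits from the right; the rest is kept as one item.
last_only = data.rsplit(maxsplit=1)
print(last_only)  # [b'  alpha beta', b'gamma']

# Byte 0xA0 is not ASCII whitespace, so it does not separate anything.
unsplit = b"alpha\xa0beta".rsplit()
print(unsplit)  # [b'alpha\xa0beta']
```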


.. method:: bytes.rstrip([chars])
@@ -3299,8 +3301,8 @@ produce new objects.
*chars* argument is a binary sequence specifying the set of byte values to
be removed - the name refers to the fact this method is usually used with
ASCII characters. If omitted or ``None``, the *chars* argument defaults to
removing ASCII whitespace. The *chars* argument is not a suffix; rather,
all combinations of its values are stripped::
removing :term:`ASCII whitespace <whitespace>`. The *chars* argument is not
a suffix; rather, all combinations of its values are stripped::

>>> b' spacious '.rstrip()
b' spacious'
@@ -3352,7 +3354,8 @@ produce new objects.
[b'1', b'2', b'3<4']

If *sep* is not specified or is ``None``, a different splitting algorithm
is applied: runs of consecutive ASCII whitespace are regarded as a single
is applied: runs of consecutive :term:`ASCII whitespace <whitespace>` are
regarded as a single
separator, and the result will contain no empty strings at the start or
end if the sequence has leading or trailing whitespace. Consequently,
splitting an empty sequence or a sequence consisting solely of ASCII
@@ -3376,9 +3379,9 @@ produce new objects.
removed. The *chars* argument is a binary sequence specifying the set of
byte values to be removed - the name refers to the fact this method is
usually used with ASCII characters. If omitted or ``None``, the *chars*
argument defaults to removing ASCII whitespace. The *chars* argument is
not a prefix or suffix; rather, all combinations of its values are
stripped::
argument defaults to removing :term:`ASCII whitespace <whitespace>`.
The *chars* argument is not a prefix or suffix; rather, all combinations of
its values are stripped::

>>> b' spacious '.strip()
b'spacious'
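
For comparison with the ``str`` version, a small sketch of ``bytes.strip()`` with made-up sample data: the default removes only ASCII whitespace bytes, and *chars* is again a set of byte values rather than a literal prefix or suffix.

```python
# The default strips ASCII whitespace only; byte 0xA0 is left in place.
raw = b"\xa0 spacious \t\n"
default_stripped = raw.strip()
print(default_stripped)  # b'\xa0 spacious'

# *chars* is a set of byte values; every leading and trailing byte in the
# set is removed, in any order.
trimmed_bytes = b"www.example.com".strip(b"cmowz.")
print(trimmed_bytes)  # b'example'
```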
@@ -3519,10 +3522,9 @@ place, and instead produce new objects.
.. method:: bytes.isspace()
bytearray.isspace()

Return ``True`` if all bytes in the sequence are ASCII whitespace and the
sequence is not empty, ``False`` otherwise. ASCII whitespace characters are
those byte values in the sequence ``b' \t\n\r\x0b\f'`` (space, tab, newline,
carriage return, vertical tab, form feed).
Return ``True`` if all bytes in the sequence are
:term:`ASCII whitespace <whitespace>` and the sequence is not empty,
``False`` otherwise.
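
A brief sketch of ``bytes.isspace()``, using the six ASCII whitespace bytes listed in the text this hunk removes (space, tab, newline, carriage return, vertical tab, form feed). The remaining sample values are chosen for illustration only.

```python
ascii_ws = b" \t\n\r\x0b\f"  # the six ASCII whitespace bytes

all_ws = ascii_ws.isspace()
empty = b"".isspace()
mixed = b" a ".isspace()
print(all_ws, empty, mixed)  # True False False

# Byte 0x1C (file separator) is not ASCII whitespace, although the
# corresponding str character counts as Unicode whitespace.
fs_bytes = b"\x1c".isspace()
fs_str = "\x1c".isspace()
print(fs_bytes, fs_str)  # False True
```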


.. method:: bytes.istitle()