|
423 | 423 | \begin{figure}%
|
424 | 424 | \centering%
|
425 | 425 | \includegraphics[width=0.65\linewidth]{\currentDir/unicodeCharacterTableSubset}%
|
426 |
| -\caption{A subset of the \pgls{unicode} character table including the Basic Lating characters as well as some Simplified Chinese characters.}% |
| 426 | +\caption{A subset of the \pgls{unicode} character table including the Basic Latin characters as well as some Simplified Chinese characters~(简体中文)~\cite{SCR1956ROTSCOPTSCCS1}.}% |
427 | 427 | \label{fig:unicodeCharacterTableSubset}%
|
428 | 428 | \end{figure}%
|
429 | 429 | %
|
430 | 430 | \begin{figure}%
|
431 | 431 | \centering%
|
432 | 432 | \includegraphics[width=0.65\linewidth]{\currentDir/strUnicodeEscape}%
|
433 |
| -\caption{\inQuotes{Hello} in Simplified Chinese and entered via \pgls{unicode} escaped string.}% |
| 433 | +\caption{\inQuotes{Hello}, i.e., \inQuotes{你好。} in Simplified Chinese~(简体中文)~\cite{SCR1956ROTSCOPTSCCS1}, entered as a \pgls{unicode}-escaped string.}% |
434 | 434 | \label{fig:strUnicodeEscape}%
|
435 | 435 | \end{figure}%
|
436 | 436 | %
|
|
450 | 450 | \pgls{unicode}~\cite{TUC2023U1510,TUC2023U151ACS,ISOIEC106462020ITUCCSU}, the most frequently used mapping of characters to numbers.
|
451 | 451 | Therefore, \python\ uses \pgls{unicode} as well\pythonIdx{str!unicode}.
|
452 | 452 |
|
453 |
| -\cref{fig:unicodeCharacterTableSubset} illustrates a subset of the \pgls{unicode} code table, including the Basic Latin characters, which are basically still compatible with~ASCII, and some Simplified Chinese characters. |
| 453 | +\cref{fig:unicodeCharacterTableSubset} illustrates a subset of the \pgls{unicode} code table, including the Basic Latin characters, which are basically still compatible with~ASCII, and some Simplified Chinese characters~(简体中文)~\cite{SCR1956ROTSCOPTSCCS1}. |
454 | 454 | Most \pgls{unicode} characters can be identified by a number represented as four hexadecimal digits (mentioned back in \cref{sec:int:bitstrings}).
|
455 | 455 | The rows of \cref{fig:unicodeCharacterTableSubset} are annotated with the first three of these digits and the columns with the fourth and last hexadecimal digit.
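The row and column lookup described here can also be done in code. The following is a minimal sketch, not part of the original text; the variable names \pythonil{ch}, \pythonil{code}, \pythonil{row}, and \pythonil{column} are chosen only for illustration. It uses the built-in function \pythonil{ord} to obtain the number of a character and formats it as four hexadecimal digits.

```python
# Illustrative sketch: find the table row and column of one character.
ch = "你"  # the character to look up (example choice)

# ord() returns the Unicode code point as an int;
# the format spec 04x renders it as four lowercase hexadecimal digits.
code = f"{ord(ch):04x}"

# The first three digits select the row, the last digit the column.
row, column = code[:3], code[3]

print(f"code point: {code}, row: {row}, column: {column}")
```

For characters outside the four-digit range, \pythonil{code} would have more than four digits, so this split only makes sense for the part of the table discussed here.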
|
456 | 456 |
|
|
462 | 462 | If we use the \pythonil{\\u}-based\pythonIdx{\textbackslash{u}} escape, then we can represent \emph{any} character as a Basic Latin text sequence.
|
463 | 463 | It is also useful if we want to, e.g., enter Chinese text on a machine that does not have an IME or other corresponding tools, or text in any other kind of language where we do not have corresponding keys on the keyboard (see, e.g., \cref{lst:variables:pi_liu_hui} later on).
|
464 | 464 |
|
465 |
| -Anyway, in \cref{fig:strUnicodeEscape}, we use the information obtained in \cref{fig:unicodeCharacterTableSubset} to print the Chinese text \inQuotes{Ni Hao.} standing for \inQuotes{Hello.} as unicode escaped string. |
466 |
| -We found that the character for \inQuotes{Ni} has unicode number~4f60, \inQuotes{Hao} has~597d, and the big period has~3002. |
467 |
| -The string \pythonil{"\\u4f60\\u597d\\u3002"} then corresponds to the correct Chinese text.% |
| 465 | +Anyway, in \cref{fig:strUnicodeEscape}, we use the information obtained in \cref{fig:unicodeCharacterTableSubset} to print the Chinese text \inQuotes{你好。} standing for \inQuotes{Hello.} and pronounced as \inQuotes{N{\v{i}} h{\v{a}}o.} as a unicode-escaped string. |
| 466 | +We found that the character for \inQuotes{你} has unicode number~4f60, \inQuotes{好} has~597d, and the big period~\inQuotes{。} has~3002. |
| 467 | +The string \pythonil{"\\u4f60\\u597d\\u3002"} then corresponds to the correct Chinese text~\inQuotes{你好。}.% |
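The correspondence between the escaped string and the Chinese text can be checked directly in \python. The snippet below is a small sketch under the assumption that the source file is stored as UTF-8 (the default in \python~3); the names \pythonil{escaped}, \pythonil{literal}, \pythonil{codes}, and \pythonil{ascii_form} are illustrative only.

```python
# Illustrative sketch: compare an escaped string with its literal form.
escaped = "\u4f60\u597d\u3002"  # written using Basic Latin characters only
literal = "你好。"               # the same text typed directly

# Both literals denote the same three-character string.
print(escaped == literal)

# Recover the four-digit hexadecimal number of each character.
codes = [f"{ord(c):04x}" for c in escaped]
print(codes)

# The unicode_escape codec converts the text back into ASCII escapes.
ascii_form = literal.encode("unicode_escape").decode("ascii")
print(ascii_form)
```

The last step shows the reverse direction: given text that cannot be typed on the current keyboard but can be pasted, the escaped form can be generated instead of looked up by hand.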
468 | 468 | \endhsection%
|
469 | 469 | %
|
470 | 470 | \hsection{Summary}%
|
|