|
1 | 1 | \hsection{Text Strings}% |
2 | 2 | \label{sec:str}% |
3 | 3 | % |
4 | | -The fourth important datatype in \python\ are text strings. |
| 4 | +The fourth and last important basic datatype in \python\ is the text string. |
5 | 5 | Text strings are sequences of characters of an arbitrary length. |
6 | 6 | In \python, they are represented by the datatype \pythonilIdx{str}. |
7 | 7 | Indeed, we have already used it before, even in our very first example program in \cref{lst:very_first_program} in \cref{sec:ourFirstProgram}, which simply printed \pythonil{"Hello World"}. |
|
16 | 16 | As \cref{exec:str_indexing} shows, there are two basic ways to specify a text string literal\pythonIdx{str!literal}: |
17 | 17 | Either enclosed by double quotes, e.g., \pythonil{"Hello World!"}\pythonIdx{\textquotedbl\idxdots\textquotedbl} or enclosed by single quotes, e.g., \pythonil{'Hello World!'}\pythonIdx{\textquotesingle\idxdots\textquotesingle}. |
18 | 18 | The quotation marks are only used to delimit the strings, i.e., to tell \python\ where the string begins or ends. |
19 | | -They are not themselves part of the string. |
20 | | - |
21 | | -\bestPractice{strDoubleQuote}{When defining a string literal, the double-quotation mark variant~(\pythonil{"..."}\pythonIdx{\textquotedbl\idxdots\textquotedbl}) may be preferred over the single-quotation mark variant~(\pythonil{'...'}\pythonIdx{\textquotesingle}); see also~\cref{bp:longstrDoubleQuote}.} |
22 | | - |
| 19 | +They are not themselves part of the string.% |
| 20 | +% |
| 20 | +\bestPractice{strDoubleQuote}{When defining a string literal, the double-quotation mark variant~(\pythonil{"..."}\pythonIdx{\textquotedbl\idxdots\textquotedbl}) may be preferred over the single-quotation mark variant~(\pythonil{'...'}\pythonIdx{\textquotesingle}).~(\citetitle{PEP8}~\cite{PEP8} does not recommend either variant, but double quotes are consistent with the docstring convention of~\citetitle{PEP257}~\cite{PEP257}, see also~\cref{bp:longstrDoubleQuote}.)}% |
| 22 | +% |
23 | 23 | One basic operation is string concatenation\pythonIdx{str!concatenation}\pythonIdx{str!+}\pythonIdx{+}: |
24 | 24 | \pythonil{"Hello" + ' ' + "World"}\pythonIdx{\textquotedbl\idxdots\textquotedbl}\pythonIdx{\textquotesingle\idxdots\textquotesingle} concatenates the three strings \pythonil{"Hello"}, \pythonil{" "}, and \pythonil{"World"}. |
25 | 25 | The result is \pythonil{"Hello World"}\pythonIdx{\textquotedbl\idxdots\textquotedbl}. |
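The concatenation described above can be tried directly. The following is a minimal sketch; the variable names are chosen for illustration only.

```python
# Both quoting styles produce ordinary str objects, so they can be mixed.
greeting = "Hello" + ' ' + "World"
print(greeting)  # Hello World

# The quotation marks only delimit the literal; they are not part of the value.
same = "Hello World!" == 'Hello World!'
print(same)  # True
```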
|
68 | 68 | Finally, we can also omit the start index, in which case everything until right before the end index is returned. |
69 | 69 | Therefore, \pythonil{"Hello"[:-2]} will return everything from the beginning of the string until right before the second-to-last character. |
70 | 70 | This gives us \pythonil{"Hel"}. |
| 71 | +The slice~\pythonil{[1:8:2]} returns the substring starting at index~1 and ending before index~8, containing every second character. |
| 72 | +Applied to~\pythonil{"Hello World!"} it therefore yields~\pythonil{"el o"}. |
71 | 73 | We will discuss slicing again later when we turn to lists in~\cref{sec:lists}. |
72 | 74 |
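The two slices discussed above can be checked with a short sketch. The variable names are illustrative; the general form is \pythonil{text[start:end:step]} with an exclusive end index.

```python
text = "Hello World!"

# Omitting the start index: everything up to, but excluding, the
# second-to-last character.
head = "Hello"[:-2]
print(head)  # Hel

# Start at index 1, stop before index 8, take every second character,
# i.e., the characters at indices 1, 3, 5, and 7.
stepped = text[1:8:2]
print(stepped)  # el o
```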
|
73 | 75 | \gitEvalPython{str_basic_ops}{}{simple_datatypes/str_basic_ops.py}% |
|
87 | 89 | It returns \pythonil{6}, because the \inQuotes{W} of \inQuotes{World} is the seventh character in this string and the indices are zero-based. |
88 | 90 | Trying to find the \pythonil{"world"} in \pythonil{"Hello World!"} yields~\pythonil{-1}, however. |
89 | 91 | \pythonil{-1} means that the string cannot be found. |
| 92 | + |
90 | 93 | We learn that string operations are case-sensitive\pythonIdx{str!case-sensitive}: |
91 | | -\pythonil{"World" != "world"} would be \pythonilIdx{True}. |
| 94 | +The uppercase character~\inQuotes{W} is different from the lowercase character~\inQuotes{w}. |
| 95 | +Therefore, \pythonil{"World" != "world"} is~\pythonilIdx{True}. |
| 96 | +Consequently, \pythonil{"world"} cannot be found in \pythonil{"Hello World!"}. |
92 | 97 | We also learn that we must not use the result of \pythonilIdx{find} as an index into a string before checking that it is \pythonil{>= 0}! |
93 | 98 | As you have learned, \pythonil{-1} is a perfectly fine index into a string, even though it means that the string we tried to find was not found. |
94 | 99 |
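The pitfall described above can be illustrated as follows. This is a small sketch with illustrative variable names: it shows that \pythonil{-1} silently selects the last character unless the result of \pythonil{find} is checked first.

```python
text = "Hello World!"

index = text.find("world")  # case-sensitive search, nothing is found
print(index)  # -1

# Unchecked use: -1 is a valid index and selects the last character.
unchecked = text[index]
print(unchecked)  # !

# Checked use: only index into the string if the search succeeded.
checked = text[index] if index >= 0 else None
print(checked)  # None

print(text.find("World"))  # 6
```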
|
|
106 | 111 | If we want to search from the end of the string, we use \pythonilIdx{rfind}. |
107 | 112 | \pythonil{"Hello World!".rfind("l")} gives us~\pythonil{9} directly. |
108 | 113 | If we want to search for the~\inQuotes{l} before that one, we need to supply an inclusive starting and exclusive ending index of the range to be searched. |
109 | | -\pythonil{"Hello World!".rfind("l", 0, 9)} searches for any~\inQuotes{l} from index~8 down to~0 and thus returns~\pythonil{3}. |
| 114 | +\pythonil{"Hello World!".rfind("l", 2, 9)} searches for any~\inQuotes{l} from index~8 down to~2 and thus returns~\pythonil{3}. |
110 | 115 | \pythonil{"Hello World!".rfind("l", 0, 3)} gives us~\pythonil{2} and since there is no~\inQuotes{l} before that, \pythonil{"Hello World!".rfind("l", 0, 2)} yields~\pythonil{-1}. |
111 | 116 | \end{sloppypar}% |
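The backward search can be repeated in a loop by using each hit as the next exclusive end index. The following is a sketch of that idea, not part of the original example program; the variable names are illustrative.

```python
text = "Hello World!"

positions = []
end = len(text)  # exclusive end index of the range still to be searched
while True:
    index = text.rfind("l", 0, end)
    if index < 0:  # -1 means there is no further occurrence
        break
    positions.append(index)
    end = index  # continue searching strictly before this hit

print(positions)  # [9, 3, 2]
```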
112 | 117 | % |
113 | 118 | \begin{sloppypar}% |
114 | 119 | Another common operation is to replace substrings with something else. |
115 | 120 | \pythonil{"Hello World!".replace("Hello", "Hi")}\pythonIdx{replace} replaces all occurrences of \inQuotes{Hello} in \inQuotes{Hello World!} with \inQuotes{Hi}. |
116 | | -The result is \pythonil{"Hi World!"} and \pythonil{"Hello Hello World!".replace("Hello", "Hi")} becomes \pythonil{"Hi Hi World!"}. |
| 121 | +The result is \pythonil{"Hi World!"} and \pythonil{"Hello World! Hello!".replace("Hello", "Hi")} becomes \pythonil{"Hi World! Hi!"}. |
| 122 | +It does not replace strings recursively, though. |
| 123 | +If you try to do \pythonil{"Hello World!".replace("Hello", "Hello! Hello!")}, then the \pythonil{"Hello"} is indeed replaced with \pythonil{"Hello! Hello!"}. |
| 124 | +This means that the new string now contains \pythonil{"Hello"} twice. |
| 125 | +These new occurrences are \emph{not} replaced, so the result remains as \pythonil{"Hello! Hello! World!"}.% |
117 | 126 | \end{sloppypar}% |
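The behavior of \pythonil{replace} can be verified with a brief sketch. The variable names are illustrative. Note that strings are immutable, so \pythonil{replace} returns a new string and leaves the original unchanged.

```python
original = "Hello World! Hello!"

# All occurrences are replaced.
replaced = original.replace("Hello", "Hi")
print(replaced)  # Hi World! Hi!

# The replacement text is not searched again, so this terminates.
expanded = "Hello World!".replace("Hello", "Hello! Hello!")
print(expanded)  # Hello! Hello! World!

# The original string is untouched.
print(original)  # Hello World! Hello!
```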
118 | 127 | % |
119 | 128 | \begin{sloppypar}% |
120 | | -Often, we want to remove all leading or trailing whitespace characters from a string. |
| 129 | +Often, we want to remove all leading or trailing whitespace characters~(spaces, newlines, tabs, \dots) from a string. |
121 | 130 | The \pythonilIdx{strip} function does this for us: |
122 | 131 | \pythonil{" Hello World! ".strip()} returns \pythonil{"Hello World!"}, i.e., the same string, but with the leading and trailing space removed. |
123 | 132 | If we only want to remove the spaces on the left-hand side, we use \pythonilIdx{lstrip} and if we only want to remove those on the right-hand side, we use \pythonilIdx{rstrip} instead. |
|
134 | 143 |
|
135 | 144 | Of course, these were just a small selection of the many string operations available in \python. |
136 | 145 | You can find more in the \href{https://docs.python.org/3/library/stdtypes.html\#textseq}{official documentation}~\cite{PSF:P3D:TPSL:TSTS}.% |
| 146 | +\FloatBarrier% |
137 | 147 | \endhsection% |
138 | 148 | % |
139 | 149 | \hsection{The str Function and f-strings}% |
|
273 | 283 | For example, you could write \pythonil{f"\{23\ *\ sin(2\ -\ 5)\ =\ :.2f\}"} and then the \pythonil{.2f} format would be applied to the result of the expression, i.e., you would get \pythonil{"23 * sin(2 - 5) = -3.25"} as the result of the interpolation. |
274 | 284 |
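The combination of the \pythonil{=} specifier with a format specification can be tried as follows. This sketch assumes Python~3.8 or newer, where the \pythonil{=} specifier is available, and imports \pythonil{sin} from the \pythonil{math} module.

```python
from math import sin

# The text of the expression, including the spaces around "=", is copied
# into the result; the .2f format is applied to the value of the expression.
text = f"{23 * sin(2 - 5) = :.2f}"
print(text)  # 23 * sin(2 - 5) = -3.25
```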
|
275 | 285 | You are now able to convert the results of your computations to nice text.% |
| 286 | +\FloatBarrier% |
276 | 287 | \endhsection% |
277 | 288 | % |
278 | 289 | \hsection{Converting Strings to other Datatypes}% |
|
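298 | 309 | Finally, the function \pythonilIdx{bool}\pythonIdx{bool!function} does \emph{not} parse the strings \pythonil{"True"} and \pythonil{"False"}: it returns \pythonilIdx{True} for every non-empty string, including \pythonil{"False"}, and \pythonilIdx{False} only for the empty string~\pythonil{""}. |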
299 | 310 | With this, you are also able to convert strings to data that you can use as input for your computations.% |
300 | 311 | % |
| 312 | +\FloatBarrier% |
301 | 313 | \endhsection% |
302 | 314 | % |
303 | 315 | % |
|
359 | 371 | We already learned the sequences \inQuotes{\textbraceleft\textbraceleft}\pythonIdx{\textbraceleft\textbraceleft} and \inQuotes{\textbraceright\textbraceright}\pythonIdx{\textbraceright\textbraceright} that were designed for \pglspl{fstring} only. |
360 | 372 | The backslash-based escape sequences we discussed in this section work for both \pglspl{fstring} and normal strings.% |
361 | 373 | \pythonIdx{str!escaping}\pythonIdx{escaping}% |
| 374 | +\FloatBarrier% |
362 | 375 | \endhsection% |
363 | 376 | % |
364 | 377 | \hsection{Multi-Line Strings}% |
|
372 | 385 | Such string delimiters are used for multi-line strings. |
373 | 386 | In such strings, you can insert linebreaks by hitting \keys{\enter} completely normally. |
374 | 387 | You can use the escape sequences from the previous section as well. |
375 | | -The main use case are \pglspl{docstring}, which we will discuss later, see, e.g., \cref{bp:module:docstrings}. |
376 | | - |
377 | | -\bestPractice{longstrDoubleQuote}{When defining a multi-line string literal, the double-quotation mark variant~(\pythonil{"""..."""})\pythonIdx{\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl} is usually preferred over the single-quotation mark variant~(\pythonil{'''...'''}\pythonIdx{\textquotesingle\textquotesingle\textquotesingle})~\cite{PEP257,PEP8}.} |
378 | | - |
| 388 | +The main use case is \pglspl{docstring}, which we will discuss later, see, e.g., \cref{bp:module:docstrings}.% |
| 389 | +% |
| 390 | +\bestPractice{longstrDoubleQuote}{When defining a multi-line string literal, the double-quotation mark variant~(\pythonil{"""..."""})\pythonIdx{\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl} is preferred over the single-quotation mark variant~(\pythonil{'''...'''}\pythonIdx{\textquotesingle\textquotesingle\textquotesingle})~\cite{PEP257,PEP8}.}% |
| 391 | +% |
379 | 392 | \cref{exec:str_multiline} shows what happens if we print such a multi-line string. |
380 | 393 | We first create the string by writing the three lines \textil{This is a multi-line string.}, \textil{I can hit enter to begin a new line.}, and \textil{This linebreak is then part of the string.}. |
381 | 394 | The first line begins with \pythonil{"""}\pythonIdx{\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl} and the last one ends with \pythonil{"""}\pythonIdx{\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl} as well. |
|
384 | 397 | We can also have multi-line \pglspl{fstring}\pythonIdx{str!f}\pythonIdx{f-string!multi-line}. |
385 | 398 | These then simply start with \pythonil{f"""}\pythonIdx{f\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl}. |
386 | 399 | The example in \cref{exec:str_multiline} presents such a multi-line \pgls{fstring} that spans three lines and contains two expressions for \pgls{strinterpolation}.% |
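A minimal sketch of both kinds of multi-line strings is given below. It is not the listing from \cref{exec:str_multiline}; the variable \pythonil{value} and the text of the \pgls{fstring} are hypothetical and chosen for illustration only.

```python
# A plain multi-line string: the linebreaks typed in the source code
# become newline characters inside the string.
text = """This is a multi-line string.
I can hit enter to begin a new line.
This linebreak is then part of the string."""
print(text.count("\n"))  # 2

# A multi-line f-string with two interpolated expressions (hypothetical).
value = 5
report = f"""The value is {value}.
Twice the value is {2 * value}.
This is the third line."""
print(report)
```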
| 400 | +\FloatBarrier% |
387 | 401 | \endhsection% |
388 | 402 | % |
389 | 403 | \hsection{Unicode and Character Representation}% |
|
439 | 453 | Anyway, in \cref{exec:str_unicode}, we use the information obtained in \cref{fig:unicodeCharacterTableSubset} to print the Chinese text \inQuotes{你好。} standing for \inQuotes{Hello.} and pronounced as \inQuotes{N{\v{\i}} h{\v{a}}o.} as a unicode-escaped string. |
440 | 454 | We found that the character for \inQuotes{你} has unicode number~4f60, \inQuotes{好} has~597d, and the big period~\inQuotes{。} has~3002. |
441 | 455 | The string \pythonil{"\\u4f60\\u597d\\u3002"} then corresponds to the correct Chinese text~\inQuotes{你好。}.% |
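The unicode escapes can be checked in both directions. In this sketch with illustrative variable names, the built-in function \pythonil{ord} is used to recover the hexadecimal code point of each character.

```python
# Each \uXXXX escape denotes one character by its hexadecimal code point.
greeting = "\u4f60\u597d\u3002"
print(greeting)  # 你好。

# ord returns the code point of a character; format it as 4 hex digits.
code_points = [f"{ord(character):04x}" for character in greeting]
print(code_points)  # ['4f60', '597d', '3002']
```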
| 456 | +\FloatBarrier% |
442 | 457 | \endhsection% |
443 | 458 | % |
444 | 459 | \hsection{Summary}% |
|