-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathstring.tex
440 lines (376 loc) · 19.6 KB
/
string.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
%!TEX root = reference.tex
\chapter{Strings}
\label{strings}
A string is a sequence of Unicode characters that denotes a fragment of text. This chapter focuses on the built-in functions that are based on the \q{string} type.
\section{The Structured String \q{pP} Type}
\label{ppType}
\index{pP type}
\index{type!pP@\q{pP}}
The \q{pP} type -- as defined in Program~\vref{ppTypeProg} -- denotes a `structured \q{string}' value where the structure may be used to represent lines, sub sequences and so on.
\begin{aside}
A primary purpose of the \q{pP} type is to permit simple formatting policies to be applied after the generation of the displayed form of a value.
\end{aside}
\begin{program}
\begin{lstlisting}
type pP is
ppStr(string)
or ppSequence(integer,cons of pP)
or ppNl
or ppSpace;
\end{lstlisting}
\caption{The Structured String \q{pP} type}\label{ppTypeProg}
\end{program}
The intended semantics of the constructors are:
\begin{description}
\item[\q{ppStr}] A literal string. Whenever a literal string is to be generated, the \q{ppStr} constructor is used to `hold' that string. For example, if the display of a value calls for an opening parenthesis, then the term:
\begin{lstlisting}
ppStr("(")
\end{lstlisting}
may be used to denote that.
\item[\q{ppSequence}]
The \q{ppSequence} constructor signals a subsequence in the display. It has two arguments: the first is an indentation amount, and the second is a \q{cons} list of sub-elements.
The indentation is used if a newline is generated within the subsequence. In that case, the new lines will be indented by the amount requested.
\item[\q{ppNl}]
Signal a new line in the displayed sequence.
\begin{aside}
Simply signaling a new line does actually imply that a new line will be generated. New lines are generated depending on whether the client of the pretty print requires one in the actual displayed output.
\end{aside}
\item[\q{ppSpace}]
The \q{ppSpace} symbol denotes a `line-breakable' space. Multiple \q{ppSpace}s in sequence are equivalent to a single one.
\end{description}
\section{The \q{pPrint} contract}
\label{pPrintContract}
\index{pPrint contract@\q{pPrint} contract}
\index{pretty print contract}
The standard contract \q{pPrint}, shown in Program~\vref{ppContractProg} together with the \q{pP} type shown in Program~\vref{ppTypeProg}, is at the core of the standard method for displaying arbitrary values.
\begin{program}
\begin{lstlisting}
contract pPrint over t is {
ppDisp has type (t)=>pP
};
\end{lstlisting}
\caption{The Standard \q{pPrint} Contract}\label{ppContractProg}
\end{program}
The \Sr compiler will automatically generate implementations of the \q{pPrint} contract for all user-defined types. However, it will not override any implementations defined by the user.
\begin{aside}
It is not guaranteed that \emph{all} user-introduced types will be detected. In particular, some anonymous types are implicitly introduced by the programmer and these are not guaranteed to be detected.
However, if the compiler cannot find an implementation of \q{pPrint} then a default implementation will be used.
\end{aside}
The purpose of the \q{pPrint} contract is to support the standard \q{display} function -- see Section~\vref{displayFunction}. This, in turn, is used whenever a string \ntRef{Interpolation} expression is used.
\begin{aside}
One of the primary benefits of allowing programmers to define their own implementation of \q{pPrint} is to enable higher quality display of values. By defining \q{pPrint} for yourself, you can use application oriented display of your values.
\end{aside}
\subsection{Implementing the \q{pPrint} Contract}
As noted above, the \q{pPrint} contract is automatically implemented for standard types and for user-introduced types. However, it is quite possible to define one's own implementation. For example, supposing that values of the \q{tree} type:
\begin{lstlisting}
type tree of t is empty or node(tree of t,t,tree of t)
\end{lstlisting}
were intended to be display:
\begin{lstlisting}
{ "alpha" "beta" "gamma" }
\end{lstlisting}
instead of the default form:
\begin{lstlisting}
node(node(empty,"alpha",empty),"beta",node(empty,"gamma",empty))
\end{lstlisting}
then the following implementation of \q{pPrint} would ensure that such trees were displayed more conveniently:
\begin{lstlisting}
implementation for all t such that
pPrint over tree of t where pPrint over t is {
fun ppDisp(T) is ppSequence(2,cons of [ppStr("{"),
treeDisplay(T),
ppStr("}")])
} using {
fun treeDisplay(empty) is ppSpace
| treeDisplay(node(L,Lb,R)) is
ppSequence(0,cons of [treeDisplay(L),
ppDisp(Lb),
treeDisplay(R)]);
}
\end{lstlisting}
\begin{aside}
Note how the use of \q{ppDisp} within the definition of \q{treeDisplay} will ensure that the display of tree labels may also be overridden with user-defined implementations of \q{pPrint}.
\end{aside}
\subsection{\q{display} -- display a value as a string}
\label{displayFunction}
\begin{lstlisting}
display has type for all s such that
(s)=>string where pPrint over s
\end{lstlisting}
The \q{display} function returns a \q{string} representation of its value.
The \q{display} function is defined in terms of the \q{pPrint} contract defined in Program~\vref{ppContractProg}.
\begin{aside}
Although the system attempts to format the result in a way that can be parsed back; this is not guaranteed. In particular, this is not possible for any values that represent programs -- such as functions and procedures. Furthermore, user-defined implementations of \q{pPrint} may result in non-parseable output.
\end{aside}
\section{The \q{formatting} Contract}
\label{formatContract}
The \q{formatting} contract specifies the single \q{\_format} function which is intended to represent how values should be formatted.
The \q{formatting} contract itself is defined in Program~\vref{formatContractProg}. The result of a call to \q{\_format} is a structured \q{string}.
\begin{program}
\begin{lstlisting}
contract formatting over t is {
_format has type (t,string)=>pP;
}
\end{lstlisting}
\caption{The \q{formatting} Contract\label{formatContractProg}}
\end{program}
\begin{aside}
Normally, like \q{display}, calls to \q{\_format} are represented implicitly in string \ntRef{Interpolation} expressions.
\end{aside}
\subsection{Formatting Codes}
\label{formattingCodes}
A formatting code is a description of how a numeric or \q{string} valued expression should be displayed. Formatting codes allow more detailed control of the representation of the format in terms of minimum and maximum widths of output, the number of decimal places to show and the style of representing numbers -- including how negative numbers are displayed and the display of currencies.
A formatting code is introduced with a \q{:} character immediately after the \q{\$} form and is terminated by a \q{;} character. An invalid formatting code is ignored, and treated as though it were part of the quoted string proper.
Each type of value to be formatted may have different formatting codes; reflecting the natural variations in the type. For example formatting integral values may involve ways of managing the display of the sign of the number and formatting \q{date} values involves ways of show dates and times.
For example, to show a dollar value -- represented as pennies -- in \emph{accounting style} we can use:
\begin{lstlisting}
"Balance: $Amnt:P999900.00P; remaining"
\end{lstlisting}
This format spec displays at least the four least significant digits of the variable \q{Amnt}. If the value of that variable is greater than 9999 then the leading digits are displayed also -- up to a maximum of eight digits. If the value of \q{Amnt} is negative then the number is displayed enclosed in parentheses.
For example, if \q{Amnt} had value -100000, then the value of the expression would be:
\begin{lstlisting}
Balance: (1000.00) remaining
\end{lstlisting}
If \q{Amnt} were 10000:
\begin{lstlisting}
Balance: 1000.00 remaining
\end{lstlisting}
\begin{aside}
Note the additional spaces: if the \q{P} mode is used for representing sign, a white space character is generated for positive numbers. This facilitate straightforward alignment of columnar reports.
\end{aside}
If \q{Amnt} had value 45, then the result would be:
\begin{lstlisting}
Balance: 00.45 remaining
\end{lstlisting}
The \q{'0'} in the format will result in leading zeros being printed.
\begin{aside}
If a value cannot be represented in the delimited number of characters then the string:
\begin{lstlisting}
*Error*
\end{lstlisting}
is displayed; at least, as much of \q{*Error*} as is possible in the allocated space.\end{aside}
\subsection{\q{format} -- format a string for display}
\label{formatStringFunction}
\begin{lstlisting}
format has type (string,string)=>string
\end{lstlisting}
\begin{aside}
The \q{format} function for \q{string} values is normally invoked implicitly within a \q{string} \ntRef{Interpolation} expression. For example,
\begin{lstlisting}
"--$Msg:C13;--"
\end{lstlisting}
is equivalent to the expression:
\begin{lstlisting}
"--"++format(Msg,"C13")++"--"
\end{lstlisting}
and has value:
\begin{lstlisting}
"-- freddie --"
\end{lstlisting}
assuming that the value of the \q{Msg} variable is \q{"freddie"}.
\end{aside}
The format specification for \q{string} values is given in the regular expression:
\begin{lstlisting}
`[LCR][0-9]+`
\end{lstlisting}
where each control code is defined:
\begin{description}
\item[\q{L}]
The value is shown left-aligned in the text.
The decimal value immediately after the \q{L} character is the size of the field.
If the displayed length of the number or string is less than that permitted; then the value is shown left-aligned. If the length of the value is greater than the size of the field then the text is truncated -- i.e., the first N characters of the value are used.
\item[\q{R}] The value is shown right-aligned in the text -- if the length of the value is less than the size of the field.
If the length of the value is greater than the size of the field then the text is truncated.
\item[\q{C}] The value is shown centered in the field.
\end{description}
\begin{aside}
The \q{format} function is defined in terms of the \q{\_format} function and the \q{formatting} contract -- see Program~\vref{formatContractProg}.
\end{aside}
\section{Standard String Functions}
\label{simpleString}
In addition to certain specific string functions -- such as string concatenation -- the \q{string} type implements the \q{comparable} contract which enables \q{string} values to be compared. The \q{indexable} contract -- see Program~\vref{sizeableContract} -- is also implemented for \q{string}s, which means that the normal \q{[]} notation may be used to access the characters of a string.
\subsection{\q{isEmpty} -- test for empty string}
\label{stringEmptyFunction}
\q{isEmpty} is part of the standard \q{sizeable} contract (see Program~\vref{sizeableContract}):
\begin{lstlisting}
isEmpty has type (string)=>boolean
\end{lstlisting}
The \q{isEmpty} function returns true if its argument is the empty string. It's definition is equivalent to:
\begin{lstlisting}
fun isEmpty(X) is X="";
\end{lstlisting}
\subsection{\q{size} -- size of the string}
\label{stringSizeFunction}
\q{size} is part of the standard \q{sizeable} contract (see Program~\vref{sizeableContract}):
\begin{lstlisting}
size has type (string)=>integer
\end{lstlisting}
The \q{size} function returns the number of Unicode characters in the \q{string}. Note that this is not generally the same as the number of bytes in the string.
\subsection{\q{flattenPP} -- Flatten a Structured String}
\index{flattenPP@\q{flattenPP}}
\begin{lstlisting}
flattenPP has type (pP)=>string;
\end{lstlisting}
The \q{flattenPP} function takes a structured string and `flattens it' into a regular \q{string}.
\begin{aside}
This function is used by the standard functions \q{display} and \q{format} to convert the result of displaying or formatting a value into a \q{string}.
\end{aside}
\subsection{\q{<} -- less than}
\index{<@\q{<} predicate}
\begin{lstlisting}
(<) has type (string,string)=>boolean
\end{lstlisting}
\q{(<)} is part of the standard \q{comparable} contract -- see Program~\vref{comparableContract}.
String comparison is based on a lexicographic comparison: one \q{string} is less than another if its first character is less than the first character of the second -- irrespective of the actual lengths of the strings. Thus
\begin{lstlisting}
Abbbbbbb < B
\end{lstlisting}
because \q{A} is less than \q{B}. Characters are compared based on their \emph{code point} within the Unicode encoding.\footnote{This is the same concept of string ordering as that within Java\tm.}
\subsection{\q{=<} -- less than or equal}
\index{=<@\q{=<} predicate}
\begin{lstlisting}
(=<) has type (string,string)=>boolean
\end{lstlisting}
\q{(=<)} is part of the standard \q{comparable} contract -- see Program~\vref{comparableContract}.
The \q{=<} predicate for \q{string} values is satisfied if the left argument is less than or equals to the right argument under the lexicographic ordering.
\subsection{\q{>} -- greater than}
\index{>@\q{>} predicate}
\begin{lstlisting}
(>) has type (string,string)=>boolean
\end{lstlisting}
\q{(>)} is part of the standard \q{comparable} contract -- see Program~\vref{comparableContract}.
The \q{>} predicate is satisfied if the left argument is lexicographically greater than the right argument.
\subsection{\q{>=} -- greater then or equal}
\index{>=@\q{>=} predicate}
\begin{lstlisting}
(>=) has type (string,string)=>boolean
\end{lstlisting}
\q{(>=)} is part of the standard \q{comparable} contract -- see Program~\vref{comparableContract}.
The \q{>=} predicate is satisfied if the left argument is lexicographically greater than or equal to the right argument.
\subsection{\q{\_index} -- Index Character from String}
\label{indexString}
\q{\_index} is part of the standard \q{indexable} contract -- see Program~\vref{indexableContractDef}.
\begin{lstlisting}
_index has type (string,integer)=>option of integer
\end{lstlisting}
The \q{\_index} function returns a character from a \q{string} value. Note that the character is returned as an \q{integer} value -- representing the unicode code point representation of the character.
There is special syntax for indexing characters from a \q{string} -- as with indexing other kinds of \q{indexable} types -- one can use:
\begin{lstlisting}
S[ix]
\end{lstlisting}
instead of
\begin{lstlisting}
_index(S,ix)
\end{lstlisting}
\subsection{\q{\_slice} -- Substring}
\label{sliceString}
\q{\_slice} is part of the \q{sliceable} contract -- see Program~\vref{sliceableContractProg}.
\begin{lstlisting}
_slice(string,integer,integer)=>string
\end{lstlisting}
The \q{\_slice} function extracts a substring from its first argument. The first character of the extracted substring is identified by the second argument; and the end point of the substring is identified by the third argument. An expression of the form:
\begin{lstlisting}
_slice("this is a string",5,7)
\end{lstlisting}
returns the substring \q{"is"} -- corresponding to the two characters located at positions 5 and 6 in the source string.
There is a special notation for this functionality: the slice notation (see Section~\vref{sliceFunction}. For example, if the variable \q{S} is bound to the string \q{"this is a string"}, then the above expression may be written:
\begin{lstlisting}
S[5:7]
\end{lstlisting}
\subsection{\q{\_splice} -- Replace Substring}
\label{spliceString}
\q{\_splice} is part of the \q{sliceable} contract -- see Program~\vref{sliceableContractProg}.
\begin{lstlisting}
_splice has type (string,integer,integer,string) => string
\end{lstlisting}
The \q{\_splice} function replaces a substring within its first argument. For example, the expression:
\begin{lstlisting}
_splice("this is a string",5,7,"was")
\end{lstlisting}
has, as its value:
\begin{lstlisting}
"this was a string"
\end{lstlisting}
Like the \q{\_slice} notation, there is special syntax for this function -- when used as an action. The action:
\begin{lstlisting}
S[ix:tx] := U
\end{lstlisting}
is equivalent to the assignment:
\begin{lstlisting}
S := _splice(S,ix,cx,U)
\end{lstlisting}
\subsection{\q{++} -- string concatenation}
\label{stringConcatFunction}
\q{++} is the standard string concatenation function. It is part of the \q{concatenate} contract (see Program~\vref{concatenateContractDef})
\begin{lstlisting}
(++) has type (string,string)=>string;
\end{lstlisting}
\index{string!interpolation}
Use of the \q{++} function over strings is implied by the \emph{string interpolation expression} (see Section~\vref{StringInterpolation}). For example, the string expression:
\begin{lstlisting}
"Count = $count, Sum=$sum"
\end{lstlisting}
is shorthand for
\begin{lstlisting}
"Count ="++display(count)++", Sum="++display(sum)
\end{lstlisting}
\subsection{\q{explode} -- Explode a string to code points}
\label{stringExplodeFunction}
The \q{explode} function is part of the \q{explosion} contract.
\begin{lstlisting}
explode has type (string)=>cons of integer;
\end{lstlisting}
\begin{aside}
This version of the \q{explode} function is useful when performing complex operations over \q{string} values. For example, it can be more efficient to first of all \q{explode} a \q{string} before tokenizing the string.
\end{aside}
\subsection{\q{implode} -- Implode a cons list of code points to a string}
\label{stringImplodeFunction}
The \q{implode} function is part of the \q{explosion} contract.
\begin{lstlisting}
implode has type (cons of integer)=>string;
\end{lstlisting}
The \q{implode} function takes a \q{cons} list of \q{integer} code points and constructs a \q{string} value from it.
\subsection{\q{reverse} -- Reverse the characters in a string}
\label{stringReverseFunction}
The \q{reverse} function is part of the \q{reversible} contract -- see Program~\vref{reversibleContractDef}.
\begin{lstlisting}
reverse has type (string)=>string
\end{lstlisting}
\subsection{\q{findstring} -- string search}
\label{findStringFunction}
\q{findstring} is used to determine the (next) location of a search token within a \q{string}.
\begin{lstlisting}
findstring has type (string,string,integer)=>integer;
\end{lstlisting}
\index{string!search within}
\index{finding substrings}
The \q{findstring} function searches a string for an occurrence of another string. The first argument is the string to search, the second is the search token, and the third is the integer offset where to start the search.
For example, the result of the expression:
\begin{lstlisting}
findstring("the lazy dog jumped over the quick brown fox","the",5)
\end{lstlisting}
is \q{25}.
If the search token is not present then \q{findstring} returns -1;
\subsection{\q{gensym} -- Generate Unique String}
\label{gensym}
\index{gensym standard function@\q{gensym} standard function}
\begin{lstlisting}
gensym has type (string)=>string
\end{lstlisting}
The \q{gensym} function is used to generate unique strings that have an arbitrarily high probability of being unique.
The generated string has a prefix consisting of the single argument, a middle which is a unique string generated based on a globally unique identifier identifying the current process and a counter.
The result is a string that has a high probability of being unique. It is guaranteed to be unique within the current processor.
\subsection{\q{spaces} -- Generate a string of spaces}
\label{spaces}
\index{spaces standard function@\q{spaces} standard function}
\begin{lstlisting}
spaces has type (integer)=>string
\end{lstlisting}
The \q{spaces} function generates a \q{string} containing only the space character -- \q{'\spce'}. For example, the value of
\begin{lstlisting}
spaces(3)
\end{lstlisting}
is the \q{string}
\begin{lstlisting}
" "
\end{lstlisting}