-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathscahelp.html
557 lines (437 loc) · 18.4 KB
/
scahelp.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
<html>
<head>
<title>SCA² - sound change applier - help</title>
<link rel="stylesheet" href="sca2.css"/>
</head>
<body bgcolor="#ffffff" text="#000000">
<table width="100%">
<tr>
<td bgcolor="#eec25a">
<h2>
<br>
<a href="kit.html"><img src="kit-gears.gif" border=0 align="absmiddle" height="53" width="60"></a> SCA<sup>2</sup> Help
</h2>
</td>
</tr>
</table>
<p>
The
<a href="sca2.html">Sound Change Applier 2</a>
is an updated version of
<a href="sounds.htm">my C program</a>
which applies a set of sound changes to a lexicon.
You can use it to help work out a <b>reconstruction</b> for actual
languages, to create plausible descendants of a <b>conlang</b>,
or in fact to make any structured set of lexical changes to a
database of words.
</p>
<p>
This version is written in <b>Javascript</b>, which means it runs in
your browser. The advantage is that it supports Unicode,
it'll run on all systems, and you don't have to mess with ASCII or
command lines anymore.
</p>
<p><a href="#new">Changes since the old SCA</a>.</p>
<h3>Example</h3>
Try it out! With the default inputs, hit <b>Apply</b>.
You should get an output like this:
<blockquote>
<pre>
<tt>leitor</tt>
<tt>doutor</tt>
<tt>fogo</tt>
<tt>jogo</tt>
<tt>distrito</tt>
<tt>cidade</tt>
<tt>adotar</tt>
<tt>obra</tt>
<tt>segundo</tt>
</pre>
</blockquote>
As if by magic, a selection of Latin words has turned into Portuguese.
<h3>The controls</h3>
Here's what the controls do.
<p>
<b>Output format</b>
tells how you want each line of the output to look like.
The first option just prints each output word; this is good for
generating a new list of words (e.g. as input for the next round of
changes). The second is suitable for use in a dictionary with the
etymology in brackets. The third gives the input and output words in
order. (<a href="#gloss">See here</a> for how to add glosses.)
<p>
<b>Show differences from last run</b>,
if checked, will
<b>boldface</b>
any changes from the last run when you hit Apply.
This can be very useful to see what the effect of a changed rule is.
(Try it with the defaults: change <tt>[sm]//_#</tt>
in the first sound change to <tt>[m]//_#</tt> and hit Apply.
You should see several of the words change, now retaining their final s.)
<p>
The comparison is very simple-minded; in particular it can't keep track
of added or deleted lines in the lexicon.
<p>
Note that if you hit Apply without making any changes, all the bolding
is removed (since in fact nothing changed between runs).
<p>
<b>Report which rules apply</b>
prints a report in the Output section listing every time a rule applies,
like this:
<blockquote><tt>u/o/_# applies to districtu at 8</tt></blockquote>
This is useful for understanding why a rule applies (or doesn't) when
you expected the opposite.
<p>
<b>Rewrite on output</b>
controls whether the <a href="#rr">rewrite rules</a> should be reversed
when writing the output lexicon.
<p>
<b>Apply</b>
applies the sound changes to the input lexicon, generating the output
lexicon. We'll talk about exactly what that means below.
<p>
Javascript, to protect your computer, cannot read or write
<b>files</b>. Instead:
<ul>
<li>To <b>save a file</b>, hit <b>Back to .sc</b>, copy the text in
the Sound Changes box, and save it yourself in Notepad or whatever.
<li>To <b>read in a file</b>, grab the text from Notepad and paste it
into the Sound Changes box. Then hit <b>Parse .sc</b> to move the
lines into the appropriate boxes. (Comments, meaning anything the
program can't recognize, stay in the Sound Changes box.)
</ul>
<p>
<b>Help me!</b> brings up this help file.
<p>
<b>IPA</b> will post a set of IPA and other useful Unicode characters
to the Output area. You can then copy and paste a character into any
of the input boxes.
<p>On Safari and Firefox, <b>Undo</b> will work as it should: you can make
a change, hit Apply, and if you don't like the results, click on the text
box you changed and select Undo. This doesn't work on IE.
<h3>Defining sound changes</h3>
The Sound Changes box are rules for modifying the input lexicon.
Hopefully the format of the rules will be familiar to any linguist.
For instance, here's one sound change:
<blockquote><tt>c/g/V_V</tt></blockquote>
This rule says to change <tt>c</tt> to <tt>g</tt> between vowels.
(We'll see how to generalize this rule below.)
<p>More generally, a sound change looks like this:
<blockquote>
<font color="#C08700"><i><b>target/replacement/environment</b></i></font>
</blockquote>
that is, the target string is changed to the replacement string within
the given environment.
<p>Optionally you can use → in place of the first slash. So the
above rule can also be written
<blockquote><tt>c→g/V_V</tt></blockquote>
<p>The environment must always contain an underline <tt>_</tt>,
representing the part that changes. That can be all there is, as in
<blockquote><tt>gn/nh/_ </tt><br></blockquote>
which tells the program to replace <tt>gn</tt> with <tt>nh</tt>
unconditionally.
<p>The character <tt>#</tt> represents the <b>beginning or end</b>
of the word. So
<blockquote><tt>u/o/_#</tt></blockquote>
means to replace <tt>u</tt> with <tt>o</tt>, but only at the end of
the word.
<p>The replacement string can be <b>blank</b>, as in
<blockquote><tt>s//_#</tt></blockquote>
This means that <tt>s</tt> is <b>deleted</b> when it ends a word.
<h4>Categories</h4>
<p>
The environment can contain <b>variables</b>, like <tt>V</tt> above.
These are defined in the Categories box. I use capital letters for this,
though this is not a requirement. Variables can only be one character
long (unless you use <a href="#rr">rewrite rules</a>).
You can define any variables needed to state your sound changes.
E.g. you could define <tt>S</tt> to be any stop, or <tt>K</tt>
for any coronal, or whatever.
<p>So the category definition and rule
<blockquote><tt>F=ie<br>c/i/F_t</tt></blockquote>
means that <tt>c</tt> changes to <tt>i</tt> after a front vowel and before
a <tt>t</tt>.
<p>
You can use variables in the first two parts as well. For instance,
suppose you've defined
<blockquote><tt>S=ptc<br>Z=bdg<br>S/Z/V_V</tt></blockquote>
This means that the stops <tt>ptc</tt> change to their voiced equivalents
<tt>bdg</tt> between vowels. In this usage, the variables must
correspond one for one— <tt>p</tt> goes to <tt>b</tt>,
<tt>t</tt> goes to <tt>d</tt>, etc. Each character in the replacement
variable (here <tt>Z</tt>) gives the transformed value of each character
in the input variable (here <tt>S</tt>).
If the replacement category is shorter than the target category,
the matching input will be deleted.
</p>
<p>
A variable can also be set to a fixed value, or deleted. E.g.
<blockquote><tt>Z//V_V</tt></blockquote>
says to delete voiced stops between vowels.
</p>
<h4>Rule order</h4>
<p>
Rules apply in the <b>order</b> they're listed.
So, with the word <tt>opera</tt> and the rules
<blockquote><tt>p/b/V_V<br>e//C_rV</tt></blockquote>
the first rule voices the <tt>p</tt>, resulting in <tt>obera</tt>;
the second deletes an <tt>e</tt> between a consonant and an
intervocalic <tt>r</tt>, resulting in <tt>obra</tt>.
</p>
<h4>Optional elements in the environment</h4>
<p>
One or more elements in the environment can be marked as
<b>optional</b> with parentheses. E.g.
<blockquote><tt>u/ü/_C(C)F</tt></blockquote>
says to change <tt>u</tt> to <tt>ü</tt>
when it's followed by one or two consonants and then a front vowel.
</p>
<h3><a name="new">New stuff</a></h3>
In addition to Unicode support, the IPA chart, and
<a href="#rr">rewrite rules</a>:
<p>
SCA² treats <b>spaces as word boundaries</b>.
So if you have a rule
<blockquote><tt>k/s/#_</tt></blockquote>
then it will not only turn <tt>kima</tt> to <tt>sima</tt>,
but <tt>kima kimaka</tt> to <tt>sima simaka</tt>.
<p>
<b>Epenthesis</b> is supported by leaving the target part of the rule
blank. The replacement string must be nonblank, and the environment
must contain at least one symbol besides <tt>_</tt>. For instance
<blockquote><tt>/j/_kt</tt></blockquote>
will insert <tt>j</tt> before every instance of <tt>kt</tt>.
<p>
Simple <b>metathesis</b> is supported by the special replacement string
<tt>\\</tt>. For instance
<blockquote><tt>nt/\\/_V</tt></blockquote>
will turn all instances of <tt>nt</tt> before a vowel to <tt>tn</tt>.
(To be precise, the input string is reversed; it can be of any length.)
<p>
<b>Nonce categories</b> can be defined either in the target
(first part of the rule) or environment (last part), by enclosing
the alternatives within brackets. Examples:
<blockquote>
<table>
<tr>
<td><tt>k/s/_[ie]</tt></td>
<td>
Change <tt>k</tt> to <tt>s</tt> before either
<tt>i</tt> or <tt>e</tt>.
</td>
</tr><tr>
<td><tt>[ao]u/o/_</tt></td>
<td>
Either <tt>au</tt> or <tt>ou</tt> is changed to <tt>o</tt>.
</td>
</tr>
<tr>
<td><tt>m/n/_[dt#]</tt></td>
<td>
Change <tt>m</tt> to <tt>n</tt> before dentals and word-finally.
</td>
</tr>
</table>
</blockquote>
With the SCA1 I found myself writing a lot of similar rules,
and nonce categories let them be combined.
<p>
Nonce categories in the environment (only) can include other categories:
<blockquote>
<table>
<tr>
<td><tt>k/g/_[VL]</tt></td>
<td>
Change <tt>k</tt> to <tt>g</tt> before any member
of categories <tt>V</tt> or <tt>L</tt>.
</td>
</tr>
</table>
</blockquote>
<p>
Nonce categories in the environment can include the word boundary
<tt>#</tt>.
<p>
<b>Degemination</b>
can be accomplished using the special character ².
(Note that this is the first character shown in the IPA display.)
<blockquote>
<table>
<tr>
<td><tt>m//_²</tt></td>
<td width="20px"> </td>
<td>Change <tt>mm</tt> to <tt>m</tt>.</td>
</tr>
<tr>
<td><tt>M=mn<br>M//_²</tt></td>
<td width="20px"> </td>
<td>Change <tt>mm</tt> to <tt>m</tt> and <tt>nn</tt> to <tt>n</tt>,
but leave <tt>mn</tt> and <tt>nm</tt> alone.</td>
</tr>
</table>
</blockquote>
Finally, SCA² now supports <b>extended category substitution</b>.
The target must still begin with a category; however, other material may
occur after it. And the replacement string may contain any number of
characters, with a category string given at any point. Examples:
<blockquote>
<table>
<tr>
<td><tt>Bi/Dj/_</tt></td>
<td width="20px"> </td>
<td>
Instances of <tt>B</tt> plus <tt>i</tt> are changed to the
corresponding member of <tt>D</tt> plus <tt>j</tt>.
</td>
</tr>
<tr>
<td><tt>Nd/bM/_V</tt></td>
<td width="20px"> </td>
<td>
Instances of <tt>N</tt> plus <tt>d</tt> before a vowel are
changed to <tt>b</tt> plus the corresponding member of
<tt>M</tt>; note that this is a more complicated metathesis.
</td>
</tr>
</table>
</blockquote>
</p>
<p>
You can do <b>gemination</b> on category substitution, like this:
<blockquote><tt>M/M²/_</tt></blockquote>
This will geminate all members of category <tt>M</tt>.
</p>
<p>
You can use a special <b>wildcard</b> <tt>…</tt> to match
anything. This allows you to test for something earlier or later on in
the word. E.g. this rule will change a member of <tt>S</tt>
to <tt>Z</tt> if there is a vowel <tt>V</tt> <i>anywhere</i> following
it:
<blockquote><tt>S/Z/_…V</tt></blockquote>
<p>(This is a new feature and may still have bugs.)</p>
<p>
The <tt>…</tt> symbol is the third character in the IPA list.
I didn't use * because a) it's very computery and b) people may have
used it in their sound changes and I didn't want to break them.
</p>
<h3><a name="gloss">Including a gloss</a></h3>
It can be convenient to include a gloss in your lexicon which isn't
affected by the sound changes. This is done by separating the gloss with
a space plus the special character ‣ (this is the second character
in the text shown by the IPA button). For instance:
<blockquote><tt>focus ‣ fire</tt></blockquote>
Here's the output you'll get from that (with the default sound changes),
in each of the output formats:
<blockquote>
<tt>
fogo ‣ fire<br>
fire → fogo ‣ fire<br>
fogo ‣ fire [focus]
</tt>
</blockquote>
No sound changes will apply to anything after ‣, but rewrite rules
do apply, so if you use this option I recommend using non-English
characters for the rewrite rules (e.g. use <tt>χ</tt> rather than
<tt>x</tt> for <tt>kh</tt>).
<h3><a name="except">Rule exceptions</a></h3>
Sometimes you'd like to say that a rule applies in environment
e<sub>1</sub>, except for environment e<sub>2</sub>.
You can generally handle this by writing more rules, but SCA²
also allows you to state this directly by adding e<sub>2</sub>
after another slash, e.g.
<blockquote>
<table>
<tr>
<td><tt>k/s/_F/#s_</tt></td>
<td width="20px"> </td>
<td>
<tt>k</tt> changes to <tt>s</tt> before a front vowel,
but <b>not</b> after word-initial <tt>s</tt>.
</td>
</tr>
<tr>
<td><tt>M/N/#_/_CF</tt></td>
<td width="20px"> </td>
<td>
Category <tt>M</tt> changes to category <tt>N</tt> word initially,
but <b>not</b> before another consonant followed by a front vowel.
</td>
</tr>
</table>
</blockquote>
Because of the difficulty of lining up the <tt>_</tt> in both environments,
the exception environment can't include optional characters (those in
parentheses) before the underline. (They can occur after it.)
<h3><a name="rr">Rewrite rules</a></h3>
These allow you to apply global substitutions to the input and output.
The most important use is to allow <b>digraphs</b>.
<blockquote>
<i><font color="red">If you use digraphs, you must follow the rules
in this section. SCA² won't handle digraphs properly on
its own.</font></i>
</blockquote>
Rules with diagraphs will work so long as they can be treated as
sequences of characters. For instance, these all work fine:
<blockquote>
<tt>c/ch/_a</tt>
<br>
<tt>sh/zh/V_V</tt>
<br>
<tt>u/o/_ng</tt>
</blockquote>
But you can't define categories with digraphs. E.g. this was probably
intended to define three fricatives <tt>kh sh zh</tt>
<blockquote><tt>F=khshzh</tt></blockquote>
but in fact it defines the F category as <tt>k h s h z h</tt>,
which won't at all do what you expect.
<p>
The old SCA required that you use single characters instead. E.g. you
might write
<blockquote><tt>F=xßΩ</tt></blockquote>
That still works, but you can use rewrite rules instead.
E.g. define some rules like this:
<blockquote>
<tt>kh|x</tt>
<br>
<tt>zh|ž</tt>
<br>
<tt>sh|š</tt>
<br>
<tt>ng|ŋ</tt>
</blockquote>
Now you can use <tt>kh zh sh ng</tt> in any of the other input
boxes— categories, sound changes, input lexicon. The SCA will
apply the rewrite rules to provide single characters it can work with,
and then apply them again backwards to provide output using digraphs.
<p>
You could also use rewrite rules to allow longer or mnemonic names for
your categories. E.g.
<blockquote>
<tt><front>|F</tt>
</blockquote>
Now you could write sound changes like
<blockquote>
<tt>i/ü/_<front></tt>
</blockquote>
(The category names still have to be unique— you can't use
<tt>F</tt> to define both front vowels and fricatives. But recall
that you can use any Unicode character now for category names.)
<p>
A warning though: so they operate quickly, the rewrite rules are global
and non-contextual. The results may surprise you if you didn't realize
your transcription system was ambigious. E.g. don't use <tt>kh</tt>
both for IPA /x/ and for the cluster /k h/.
<p>
If you need contextual rewrite rules... just use SCA²!
Add your rewrite rules at the top and bottom of the file, with the
appropriate context specifications.
<p>
Sometimes you want the rewrite rules to apply only to the input.
(For instance, the orthography may only apply to the parent language.)
In that case, make sure <b>Rewrite on output</b> is unchecked.
<hr>
<center>
<a href="default.html"><img src="homeg.gif" border=0 alt="Home"></a>
</center>
</body>
</html>