Commit 41a9547

ksy_style_guide.adoc: added encoding-name section detailing guidance on encoding specification
1 parent 5d2925c commit 41a9547

1 file changed

Lines changed: 67 additions & 0 deletions

ksy_style_guide.adoc

@@ -214,6 +214,73 @@ TODO: documentation contents, what should and should no be included

TODO

[[encoding-name]]
== Encoding name (`encoding`)

The `encoding` key (used in `meta/encoding` for the default encoding and
in individual attributes to override it) specifies the character
encoding for string values. The KS compiler recognizes a set of
canonical encoding names and common aliases, and will emit warnings if a
non-canonical form is used.
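
The default-and-override relationship described above can be sketched in a few lines of Python. This only illustrates the rule and is not how the compiler is implemented: the dict layout stands in for an already-parsed `.ksy` file, and the ids `example_record`, `title` and `legacy_name`, as well as the helper `effective_encoding`, are hypothetical.

```python
from typing import Optional

# Hypothetical, already-parsed .ksy spec represented as plain dicts.
SPEC = {
    "meta": {"id": "example_record", "encoding": "UTF-8"},
    "seq": [
        # No `encoding` key here, so the meta/encoding default applies.
        {"id": "title", "type": "str", "size": 16},
        # An attribute-level `encoding` overrides the default.
        {"id": "legacy_name", "type": "str", "size": 8, "encoding": "ISO-8859-1"},
    ],
}


def effective_encoding(spec: dict, attr: dict) -> Optional[str]:
    """Return the attribute's own encoding, else the meta/encoding default."""
    default = spec.get("meta", {}).get("encoding")
    return attr.get("encoding", default)


for attr in SPEC["seq"]:
    print(attr["id"], "->", effective_encoding(SPEC, attr))
```

Both spellings in the sketch (`UTF-8` and `ISO-8859-1`) are canonical names, so neither would be expected to trigger a warning.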
226+
One MUST use the canonical (exact) spelling of the encoding name. The
227+
match is *case-sensitive*: for example, `UTF-8` is correct, but `utf-8`
228+
or `Utf-8` will trigger a warning.
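
As a rough illustration of the case-sensitivity rule, the following Python sketch flags names that differ from a canonical name only by letter case. It is an assumption-laden stand-in, not the compiler's logic: `CANONICAL` holds only a handful of names taken from this guide, and the helper `case_warning` and its message text are hypothetical.

```python
from typing import Optional

# A few canonical names copied from this guide; not the compiler's full list.
CANONICAL = ("ASCII", "UTF-8", "UTF-16LE", "ISO-8859-1", "windows-1252", "Shift_JIS")


def case_warning(name: str) -> Optional[str]:
    """Return a warning if `name` differs from a canonical name only by case."""
    if name in CANONICAL:
        return None  # exact, case-sensitive match: nothing to report
    for canonical in CANONICAL:
        if name.lower() == canonical.lower():
            # Message wording is made up for this sketch.
            return f"use canonical encoding name `{canonical}` instead of `{name}`"
    return None  # not a case variant; alias handling is out of scope here


for candidate in ("UTF-8", "utf-8", "Utf-8"):
    print(candidate, "->", case_warning(candidate))
```

Note that mixed-case canonical names such as `windows-1252` and `Shift_JIS` follow the same rule: the exact spelling is the only one that passes silently.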
230+
The following canonical names are recognized by the compiler:

[cols="1,3", options="header"]
|====
| Canonical name | Common aliases (accepted but produce a warning)

| `ASCII`
| US-ASCII, US_ASCII, IBM367, cp367

| `UTF-8`
| UTF8, UTF_8, cp65001

| `UTF-16BE`
| UTF16BE, UTF16-BE, UTF-16-BE, UTF_16BE

| `UTF-16LE`
| UTF16LE, UTF16-LE, UTF-16-LE, UTF_16LE

| `UTF-32BE`
| UTF32BE, UTF32-BE, UTF-32-BE, UTF_32BE

| `UTF-32LE`
| UTF32LE, UTF32-LE, UTF-32-LE, UTF_32LE

| `ISO-8859-1`
| ISO8859-1, ISO_8859_1, latin1, cp819, windows-28591

| `ISO-8859-2` ... `ISO-8859-16`
| Same pattern of aliases (e.g. latin2, latin3, ..., latin10)

| `windows-1250` ... `windows-1258`
| cp1250 ... cp1258

| `IBM437`
| cp437, 437

| `IBM866`
| cp866, 866

| `Shift_JIS`
| Shift-JIS, ShiftJIS, S-JIS, SJIS, PCK

| `Big5`
| csBig5

| `EUC-KR`
| EUCKR, EUC_KR, korean
|====
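
One way to read the table is as a lookup from alias to canonical name. The Python sketch below builds such a lookup from a subset of the rows; it is illustrative only. The names `ALIASES`, `ALIAS_TO_CANONICAL` and `classify` are hypothetical, aliases are matched exactly as spelled in the table (whether the compiler matches them more loosely is not stated here), and pairing each `cpNNNN` with the `windows-NNNN` of the same number is an assumption about how the range row expands.

```python
# Alias table built from a few rows of the table above (illustrative subset).
ALIASES = {
    "ASCII": ["US-ASCII", "US_ASCII", "IBM367", "cp367"],
    "UTF-8": ["UTF8", "UTF_8", "cp65001"],
    "UTF-16LE": ["UTF16LE", "UTF16-LE", "UTF-16-LE", "UTF_16LE"],
    "ISO-8859-1": ["ISO8859-1", "ISO_8859_1", "latin1", "cp819", "windows-28591"],
    "IBM437": ["cp437", "437"],
    "Shift_JIS": ["Shift-JIS", "ShiftJIS", "S-JIS", "SJIS", "PCK"],
}

# Assumed expansion of the `windows-1250` ... `windows-1258` row:
# one `cpNNNN` alias per code page with the same number.
for page in range(1250, 1259):
    ALIASES[f"windows-{page}"] = [f"cp{page}"]

# Reverse index: alias -> canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical for canonical, aliases in ALIASES.items() for alias in aliases
}


def classify(name: str) -> str:
    """Classify `name` as canonical, a known alias, or unknown to this sketch."""
    if name in ALIASES:
        return "canonical"
    if name in ALIAS_TO_CANONICAL:
        return f"alias of {ALIAS_TO_CANONICAL[name]}"
    return "unknown"


for candidate in ("UTF-8", "cp1252", "latin1", "EBCDIC"):
    print(candidate, "->", classify(candidate))
```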

The list above is illustrative, not exhaustive. The master
list is maintained in the compiler source code (see
https://github.com/kaitai-io/kaitai_struct_compiler/blob/master/shared/src/main/scala/io/kaitai/struct/EncodingList.scala[EncodingList.scala]);
if in doubt, follow the list in the source code.

[[seq-attr]]
== Sequence attributes
