+Unicode text can be stored in symbol, byte list and character list (string) datatypes.

+Since the data is simply a sequence of bytes, any Unicode format can be stored.
+However, it is best to use an encoding such as UTF-8 or GBK that extends 7-bit ASCII, i.e. a single byte in the range `00`–`7f` means the same thing in ASCII.

+The display console should have the matching code page set or you will not be able to view the data correctly.
+For example, if you store in UTF-8 format, ensure that your code page for the display is also UTF-8.

+## Examples processing Unicode data

-Unicode text can be stored in symbol, byte list and character list (string) datatypes.
-
-Since the data is simply a sequence of bytes, any Unicode format can be stored. However, it is best to use an encoding such as UTF-8 or GBK that extends 7-bit ASCII, i.e. a single byte in the range `00`–`7f` means the same thing in ASCII. kdb+ will load a script with such encoding, but it will not load other formats. Note that if using these encodings, avoid having a byte-order-mark prefix on the data.
+### Storing UTF-8 in a char vector

-The q language itself uses only 7-bit ASCII. For example, the statement `2+3` should be given as the three decimal bytes 50 43 51, as in:
+The two Chinese characters "香蕉" each use 3 bytes in UTF-8.
+In this example, the two Chinese characters are stored in a char vector, which is then shown to hold six 1-byte characters (i.e. 2 x 3 bytes).
+[Comparison](../ref/match.md) with the original UTF-8 characters returns true.
+The contents are printed in octal format, showing the 6 bytes.
+When printed to stdout via [`-1`](../basics/handles.md#file-stdout-stderr), the UTF-8 representation of the characters is shown.

 ```q
-q)`char$50 43 51
-"2+3"
-q)value `char$50 43 51
-5
+q)t:"香蕉"
+q)type t
+10h
+q)count t
+6
+q)t
+"\351\246\231\350\225\211"
+q)t~"香蕉"
+1b
+q)-1 t;
+香蕉
 ```
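
As a cross-check outside q, the byte count and octal escapes shown in the session above can be reproduced with a short Python sketch. Python is not part of the original example, and the helper name `octal_escapes` is hypothetical.

```python
# Cross-check of the q session above, using only the Python standard library.
text = "香蕉"
data = text.encode("utf-8")  # the six bytes a q char vector would hold


def octal_escapes(raw: bytes) -> str:
    """Render each byte as a backslash followed by three octal digits."""
    return "".join("\\%03o" % b for b in raw)


print(len(text))                     # 2 code points
print(len(data))                     # 6 bytes (2 x 3)
print(octal_escapes(data))           # \351\246\231\350\225\211
print(data.decode("utf-8") == text)  # True: the bytes round-trip to the same text
```

The sketch only inspects bytes; it does not talk to a kdb+ process.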

-Fixed-width Unicode formats cannot be used, since for example, in UTF-16, `2+3` would be the six decimal bytes 50 0 43 0 51 0, and q does not recognize this:

-```q
-q)value `char$50 0 43 0 51 0
-'char
-```
-
-The display console should have the matching code page set or you will not be able to view the data correctly. e.g. if you store in UTF-8 format, ensure that your code page for the display is also UTF-8.
 Writing to stdout with [`-1`](../basics/handles.md#file-stdout-stderr) shows the formatted text:

 ```q
 q)-1 text 0;
 每日一蘋果, 醫生遠離我
 ```

-Example assignments using the C interface:
+### Using external interfaces
+
+Non-ASCII data can be sent using the various programming interfaces, such as C or Python.
+
+The following example uses the [C interface](../interfaces/capiref.md) to connect over TCP and set two variables, each a char vector representing a UTF-8 string.

 ```c
 int main(){
@@ -76,3 +89,27 @@ int main(){
 }
 ```

+## Using Unicode scripts or statements
+
+kdb+ will load a script encoded in UTF-8 or another encoding that extends 7-bit ASCII, but it will not load other formats. Note that if using these encodings, avoid having a byte-order-mark prefix on the data.
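
A minimal sketch of removing a UTF-8 byte-order mark from script bytes before they are saved for kdb+ to load. It is written in Python purely as an illustration; the function name `strip_utf8_bom` and the sample script contents are hypothetical, and only the standard-library constant `codecs.BOM_UTF8` is relied on.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw without a leading UTF-8 byte-order mark (EF BB BF), if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Hypothetical script contents that some editor saved with a BOM prefix.
script = codecs.BOM_UTF8 + 't:"香蕉"\n'.encode("utf-8")
clean = strip_utf8_bom(script)

print(script[:3] == codecs.BOM_UTF8)  # True: the prefix is present
print(clean == 't:"香蕉"\n'.encode("utf-8"))  # True: only the prefix was removed
```

Input that has no BOM is returned unchanged, so the helper can be applied unconditionally.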
+
+The q language itself uses only 7-bit ASCII.
+For example, the statement `2+3` should be given as the three decimal bytes 50 43 51, as in:
+
+```q
+q)`char$50 43 51
+"2+3"
+```
+Using [`value`](../ref/value.md) to evaluate the statement `2+3` results in 5:
+```q
+q)value `char$50 43 51
+5
+```
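
The decimal byte values quoted in this section can be checked outside q. The Python sketch below is illustrative only; the UTF-16 comparison assumes little-endian order with no byte-order mark, chosen because it matches the six bytes 50 0 43 0 51 0 quoted on this page.

```python
stmt = "2+3"

# In ASCII and in UTF-8 the statement is the same three bytes.
ascii_bytes = list(stmt.encode("ascii"))
utf8_bytes = list(stmt.encode("utf-8"))

# Assumption: UTF-16 little-endian without a BOM, where each ASCII
# character is followed by a zero byte.
utf16_bytes = list(stmt.encode("utf-16-le"))

print(ascii_bytes)                # [50, 43, 51]
print(utf8_bytes == ascii_bytes)  # True
print(utf16_bytes)                # [50, 0, 43, 0, 51, 0]
```

The interleaved zero bytes are why a UTF-16 statement is not the same byte sequence as its 7-bit ASCII form.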
+Unicode formats that do not extend 7-bit ASCII cannot be used, since for example, in UTF-16, `2+3` would be the six decimal bytes 50 0 43 0 51 0, and q does not recognize this: