
Commit 0cc9e73
committed: initial release 4.8
1 parent 68e0b23

File tree: 18 files changed, +5649 −17 lines changed

docs/index.rst

Lines changed: 2 additions & 0 deletions

@@ -63,6 +63,8 @@ Contents:
    :caption: Convert Module

    load-phoneme
+   load-rumi-jawi
+   load-jawi-rumi

 .. toctree::
    :maxdepth: 2

docs/load-jawi-rumi.ipynb

Lines changed: 303 additions & 0 deletions
@@ -0,0 +1,303 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Jawi-to-Rumi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"This tutorial is available as an IPython notebook at [Malaya/example/jawi-rumi](https://github.com/huseinzol05/Malaya/tree/master/example/jawi-rumi).\n",
" \n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"This module is trained on both standard and local (including social media) language structures, so it is safe to use for both.\n",
" \n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explanation\n",
"\n",
"https://www.ejawi.net/converterV2.php?go=rumi converts Rumi to Jawi using a heuristic method. Malaya generated Rumi-to-Jawi pairs with that heuristic, inverted the dataset, and trained a deep learning model on the result.\n",
"\n",
"`چوميل` -> `comel`"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5.95 s, sys: 1.15 s, total: 7.1 s\n",
"Wall time: 9.05 s\n"
]
}
],
"source": [
"%%time\n",
"import malaya"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use deep learning model\n",
"\n",
"Load the LSTM + Bahdanau Attention Jawi-to-Rumi model.\n",
"\n",
"If you are using Tensorflow 2, make sure Tensorflow Addons is already installed,\n",
"\n",
"```bash\n",
"pip install tensorflow-addons -U\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"def deep_model(quantized: bool = False, **kwargs):\n",
"    \"\"\"\n",
"    Load LSTM + Bahdanau Attention Jawi to Rumi model.\n",
"    Original size 11MB, quantized size 2.92MB.\n",
"    CER on test set: 0.09239719040982326\n",
"    WER on test set: 0.33811816744187656\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    quantized : bool, optional (default=False)\n",
"        if True, will load 8-bit quantized model.\n",
"        Quantized model is not necessarily faster, it totally depends on the machine.\n",
"\n",
"    Returns\n",
"    -------\n",
"    result: malaya.model.tf.Seq2SeqLSTM class\n",
"    \"\"\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "530a47ea5c514ae9aa68c8a4e1e29d9c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=11034253.0, style=ProgressStyle(descrip…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"model = malaya.jawi_rumi.deep_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Quantized model\n",
"\n",
"To load the 8-bit quantized model, simply pass `quantized = True`; the default is `False`.\n",
"\n",
"Expect a slight accuracy drop from the quantized model, and it is not necessarily faster than the normal 32-bit float model; it totally depends on the machine."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Load quantized model will cause accuracy drop.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "6d1d22a65abd48a28f9a1eb62f2d0c4d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=2926859.0, style=ProgressStyle(descript…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"quantized_model = malaya.jawi_rumi.deep_model(quantized = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Predict\n",
"\n",
"```python\n",
"def predict(self, strings: List[str], beam_search: bool = False):\n",
"    \"\"\"\n",
"    Convert to target string.\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    strings : List[str]\n",
"    beam_search : bool, optional (default=False)\n",
"        If True, use beam search decoder, else use greedy decoder.\n",
"\n",
"    Returns\n",
"    -------\n",
"    result: List[str]\n",
"    \"\"\"\n",
"```\n",
"\n",
"If you want to speed up inference, set `beam_search = False`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['saya suka makan im',\n",
" 'eak ack kotok',\n",
" 'aisuk berthday saya, jegan lupa bawak hadiah']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.predict(['ساي سوك ماكن ايم', 'اياق اچق كوتوق', 'ايسوق بيرثداي ساي، جڬن لوڤا باوق هديه'])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['saya suka makan im',\n",
" 'eak ack kotok',\n",
" 'aisuk berthday saya, jegan lopa bawak hadiah']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quantized_model.predict(['ساي سوك ماكن ايم', 'اياق اچق كوتوق', 'ايسوق بيرثداي ساي، جڬن لوڤا باوق هديه'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}
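The dataset-inversion idea in the notebook's Explanation cell — generate Rumi-to-Jawi pairs with a heuristic, then swap source and target to obtain Jawi-to-Rumi training data — can be sketched in plain Python. The character table below is a made-up stand-in, not ejawi.net's real rules, chosen only so the notebook's `چوميل` -> `comel` example round-trips:

```python
# Toy sketch of the dataset inversion described in the Explanation cell:
# a heuristic converts Rumi -> Jawi, and swapping every (source, target)
# pair yields Jawi -> Rumi training data for a seq2seq model.
# The substitution table is illustrative only, NOT ejawi.net's real rules.

def heuristic_rumi_to_jawi(word: str) -> str:
    """Map each Rumi character through a toy substitution table."""
    table = {'c': 'چ', 'o': 'و', 'm': 'م', 'e': 'ي', 'l': 'ل'}
    return ''.join(table.get(ch, ch) for ch in word)

# Forward direction: the heuristic generates Rumi -> Jawi pairs.
rumi_words = ['comel']
pairs = [(w, heuristic_rumi_to_jawi(w)) for w in rumi_words]

# Inversion: swap source and target to get Jawi -> Rumi pairs.
inverted = [(jawi, rumi) for rumi, jawi in pairs]

print(inverted[0])  # ('چوميل', 'comel')
```

A real pipeline would run the heuristic over a large Rumi word list and feed the inverted pairs to the seq2seq trainer; the inversion step itself is just this swap.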

0 commit comments
