
Commit 8057d61

release version 1.7, added word-mover distance, text similarity and etc
1 parent ed0dee5 commit 8057d61

79 files changed (42637 lines added, 7514 lines removed)

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -9,3 +9,5 @@ malaya/__pycache__
 docs/_build
 docs/_static
 docs/_templates
+siamese
+skipthought

docs/Api.rst

Lines changed: 5 additions & 5 deletions
@@ -103,19 +103,19 @@ malaya.summarize
 .. automodule:: malaya.summarize
 :members:
 
-malaya.topics_influencer
+malaya.similarity
 -------------------------
 
-.. automodule:: malaya.topic_influencer
+.. automodule:: malaya.similarity
 :members:
 
-.. autoclass:: malaya.topic_influencer._DEEP_SIAMESE_SIMILARITY()
+.. autoclass:: malaya.similarity._DEEP_SIAMESE_SIMILARITY()
 :members:
 
-.. autoclass:: malaya.topic_influencer._DEEP_SIMILARITY()
+.. autoclass:: malaya.similarity._DEEP_SIMILARITY()
 :members:
 
-.. autoclass:: malaya.topic_influencer._FAST_SIMILARITY()
+.. autoclass:: malaya.similarity._FAST_SIMILARITY()
 :members:
 
 malaya.topic_model

docs/Dataset.rst

Lines changed: 17 additions & 1 deletion
@@ -40,6 +40,8 @@ Total size: 8.5 MB
 `Gender <https://github.com/huseinzol05/Malaya-Dataset/blob/master/gender>`__
 -----------------------------------------------------------------------------
 
+Total size: 2.2 MB
+
 1. Unknown
 2. Male
 3. Female
@@ -153,7 +155,7 @@ Total size: 496 KB
 `Sentiment Twitter <https://github.com/huseinzol05/Malaya-Dataset/blob/master/twitter-sentiment>`__
 ---------------------------------------------------------------------------------------------------
 
-Total size: 27.4 MB
+Total size: 50.6 MB
 
 1. Positive
 2. Negative
@@ -226,6 +228,20 @@ Total size: 1.4 MB
 1. Positive
 2. Negative
 
+`Toxicity <https://github.com/huseinzol05/Malaya-Dataset/blob/master/toxicity>`__
+-----------------------------------------------------------------------------------------
+
+Total size: 70 MB
+
+Toxicity is multilabel, prefer to use sigmoid based.
+
+1. toxic
+2. severe toxic
+3. obscene
+4. threat
+5. insult
+6. identity hate
+
 `Subtitle <https://github.com/huseinzol05/Malaya-Dataset/blob/master/subtitle>`__
 ---------------------------------------------------------------------------------

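The Toxicity dataset added to docs/Dataset.rst is multilabel, and its entry recommends a sigmoid-based model. A minimal sketch of why, assuming made-up logits for a single comment: a sigmoid scores each of the six labels independently, so several labels can be active at once. The 0.5 threshold and the helper names (sigmoid, multilabel_predict) are illustrative assumptions, not Malaya settings or APIs.

```python
import math

# The six Toxicity labels listed in docs/Dataset.rst.
LABELS = ["toxic", "severe toxic", "obscene", "threat", "insult", "identity hate"]


def sigmoid(x: float) -> float:
    """Logistic function: maps a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_predict(logits: list[float], threshold: float = 0.5):
    """Score every label independently with a sigmoid.

    Returns the per-label probabilities and every label at or above the
    (assumed) threshold; more than one label may be returned."""
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    active = [label for label, p in probs.items() if p >= threshold]
    return probs, active


# Hypothetical logits for one comment (made up, not output from any Malaya model).
probs, active = multilabel_predict([2.0, -1.0, 1.5, -3.0, 0.5, -2.0])
print(active)  # ['toxic', 'obscene', 'insult']
print(sum(probs.values()))  # above 1: independent scores, not one distribution
```

A softmax over the same logits would force the six scores to sum to 1 and make the labels compete, which suits single-label classification but not a task where one comment can be, for example, both obscene and an insult.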
Lines changed: 3 additions & 3 deletions
@@ -1,9 +1,9 @@
-Topics & Influencers Analysis
+Word-Mover Distance
 ==============================
 
 .. note::
 
 This tutorial is available as an IPython notebook
-`here <https://github.com/huseinzol05/Malaya/tree/master/example/topics-influencers>`_.
+`here <https://github.com/huseinzol05/Malaya/tree/master/example/word-mover>`_.
 
-.. include:: load-topics-influencers.rst
+.. include:: load-word-mover-distance.rst

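The renamed tutorial page now covers Word-Mover Distance (WMD): the minimum total distance the embedded words of one text must "travel" to become the words of another. The notebook code is not part of this diff, so what follows is an independent sketch, not Malaya's implementation. It assumes a hypothetical toy embedding table (EMB), Euclidean ground distance, and two texts with the same number of tokens, each token weighted 1/n. Under those assumptions an optimal transport plan can be taken to be a one-to-one matching, so brute-forcing permutations gives the exact WMD.

```python
import itertools
import math

# Hypothetical 2-d word vectors, for illustration only (not from any real embedding model).
EMB = {
    "obama": (1.0, 0.0), "president": (0.9, 0.1),
    "speaks": (0.0, 1.0), "greets": (0.1, 0.9),
    "media": (-1.0, 0.0), "press": (-0.9, -0.1),
}


def word_distance(w1: str, w2: str) -> float:
    """Euclidean distance between two word vectors (the WMD ground cost)."""
    return math.dist(EMB[w1], EMB[w2])


def wmd_equal_length(doc1: list[str], doc2: list[str]) -> float:
    """Exact WMD for two token lists of equal length, each token weighted 1/n.

    With equal, uniform weights an optimal transport plan is a one-to-one
    matching, so trying every permutation gives the exact answer.
    Cost grows factorially: only usable for a handful of tokens."""
    if len(doc1) != len(doc2):
        raise ValueError("this sketch only handles texts with the same number of tokens")
    n = len(doc1)
    if n == 0:
        return 0.0
    best = min(
        sum(word_distance(w, doc2[j]) for w, j in zip(doc1, perm))
        for perm in itertools.permutations(range(n))
    )
    return best / n


print(wmd_equal_length(["obama", "speaks", "media"], ["president", "greets", "press"]))
# roughly 0.1414, i.e. sqrt(0.02)
```

For texts of different lengths or realistic sizes, WMD requires solving a general optimal-transport (earth mover's) problem with a dedicated solver; the permutation search here only makes the definition concrete.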
docs/Similarity.rst

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Text Similarity
+==============================
+
+.. note::
+
+This tutorial is available as an IPython notebook
+`here <https://github.com/huseinzol05/Malaya/tree/master/example/similarity>`_.
+
+.. include:: load-similarity.rst

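The new Text Similarity page and the malaya.similarity module (documented in Api.rst with _DEEP_SIAMESE_SIMILARITY, _DEEP_SIMILARITY and _FAST_SIMILARITY) do not show in this diff how similarity is actually scored. For orientation only, here is a generic baseline that is not Malaya's method: cosine similarity between bag-of-words term counts, assuming lowercase whitespace tokenization. The function name cosine_similarity is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors.

    A generic baseline for illustration; it is not how malaya.similarity scores text.
    Returns 0.0 when either text has no tokens."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product only needs the words the two texts share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("saya suka makan ayam", "saya suka makan ikan"))  # 0.75
```

Counting shared surface words ignores meaning ("ayam" and "ikan" score as unrelated), which is the gap that embedding-based and learned models are meant to close.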
docs/index.rst

Lines changed: 2 additions & 1 deletion
@@ -31,15 +31,16 @@ Contents:
 Num2word
 Pos
 Sentiment
+Similarity
 Spell
 Stack
 Stemmer
 Subjective
 Summarization
 Topic
-Topics
 Toxic
 Word2vec
+Mover
 Cluster
 Api
 Reference

docs/load-emotion.rst

Lines changed: 68 additions & 68 deletions
@@ -7,8 +7,8 @@
 
 .. parsed-literal::
 
-CPU times: user 10.4 s, sys: 640 ms, total: 11 s
-Wall time: 11 s
+CPU times: user 12 s, sys: 1.41 s, total: 13.4 s
+Wall time: 17.1 s
 
 
 .. code:: python
@@ -43,7 +43,7 @@ Load multinomial model
 .. parsed-literal::
 
 anger
-{'anger': 0.27993946463423486, 'fear': 0.1482931513658756, 'joy': 0.1880009584798728, 'love': 0.21711876657658918, 'sadness': 0.1296730712078804, 'surprise': 0.03697458773554805}
+{'anger': 0.30367763926253094, 'fear': 0.16709964152193366, 'joy': 0.17026521921403184, 'love': 0.18405977732934192, 'sadness': 0.1388341895665479, 'surprise': 0.03606353310561458}
 
 
 
@@ -73,31 +73,31 @@ Load xgb model
 .. parsed-literal::
 
 love
-{'anger': 0.21755809, 'fear': 0.090371706, 'joy': 0.13347618, 'love': 0.47302967, 'sadness': 0.0770047, 'surprise': 0.008559667}
+{'anger': 0.22918181, 'fear': 0.089252785, 'joy': 0.1318236, 'love': 0.46476611, 'sadness': 0.07200217, 'surprise': 0.012973559}
 
 
 
 
 .. parsed-literal::
 
-[{'anger': 0.21755809,
-'fear': 0.090371706,
-'joy': 0.13347618,
-'love': 0.47302967,
-'sadness': 0.0770047,
-'surprise': 0.008559667},
+[{'anger': 0.22918181,
+'fear': 0.089252785,
+'joy': 0.1318236,
+'love': 0.46476611,
+'sadness': 0.07200217,
+'surprise': 0.012973559},
 {'anger': 0.013483193,
 'fear': 0.939588,
 'joy': 0.01674833,
 'love': 0.003220023,
 'sadness': 0.022906518,
 'surprise': 0.0040539484},
-{'anger': 0.09142393,
-'fear': 0.029400537,
-'joy': 0.78257465,
-'love': 0.02881839,
-'sadness': 0.058004435,
-'surprise': 0.009778041},
+{'anger': 0.10506946,
+'fear': 0.025150253,
+'joy': 0.725915,
+'love': 0.05211037,
+'sadness': 0.078554265,
+'surprise': 0.013200594},
 {'anger': 0.11640434,
 'fear': 0.097485565,
 'joy': 0.24893147,
@@ -110,12 +110,12 @@ Load xgb model
 'love': 0.022184724,
 'sadness': 0.41255626,
 'surprise': 0.006135965},
-{'anger': 0.0714585,
-'fear': 0.19790031,
-'joy': 0.037659157,
-'love': 0.0025473926,
-'sadness': 0.00772799,
-'surprise': 0.6827066}]
+{'anger': 0.07513438,
+'fear': 0.2525073,
+'joy': 0.024355419,
+'love': 0.002638406,
+'sadness': 0.0059716892,
+'surprise': 0.6393928}]
 
 
 
@@ -167,27 +167,27 @@ List available deep learning models
 Testing fast-text model
 love
 ['love', 'fear', 'joy', 'love', 'sadness', 'surprise']
-[{'anger': 2.978304e-06, 'fear': 1.8461518e-10, 'joy': 1.0204276e-09, 'love': 0.999997, 'sadness': 1.3693535e-09, 'surprise': 2.6386826e-09}, {'anger': 1.2210384e-18, 'fear': 1.0, 'joy': 1.0015556e-19, 'love': 1.8750202e-24, 'sadness': 6.976661e-21, 'surprise': 3.2600536e-15}, {'anger': 2.47199e-19, 'fear': 2.3032567e-22, 'joy': 1.0, 'love': 5.1478095e-14, 'sadness': 4.464682e-20, 'surprise': 1.588908e-15}, {'anger': 4.1249185e-11, 'fear': 1.7474476e-10, 'joy': 0.00022258118, 'love': 0.9997774, 'sadness': 1.6592432e-11, 'surprise': 4.1854236e-09}, {'anger': 4.3972154e-08, 'fear': 2.1118221e-06, 'joy': 3.4898858e-07, 'love': 4.5489975e-12, 'sadness': 0.9999975, 'surprise': 4.8414757e-09}, {'anger': 1.1130476e-23, 'fear': 0.0003273876, 'joy': 5.694222e-17, 'love': 1.9363045e-25, 'sadness': 1.4252974e-26, 'surprise': 0.99967265}]
+[{'anger': 2.538603e-07, 'fear': 4.1372344e-13, 'joy': 1.0892472e-08, 'love': 0.99999976, 'sadness': 3.8994935e-16, 'surprise': 2.439655e-08}, {'anger': 4.4489467e-24, 'fear': 1.0, 'joy': 1.3903143e-28, 'love': 1.7920514e-33, 'sadness': 1.01771616e-26, 'surprise': 6.799581e-18}, {'anger': 9.583714e-26, 'fear': 1.5029816e-24, 'joy': 1.0, 'love': 3.7527533e-13, 'sadness': 8.348174e-24, 'surprise': 2.080897e-16}, {'anger': 1.7409228e-13, 'fear': 3.2279754e-12, 'joy': 0.0005876841, 'love': 0.9994123, 'sadness': 1.8902605e-11, 'surprise': 9.9256076e-11}, {'anger': 1.2737708e-11, 'fear': 5.882562e-10, 'joy': 9.112171e-13, 'love': 7.7659496e-20, 'sadness': 1.0, 'surprise': 1.6035637e-16}, {'anger': 5.5730725e-37, 'fear': 0.16033638, 'joy': 1.2999706e-30, 'love': 0.0, 'sadness': 0.0, 'surprise': 0.8396636}]
 
 Testing hierarchical model
-joy
+anger
 ['anger', 'fear', 'joy', 'joy', 'sadness', 'joy']
-[{'anger': 0.39431405, 'fear': 0.13933083, 'joy': 0.17727984, 'love': 0.042310942, 'sadness': 0.22523886, 'surprise': 0.021525377}, {'anger': 0.004958992, 'fear': 0.9853917, 'joy': 0.006676573, 'love': 0.00023657709, 'sadness': 0.0017484307, 'surprise': 0.0009877522}, {'anger': 0.0013627211, 'fear': 0.0017271177, 'joy': 0.986464, 'love': 0.0039458317, 'sadness': 0.0021411367, 'surprise': 0.0043591294}, {'anger': 0.028909639, 'fear': 0.09853578, 'joy': 0.50412154, 'love': 0.26376858, 'sadness': 0.084195614, 'surprise': 0.02046885}, {'anger': 0.022849305, 'fear': 0.011993612, 'joy': 0.008679014, 'love': 0.002472554, 'sadness': 0.9502534, 'surprise': 0.003752149}, {'anger': 0.015510161, 'fear': 0.0571924, 'joy': 0.5819401, 'love': 0.21683867, 'sadness': 0.006425157, 'surprise': 0.12209346}]
+[{'anger': 0.22394963, 'fear': 0.35022292, 'joy': 0.19895941, 'love': 0.013231089, 'sadness': 0.20033234, 'surprise': 0.013304558}, {'anger': 0.0056565125, 'fear': 0.9885886, 'joy': 0.0034398232, 'love': 0.00018917819, 'sadness': 0.0012037805, 'surprise': 0.00092218135}, {'anger': 0.01764421, 'fear': 0.01951682, 'joy': 0.8797468, 'love': 0.041130837, 'sadness': 0.013527576, 'surprise': 0.028433735}, {'anger': 0.028772388, 'fear': 0.07343067, 'joy': 0.48502314, 'love': 0.28668693, 'sadness': 0.10576224, 'surprise': 0.020324599}, {'anger': 0.021873059, 'fear': 0.014633018, 'joy': 0.01073073, 'love': 0.0012993184, 'sadness': 0.94936466, 'surprise': 0.0020992015}, {'anger': 0.020028168, 'fear': 0.17150529, 'joy': 0.3734562, 'love': 0.19241562, 'sadness': 0.008164915, 'surprise': 0.23442967}]
 
 Testing bahdanau model
 love
-['love', 'fear', 'joy', 'love', 'sadness', 'surprise']
-[{'anger': 0.44805261, 'fear': 0.18378404, 'joy': 0.02516251, 'love': 0.30925235, 'sadness': 0.027497768, 'surprise': 0.0062507084}, {'anger': 0.0010828926, 'fear': 0.9789995, 'joy': 0.0027138714, 'love': 0.00061593985, 'sadness': 0.0048968275, 'surprise': 0.011690898}, {'anger': 0.012288661, 'fear': 0.0025563037, 'joy': 0.85003525, 'love': 0.12451392, 'sadness': 0.0008497203, 'surprise': 0.009756153}, {'anger': 0.02319879, 'fear': 0.031080244, 'joy': 0.14820175, 'love': 0.7294624, 'sadness': 0.021997027, 'surprise': 0.046059813}, {'anger': 0.031083692, 'fear': 0.035790402, 'joy': 0.01741525, 'love': 0.00062268815, 'sadness': 0.9130492, 'surprise': 0.0020387478}, {'anger': 0.00159852, 'fear': 0.34762463, 'joy': 0.04318491, 'love': 0.0028805388, 'sadness': 0.00093575486, 'surprise': 0.6037757}]
+['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']
+[{'anger': 0.53818357, 'fear': 0.14104106, 'joy': 0.010708541, 'love': 0.2570674, 'sadness': 0.047102023, 'surprise': 0.005897305}, {'anger': 0.0005677081, 'fear': 0.9770825, 'joy': 0.005677423, 'love': 0.0007302013, 'sadness': 0.0017472907, 'surprise': 0.014194911}, {'anger': 0.06975506, 'fear': 0.0069800974, 'joy': 0.5717373, 'love': 0.30618504, 'sadness': 0.011454151, 'surprise': 0.033888407}, {'anger': 0.0038130684, 'fear': 0.0053994465, 'joy': 0.10317592, 'love': 0.8656706, 'sadness': 0.0056833136, 'surprise': 0.016257582}, {'anger': 0.01122868, 'fear': 0.019208057, 'joy': 0.0024597098, 'love': 0.0002851458, 'sadness': 0.965973, 'surprise': 0.00084543176}, {'anger': 0.00083102344, 'fear': 0.23240082, 'joy': 0.033536877, 'love': 0.0011026214, 'sadness': 0.00037630452, 'surprise': 0.7317524}]
 
 Testing luong model
 love
-['love', 'fear', 'joy', 'love', 'sadness', 'fear']
-[{'anger': 0.044591118, 'fear': 0.063305356, 'joy': 0.33247164, 'love': 0.5347649, 'sadness': 0.0068765697, 'surprise': 0.017990304}, {'anger': 0.0064159264, 'fear': 0.9606779, 'joy': 0.012426791, 'love': 0.0013584964, 'sadness': 0.008015306, 'surprise': 0.011105636}, {'anger': 0.0036163705, 'fear': 5.7273093e-05, 'joy': 0.98739016, 'love': 0.0076421387, 'sadness': 0.00028883366, 'surprise': 0.0010052109}, {'anger': 0.017377134, 'fear': 0.0073309895, 'joy': 0.07374035, 'love': 0.3433876, 'sadness': 0.5455663, 'surprise': 0.012597541}, {'anger': 0.0007876828, 'fear': 0.0009606754, 'joy': 9.633098e-05, 'love': 0.00014691186, 'sadness': 0.9978861, 'surprise': 0.00012229013}, {'anger': 0.00045764598, 'fear': 0.37070635, 'joy': 0.0005788357, 'love': 0.00027592952, 'sadness': 0.00033797708, 'surprise': 0.6276433}]
+['joy', 'fear', 'joy', 'sadness', 'sadness', 'surprise']
+[{'anger': 0.057855386, 'fear': 0.040447887, 'joy': 0.29915547, 'love': 0.5720974, 'sadness': 0.00927453, 'surprise': 0.02116932}, {'anger': 0.0063275485, 'fear': 0.9673098, 'joy': 0.0065225014, 'love': 0.0008387138, 'sadness': 0.00706696, 'surprise': 0.011934649}, {'anger': 0.0014677589, 'fear': 0.0020899512, 'joy': 0.88741183, 'love': 0.076111265, 'sadness': 0.0038936164, 'surprise': 0.029025558}, {'anger': 0.013268307, 'fear': 0.0035831807, 'joy': 0.056010414, 'love': 0.21701123, 'sadness': 0.69225526, 'surprise': 0.017871574}, {'anger': 0.0018013288, 'fear': 0.0012173079, 'joy': 5.611221e-05, 'love': 9.00831e-05, 'sadness': 0.9967213, 'surprise': 0.000113809925}, {'anger': 0.00015200193, 'fear': 0.36670414, 'joy': 0.0003732592, 'love': 0.00011813393, 'sadness': 0.000118975, 'surprise': 0.63253355}]
 
 Testing bidirectional model
-surprise
-['anger', 'anger', 'anger', 'anger', 'anger', 'fear']
-[{'anger': 0.613231, 'fear': 0.21215951, 'joy': 0.00012107872, 'love': 0.007714424, 'sadness': 0.0029091935, 'surprise': 0.16386479}, {'anger': 0.7650685, 'fear': 0.12844206, 'joy': 0.00046135965, 'love': 0.0025065169, 'sadness': 0.012999088, 'surprise': 0.09052232}, {'anger': 0.7017255, 'fear': 0.12622964, 'joy': 0.00019186054, 'love': 0.0041279723, 'sadness': 0.0051922314, 'surprise': 0.16253278}, {'anger': 0.83330584, 'fear': 0.099247426, 'joy': 0.0007255099, 'love': 0.0023077168, 'sadness': 0.016625375, 'surprise': 0.047788195}, {'anger': 0.77445495, 'fear': 0.11811776, 'joy': 0.00019311535, 'love': 0.002333317, 'sadness': 0.004926041, 'surprise': 0.09997472}, {'anger': 0.28467438, 'fear': 0.3107746, 'joy': 0.0009574863, 'love': 0.039786864, 'sadness': 0.0549624, 'surprise': 0.3088443}]
+love
+['fear', 'fear', 'anger', 'joy', 'sadness', 'surprise']
+[{'anger': 0.031539902, 'fear': 0.44634053, 'joy': 0.0022038615, 'love': 0.24390388, 'sadness': 0.00030186496, 'surprise': 0.27570996}, {'anger': 0.0028205896, 'fear': 0.9787958, 'joy': 0.016622344, 'love': 0.00041048063, 'sadness': 0.0004424488, 'surprise': 0.00090834824}, {'anger': 0.4523394, 'fear': 0.32489082, 'joy': 0.04712723, 'love': 0.01679146, 'sadness': 0.039135754, 'surprise': 0.1197153}, {'anger': 0.04196525, 'fear': 0.08604635, 'joy': 0.65291435, 'love': 0.049389884, 'sadness': 0.077201255, 'surprise': 0.09248292}, {'anger': 0.06327597, 'fear': 0.058998022, 'joy': 0.041568566, 'love': 0.002343863, 'sadness': 0.8224733, 'surprise': 0.011340328}, {'anger': 1.5136379e-05, 'fear': 0.002162331, 'joy': 3.5301118e-06, 'love': 0.006482973, 'sadness': 2.4173462e-06, 'surprise': 0.99133366}]
 
 Testing bert model
 anger
@@ -367,39 +367,39 @@ will try to evolve it.
 
 .. parsed-literal::
 
-[{'anger': 0.055561937,
-'fear': 0.034661848,
-'joy': 0.20765074,
-'love': 0.65774184,
-'sadness': 0.0210206,
-'surprise': 0.023363067},
-{'anger': 1.5065236e-05,
-'fear': 0.9998666,
-'joy': 6.3056427e-06,
-'love': 2.9068442e-06,
-'sadness': 3.6798014e-05,
-'surprise': 7.235542e-05},
-{'anger': 0.00097060547,
-'fear': 5.1922354e-05,
-'joy': 0.99052715,
-'love': 0.0024538564,
-'sadness': 0.0005109437,
-'surprise': 0.005485538},
-{'anger': 0.00014133049,
-'fear': 0.0004463539,
-'joy': 0.12486383,
-'love': 0.87307847,
-'sadness': 0.0013382707,
-'surprise': 0.0001317923},
-{'anger': 0.0077239843,
-'fear': 0.014800851,
-'joy': 0.008525367,
-'love': 0.0013007816,
-'sadness': 0.9655128,
-'surprise': 0.0021361646},
-{'anger': 0.0003960413,
-'fear': 0.6634573,
-'joy': 0.0014801685,
-'love': 0.00056572456,
-'sadness': 0.000516784,
-'surprise': 0.33358407}]
+[{'anger': 0.07479232,
+'fear': 0.012134718,
+'joy': 0.034137156,
+'love': 0.85221285,
+'sadness': 0.006336733,
+'surprise': 0.020386234},
+{'anger': 1.6892743e-08,
+'fear': 0.99999964,
+'joy': 6.260633e-08,
+'love': 3.2111713e-10,
+'sadness': 3.542872e-08,
+'surprise': 2.2207877e-07},
+{'anger': 0.00012469916,
+'fear': 9.6892345e-06,
+'joy': 0.9917463,
+'love': 0.006561422,
+'sadness': 0.00040069615,
+'surprise': 0.0011572224},
+{'anger': 5.0021445e-05,
+'fear': 0.0010109642,
+'joy': 0.049688663,
+'love': 0.94577587,
+'sadness': 0.0032941191,
+'surprise': 0.00018034693},
+{'anger': 0.0010146926,
+'fear': 0.00020020001,
+'joy': 5.2909185e-05,
+'love': 2.640257e-06,
+'sadness': 0.99870074,
+'surprise': 2.8823646e-05},
+{'anger': 0.0057854424,
+'fear': 0.8317998,
+'joy': 0.017287944,
+'love': 0.008883897,
+'sadness': 0.0070799366,
+'surprise': 0.12916291}]

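In the regenerated docs/load-emotion.rst outputs, the label printed for a single string is the highest-probability key of its probability dict for both the multinomial model (anger) and the xgb model (love). A minimal sketch of that argmax step, using the two dicts copied from the updated outputs; it only post-processes printed values and assumes nothing about Malaya's internals. The helper name top_emotion is hypothetical.

```python
# Probability dicts copied from the updated multinomial and xgb outputs in this diff.
multinomial_proba = {'anger': 0.30367763926253094, 'fear': 0.16709964152193366,
                     'joy': 0.17026521921403184, 'love': 0.18405977732934192,
                     'sadness': 0.1388341895665479, 'surprise': 0.03606353310561458}
xgb_proba = {'anger': 0.22918181, 'fear': 0.089252785, 'joy': 0.1318236,
             'love': 0.46476611, 'sadness': 0.07200217, 'surprise': 0.012973559}


def top_emotion(proba: dict[str, float]) -> str:
    """Return the emotion label with the highest probability (argmax over the dict)."""
    return max(proba, key=proba.get)


for name, proba in [("multinomial", multinomial_proba), ("xgb", xgb_proba)]:
    # Each dict covers the six emotions and sums to approximately 1.
    print(name, top_emotion(proba))
```

Because these emotion probabilities form a single distribution over mutually exclusive classes, taking the argmax is the natural way to collapse them to one label.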