Commit 78da89a

Merge branch 'release-1.0.0'
2 parents df13670 + b1d38cf commit 78da89a

29 files changed: +2221 -7789 lines changed

CHANGELOG.md

+49
@@ -3,6 +3,55 @@ Changes
 
 Unreleased:
 
+
+========
+
+1.0.0, 2017-02-24
+
+New features:
+* Add Author-topic modeling (@olavurmortensen,[#893](https://github.com/RaRe-Technologies/gensim/pull/893))
+* Add FastText word embedding wrapper (@Jayantj,[#847](https://github.com/RaRe-Technologies/gensim/pull/847))
+* Add WordRank word embedding wrapper (@parulsethi,[#1066](https://github.com/RaRe-Technologies/gensim/pull/1066), [#1125](https://github.com/RaRe-Technologies/gensim/pull/1125))
+* Add sklearn wrapper for LDAModel (@AadityaJ,[#932](https://github.com/RaRe-Technologies/gensim/pull/932))
+
+Deprecated features:
+
+* Move `load_word2vec_format` and `save_word2vec_format` out of Word2Vec class to KeyedVectors (@tmylk,[#1107](https://github.com/RaRe-Technologies/gensim/pull/1107))
+* Move properties `syn0norm`, `syn0`, `vocab`, `index2word` from Word2Vec class to KeyedVectors (@tmylk,[#1147](https://github.com/RaRe-Technologies/gensim/pull/1147))
+* Remove support for Python 2.6, 3.3 and 3.4 (@tmylk,[#1145](https://github.com/RaRe-Technologies/gensim/pull/1145))
+
+
+Improvements:
+
+* Python 3.6 support (@tmylk [#1077](https://github.com/RaRe-Technologies/gensim/pull/1077))
+* Phrases and Phraser allow a generator corpus (ELind77 [#1099](https://github.com/RaRe-Technologies/gensim/pull/1099))
+* Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze,[#1053](https://github.com/RaRe-Technologies/gensim/pull/1053))
+* Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel,[#1103](https://github.com/RaRe-Technologies/gensim/pull/1103))
+* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
+* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
+* Deacc option to keywords pre-processing (@bhargavvader,[#1076](https://github.com/RaRe-Technologies/gensim/pull/1076))
+* Generate Deprecated exception when using Word2Vec.load_word2vec_format (@tmylk, [#1165](https://github.com/RaRe-Technologies/gensim/pull/1165))
+* Fix hdpmodel constructor docstring for print_topics (#1152) (@toliwa, [#1152](https://github.com/RaRe-Technologies/gensim/pull/1152))
+* Default to per_word_topics=False in LDA get_item for performance (@menshikh-iv, [#1154](https://github.com/RaRe-Technologies/gensim/pull/1154))
+* Fix bound computation in Author Topic models. (@olavurmortensen, [#1156](https://github.com/RaRe-Technologies/gensim/pull/1156))
+* Write UTF-8 byte strings in tensorboard conversion (@tmylk,[#1144](https://github.com/RaRe-Technologies/gensim/pull/1144))
+* Make top_topics and sparse2full compatible with numpy 1.12 strictly int indexing (@tmylk,[#1146](https://github.com/RaRe-Technologies/gensim/pull/1146))
+
+Tutorial and doc improvements:
+
+* Clarifying comment in is_corpus func in utils.py (@greninja,[#1109](https://github.com/RaRe-Technologies/gensim/pull/1109))
+* Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda,[#1120](https://github.com/RaRe-Technologies/gensim/pull/1120))
+* Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,[#1119](https://github.com/RaRe-Technologies/gensim/pull/1119))
+* Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti,[#1118](https://github.com/RaRe-Technologies/gensim/pull/1118))
+* Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda,[#1116](https://github.com/RaRe-Technologies/gensim/pull/1116))
+* Update Transformation and Topics link from quick start notebook (@mariana393,[#1115](https://github.com/RaRe-Technologies/gensim/pull/1115))
+* Quick Start Text clarification and typo correction (@luizcavalcanti,[#1114](https://github.com/RaRe-Technologies/gensim/pull/1114))
+* Fix typos in Author-topic tutorial (@Fil,[#1102](https://github.com/RaRe-Technologies/gensim/pull/1102))
+* Address benchmark inconsistencies in Annoy tutorial (@droudy,[#1113](https://github.com/RaRe-Technologies/gensim/pull/1113))
+* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137))
+* Add documentation for WikiCorpus metadata. (@kirit93, [#1163](https://github.com/RaRe-Technologies/gensim/pull/1163))
+
+
 1.0.0RC2, 2017-02-16
 
* Add note about Annoy speed depending on numpy BLAS setup in annoytutorial.ipynb (@greninja,[#1137](https://github.com/RaRe-Technologies/gensim/pull/1137))
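
Note on the "Deprecated features" entries above: the changelog moves `load_word2vec_format` from `Word2Vec` to `KeyedVectors`, and the doc2vec-IMDB notebook in this commit is updated the same way. A minimal migration sketch follows. The helper name `load_pretrained_vectors` is hypothetical, and the fallback branch assumes that releases before 1.0.0 do not expose `gensim.models.KeyedVectors`; check that against the installed version.

```python
def load_pretrained_vectors(path, binary=True):
    """Load word2vec-format vectors via the entry point the installed gensim offers.

    Hypothetical helper, not part of gensim itself.
    """
    # Imported lazily so that merely defining this helper does not require gensim.
    import gensim.models as gensim_models

    if hasattr(gensim_models, "KeyedVectors"):
        # gensim 1.0.0 and later: the loader lives on KeyedVectors,
        # as in the updated notebook cell in this commit.
        return gensim_models.KeyedVectors.load_word2vec_format(path, binary=binary)

    # Assumed older release: the loader is still a Word2Vec classmethod.
    return gensim_models.Word2Vec.load_word2vec_format(path, binary=binary)
```

Usage would mirror the notebook, for example `load_pretrained_vectors('GoogleNews-vectors-negative300.bin.gz', binary=True)`.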

README.md

+3 -4
@@ -78,11 +78,9 @@ For alternative modes of installation (without root privileges,
 development installation, optional install features), see the
 [documentation].
 
-This version has been tested under Python 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6
-(support for Python 2.5 was dropped in gensim 0.10.0; install gensim
-0.9.1 if you *must* use Python 2.5). Gensim’s github repo is hooked
+This version has been tested under Python 2.7, 3.5 and 3.6. Gensim’s github repo is hooked
 against [Travis CI for automated testing] on every commit push and pull
-request.
+request. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0. Install gensim 0.13.4 if you *must* use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5.
 
 How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?
 --------------------------------------------------------------------------------------------------------
@@ -111,6 +109,7 @@ Documentation
 [Tutorials]: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#tutorials
 [Tutorial Videos]: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#videos
 [Official Documentation and Walkthrough]: http://radimrehurek.com/gensim/
+[Official API Documentation]: http://radimrehurek.com/gensim/apiref.html
 
---------
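
Note on the README change above: the new text ties each dropped Python version to the last gensim release that supported it. The sketch below encodes only that mapping. The function name `suggested_gensim_pin` and the requirement-string format are assumptions for illustration, and interpreters the README does not mention yield `None` instead of a guess.

```python
import sys


def suggested_gensim_pin(version_info=None):
    """Map a Python version to the gensim release named in the README text.

    Hypothetical helper; returns None for versions the README does not cover.
    """
    major_minor = tuple(version_info or sys.version_info)[:2]

    if major_minor == (2, 5):
        # Support for Python 2.5 was dropped in gensim 0.10.0.
        return "gensim==0.9.1"
    if major_minor in {(2, 6), (3, 3), (3, 4)}:
        # Support for these versions was dropped in gensim 1.0.0.
        return "gensim==0.13.4"
    if major_minor in {(2, 7), (3, 5), (3, 6)}:
        # The versions this release states it was tested under.
        return "gensim==1.0.0"
    return None
```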

docs/notebooks/WMD_tutorial.ipynb

+22 -55
Large diffs are not rendered by default.

docs/notebooks/Word2Vec_FastText_Comparison.ipynb

+27 -51
Large diffs are not rendered by default.

docs/notebooks/Wordrank_comparisons.ipynb

+31 -61
Large diffs are not rendered by default.

docs/notebooks/deepir.ipynb

+28 -62
Large diffs are not rendered by default.

docs/notebooks/doc2vec-IMDB.ipynb

+27 -67
@@ -39,9 +39,7 @@
 {
 "cell_type": "code",
 "execution_count": 2,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "import locale\n",
@@ -120,9 +118,7 @@
 {
 "cell_type": "code",
 "execution_count": 3,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "import os.path\n",
@@ -139,9 +135,7 @@
 {
 "cell_type": "code",
 "execution_count": 1,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -202,9 +196,7 @@
 {
 "cell_type": "code",
 "execution_count": 2,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -254,9 +246,7 @@
 {
 "cell_type": "code",
 "execution_count": 5,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "from gensim.test.test_doc2vec import ConcatenatedDoc2Vec\n",
@@ -281,9 +271,7 @@
 {
 "cell_type": "code",
 "execution_count": 8,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "import numpy as np\n",
@@ -330,7 +318,7 @@
 " corrects = sum(np.rint(test_predictions) == [doc.sentiment for doc in test_data])\n",
 " errors = len(test_predictions) - corrects\n",
 " error_rate = float(errors) / len(test_predictions)\n",
-" return (error_rate, errors, len(test_predictions), predictor)\n"
+" return (error_rate, errors, len(test_predictions), predictor)"
 ]
 },
 {
@@ -356,9 +344,7 @@
 {
 "cell_type": "code",
 "execution_count": 9,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "from collections import defaultdict\n",
@@ -368,9 +354,7 @@
 {
 "cell_type": "code",
 "execution_count": 10,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -579,9 +563,7 @@
 {
 "cell_type": "code",
 "execution_count": 12,
-"metadata": {
-"collapsed": true
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -630,9 +612,7 @@
 {
 "cell_type": "code",
 "execution_count": 13,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -673,9 +653,7 @@
 {
 "cell_type": "code",
 "execution_count": 14,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -703,7 +681,7 @@
 "print(u'TARGET (%d): «%s»\\n' % (doc_id, ' '.join(alldocs[doc_id].words)))\n",
 "print(u'SIMILAR/DISSIMILAR DOCS PER MODEL %s:\\n' % model)\n",
 "for label, index in [('MOST', 0), ('MEDIAN', len(sims)//2), ('LEAST', len(sims) - 1)]:\n",
-" print(u'%s %s: «%s»\\n' % (label, sims[index], ' '.join(alldocs[sims[index][0]].words)))\n"
+" print(u'%s %s: «%s»\\n' % (label, sims[index], ' '.join(alldocs[sims[index][0]].words)))"
 ]
 },
 {
@@ -723,9 +701,7 @@
 {
 "cell_type": "code",
 "execution_count": 15,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "word_models = simple_models[:]"
@@ -734,9 +710,7 @@
 {
 "cell_type": "code",
 "execution_count": 17,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -806,14 +780,10 @@
 "('mockumentary', 0.5149033069610596),<br>\n",
 "('camp-fest', 0.5122634768486023),<br>\n",
 "('mystery/comedy', 0.5020694732666016)]</td></tr></table>"
-],
-"text/plain": [
-"<IPython.core.display.HTML at 0x1535b84d0>"
 ]
 },
-"execution_count": 17,
-"metadata": {},
-"output_type": "execute_result"
+"output_type": "execute_result",
+"metadata": {}
 }
 ],
 "source": [
@@ -855,9 +825,7 @@
 {
 "cell_type": "code",
 "execution_count": 26,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -897,12 +865,10 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
-"This cell left intentionally erroneous. "
+"This cell left intentionally erroneous."
 ]
 },
 {
@@ -915,13 +881,11 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
-"from gensim.models import Word2Vec\n",
-"w2v_g100b = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)\n",
+"from gensim.models import KeyedVectors\n",
+"w2v_g100b = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)\n",
 "w2v_g100b.compact_name = 'w2v_g100b'\n",
 "word_models.append(w2v_g100b)"
 ]
@@ -936,9 +900,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "import logging\n",
@@ -957,9 +919,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "%load_ext autoreload\n",
@@ -976,7 +936,7 @@
 "language_info": {
 "codemirror_mode": {
 "name": "ipython",
-"version": 3
+"version": 3.0
 },
 "file_extension": ".py",
 "mimetype": "text/x-python",
@@ -988,4 +948,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}

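Note on the notebook hunks above: most of them replace a cell's `"metadata": {"collapsed": ...}` with an empty `"metadata": {}`. The diff does not say how this was done, so the following is only a sketch of one way to make the same change with the standard library. The names `strip_collapsed` and `clean_notebook_file` are hypothetical, and the one-space JSON indent is an assumption about the notebook's on-disk style.

```python
import json


def strip_collapsed(notebook):
    """Delete the "collapsed" key from each cell's metadata, in place.

    Returns the number of cells that were changed.
    """
    removed = 0
    for cell in notebook.get("cells", []):
        metadata = cell.get("metadata")
        if isinstance(metadata, dict) and "collapsed" in metadata:
            del metadata["collapsed"]
            removed += 1
    return removed


def clean_notebook_file(path):
    """Rewrite the notebook at `path` without "collapsed" cell metadata."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)

    removed = strip_collapsed(notebook)

    with open(path, "w", encoding="utf-8") as handle:
        # indent=1 is an assumed formatting choice; ensure_ascii=False keeps
        # characters such as « readable instead of escaping them.
        json.dump(notebook, handle, indent=1, ensure_ascii=False)
        handle.write("\n")
    return removed
```

Both `"collapsed": false` and `"collapsed": true` are removed, which matches the hunks in this file.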