Commit e838391

Merge branch 'release-0.13.4'
2 parents 3067cb0 + 9f3d31e commit e838391

82 files changed (+50350 -5694 lines changed)

.travis.yml

+2 -1
@@ -5,7 +5,7 @@ python:
  - "2.7"
  - "3.3"
  - "3.4"
- - "3.5"
+ - "3.5"
  before_install:
  - wget 'http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh' -O miniconda.sh
  - chmod +x miniconda.sh
@@ -18,5 +18,6 @@ install:
  - pip install pyemd
  - pip install annoy
  - pip install testfixtures
+ - pip install unittest2
  - python setup.py install
  script: python setup.py test

CHANGELOG.md

+34 -1
@@ -1,5 +1,37 @@
  Changes
- =======
+ ===========
+
+ Unreleased:
+
+ None
+
+ 0.13.4, 2016-12-22
+
+ * Evaluation of word2vec models against semantic similarity datasets like SimLex-999 (#1047) (@akutuzov, [#1047](https://github.com/RaRe-Technologies/gensim/pull/1047))
+ * TensorBoard word embedding visualisation of Gensim Word2vec format (@loretoparisi, [#1051](https://github.com/RaRe-Technologies/gensim/pull/1051))
+ * Throw exception if load() is called on instance rather than the class in word2vec and doc2vec (@dus0x,[(#889](https://github.com/RaRe-Technologies/gensim/pull/889))
+ * Loading and Saving LDA Models across Python 2 and 3. Fix #853 (@anmolgulati, #913, [#1093](https://github.com/RaRe-Technologies/gensim/pull/1093))
+ * Fix automatic learning of eta (prior over words) in LDA (@olavurmortensen, [#1024](https://github.com/RaRe-Technologies/gensim/pull/1024)).
+ * eta should have dimensionality V (size of vocab) not K (number of topics). eta with shape K x V is still allowed, as the user may want to impose specific prior information to each topic.
+ * eta is no longer allowed the "asymmetric" option. Asymmetric priors over words in general are fine (learned or user defined).
+ * As a result, the eta update (`update_eta`) was simplified some. It also no longer logs eta when updated, because it is too large for that.
+ * Unit tests were updated accordingly. The unit tests expect a different shape than before; some unit tests were redundant after the change; `eta='asymmetric'` now should raise an error.
+ * Optimise show_topics to only call get_lambda once. Fix #1006. (@bhargavvader, [#1028](https://github.com/RaRe-Technologies/gensim/pull/1028))
+ * HdpModel doc improvement. Inference and print_topics (@dsquareindia, [#1029](https://github.com/RaRe-Technologies/gensim/pull/1029))
+ * Removing Doc2Vec defaults so that it won't override Word2Vec defaults. Fix #795 (@markroxor, [#929](https://github.com/RaRe-Technologies/gensim/pull/929))
+ Remove warning on gensim import "pattern not installed". Fix #1009 (@shashankg7, #1018)
+ * Add delete_temporary_training_data() function to word2vec and doc2vec models. (@deepmipt-VladZhukov, [#987](https://github.com/RaRe-Technologies/gensim/pull/987))
+ * New class KeyedVectors to store embedding separate from training code (@anmol01gulati and @droudy, [#980](https://github.com/RaRe-Technologies/gensim/pull/980))
+ * Documentation improvements (@IrinaGoloshchapova, [#1010](https://github.com/RaRe-Technologies/gensim/pull/1010), [#1011](https://github.com/RaRe-Technologies/gensim/pull/1011))
+ * LDA tutorial by Olavur, tips and tricks (@olavurmortensen, [#779](https://github.com/RaRe-Technologies/gensim/pull/779))
+ * Add double quote in commmand line to run on Windows (@akarazeev, [#1005](https://github.com/RaRe-Technologies/gensim/pull/1005))
+ * Fix directory names in notebooks to be OS-independent (@mamamot, [#1004](https://github.com/RaRe-Technologies/gensim/pull/1004))
+ * Respect clip_start, clip_end in most_similar. Fix #601. (@parulsethi, [#994](https://github.com/RaRe-Technologies/gensim/pull/994))
+ * Replace Python sigmoid function with scipy in word2vec & doc2vec (@markroxor, [#989](https://github.com/RaRe-Technologies/gensim/pull/989))
+ * WMD to return 0 instead of inf for sentences that contain a single word (@rbahumi, [#986](https://github.com/RaRe-Technologies/gensim/pull/986))
+ * Pass all the params through the apply call in lda.get_document_topics(), test case to use the per_word_topics through the corpus in test_ldamodel (@parthoiiitm, [#978](https://github.com/RaRe-Technologies/gensim/pull/978))
+ * Pyro annotations for lsi_worker (@markroxor, [#968](https://github.com/RaRe-Technologies/gensim/pull/968))
+
 
  0.13.3, 2016-10-20
 
@@ -20,6 +52,7 @@ Changes
  * Remove ShardedCorpus from init because of Theano dependency (@tmylk, [#919](https://github.com/RaRe-Technologies/gensim/pull/919))
  * Documentation improvements ( @dsquareindia & @tmylk, [#914](https://github.com/RaRe-Technologies/gensim/pull/914), [#906](https://github.com/RaRe-Technologies/gensim/pull/906) )
  * Add Annoy memory-mapping example (@harshul1610, [#899](https://github.com/RaRe-Technologies/gensim/pull/899))
+ * Fixed issue [#601](https://github.com/RaRe-Technologies/gensim/issues/601), correct docID in most_similar for clip range (@parulsethi, [#994](https://github.com/RaRe-Technologies/gensim/pull/994))
 
  0.13.2, 2016-08-19
 

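The eta entries in the 0.13.4 changelog describe which shapes the prior over words may take: a vector of length V (vocabulary size), or a K x V matrix (one prior per topic), with the "asymmetric" string option rejected. A minimal sketch of that rule in plain Python follows. It assumes list-based priors, and `eta_shape`, `num_topics`, and `num_terms` are hypothetical names chosen for illustration; this is not gensim's code or API.

```python
def eta_shape(eta, num_topics, num_terms):
    """Return the shape of an explicit word prior, or raise ValueError.

    Hypothetical helper: K = num_topics, V = num_terms.
    """
    if isinstance(eta, str):
        # The changelog says 'asymmetric' is no longer allowed for eta.
        # Other string options are outside the scope of this sketch.
        raise ValueError("string prior %r is not handled here" % eta)

    rows = list(eta)
    if rows and all(isinstance(row, (list, tuple)) for row in rows):
        # Matrix case: one prior vector per topic, shape K x V.
        if len(rows) == num_topics and all(len(row) == num_terms for row in rows):
            return (num_topics, num_terms)
        raise ValueError("matrix eta must have shape K x V")

    # Vector case: one prior weight per vocabulary word, length V (not K).
    if len(rows) == num_terms:
        return (num_terms,)
    raise ValueError("vector eta must have length V, not K")


K, V = 2, 5
print(eta_shape([0.1] * V, K, V))         # (5,)
print(eta_shape([[0.1] * V] * K, K, V))   # (2, 5)
```

A vector of length K, or the string `'asymmetric'`, raises `ValueError` in this sketch, mirroring the changelog's statement that `eta='asymmetric'` should now raise an error.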
appveyor.yml

+3 -4
@@ -11,7 +11,7 @@ environment:
  WHEELHOUSE_UPLOADER_USERNAME: "Lev.Konstantinovskiy"
  WHEELHOUSE_UPLOADER_SECRET:
  secure: qXqY3dFmLOqvxa3Om2gQi/BjotTOK+EP2IPLolBNo0c61yDtNWxbmE4wH3up72Be
-
+
  matrix:
  - PYTHON: "C:\\Python27"
  PYTHON_VERSION: "2.7.8"
@@ -45,7 +45,7 @@ install:
  # Install the build and runtime dependencies of the project.
  # Install the build and runtime dependencies of the project.
  - "%CMD_IN_ENV% pip install --timeout=60 --trusted-host 28daf2247a33ed269873-7b1aad3fab3cc330e1fd9d109892382a.r6.cf2.rackcdn.com -r continuous_integration/appveyor/requirements.txt"
- - "%CMD_IN_ENV% python setup.py bdist_wheel bdist_wininst "
+ - "%CMD_IN_ENV% python setup.py bdist_wheel bdist_wininst"
  - ps: "ls dist"
 
  # Install the genreated wheel package to test it
@@ -59,7 +59,7 @@ test_script:
  # installed library.
  - "mkdir empty_folder"
  - "cd empty_folder"
- - "pip install pyemd testfixtures"
+ - "pip install pyemd testfixtures unittest2"
 
  - "python -c \"import nose; nose.main()\" -s -v gensim"
  # Move back to the project folder
@@ -86,4 +86,3 @@ cache:
  # container, speed up the appveyor jobs and reduce bandwidth
  # usage on our rackspace account.
  - '%APPDATA%\pip\Cache'
-

docs/notebooks/Word2Vec_FastText_Comparison.ipynb

+2
@@ -459,6 +459,8 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
+ "The `accuracy` takes an optional parameter `restrict_vocab`, which limits the vocabulary of model considered for fast approximate evaluation (default is 30000).\n",
+ "\n",
  "Word2Vec embeddings seem to be slightly better than fastText embeddings at the semantic tasks, while the fastText embeddings do significantly better on the syntactic analogies. Makes sense, since fastText embeddings are trained for understanding morphological nuances, and most of the syntactic analogies are morphology based. \n",
  "\n",
  "Let me explain that better.\n",
docs/notebooks/datasets/mycorpus.txt

+9 -106
@@ -1,106 +1,9 @@
- <!DOCTYPE html>
- <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
- <!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
- <!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
- <!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
- <head>
- <title>Attention Required! | CloudFlare</title>
- <meta charset="UTF-8" />
- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
- <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
- <meta name="robots" content="noindex, nofollow" />
- <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />
- <link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
- <!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
- <style type="text/css">body{margin:0;padding:0}</style>
- <!--[if lte IE 9]><script type="text/javascript" src="/cdn-cgi/scripts/jquery.min.js"></script><![endif]-->
- <!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
- <script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
-
-
- </head>
- <body>
- <div id="cf-wrapper">
- <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
- <div id="cf-error-details" class="cf-error-details-wrapper">
- <div class="cf-wrapper cf-header cf-error-overview">
- <h1 data-translate="challenge_headline">One more step</h1>
- <h2 class="cf-subheadline"><span data-translate="complete_sec_check">Please complete the security check to access</span> radimrehurek.com</h2>
- </div><!-- /.header -->
-
- <div class="cf-section cf-highlight cf-captcha-container">
- <div class="cf-wrapper">
- <div class="cf-columns two">
- <div class="cf-column">
- <div class="cf-highlight-inverse cf-form-stacked">
- <form class="challenge-form" id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">
- <script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="normal" data-ray="2e6e057767f92eff" async data-sitekey="6LfOYgoTAAAAAInWDVTLSc8Yibqp-c9DaLimzNGM" data-stoken="fl5gc_M14MlvBWkagabZ26h9QmYXjG-MwP3RCYKAYmws6uvoh8HXypNV4EIs6bWsA3DnLB2JtDqrqDW3zZUAHxQSSST-szj1FSE9yoiQTqw"></script>
- <div class="g-recaptcha"></div>
- <noscript id="cf-captcha-bookmark" class="cf-captcha-info">
- <div><div style="width: 302px">
- <div>
- <iframe src="https://www.google.com/recaptcha/api/fallback?k=6LfOYgoTAAAAAInWDVTLSc8Yibqp-c9DaLimzNGM&stoken=fl5gc_M14MlvBWkagabZ26h9QmYXjG-MwP3RCYKAYmws6uvoh8HXypNV4EIs6bWsA3DnLB2JtDqrqDW3zZUAHxQSSST-szj1FSE9yoiQTqw" frameborder="0" scrolling="no" style="width: 302px; height:422px; border-style: none;"></iframe>
- </div>
- <div style="width: 300px; border-style: none; bottom: 12px; left: 25px; margin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1c1c1; border-radius: 3px;">
- <textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>
- <input type="submit" value="Submit"></input>
- </div>
- </div></div>
- </noscript>
- </form>
-
- </div>
- </div>
-
- <div class="cf-column">
- <div class="cf-screenshot-container">
-
- <span class="cf-no-screenshot"></span>
-
- </div>
- </div>
- </div><!-- /.columns -->
- </div>
- </div><!-- /.captcha-container -->
-
- <div class="cf-section cf-wrapper">
- <div class="cf-columns two">
- <div class="cf-column">
- <h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>
-
- <p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
- </div>
-
- <div class="cf-column">
- <h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>
-
- <p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>
-
- <p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>
- </div>
- </div>
- </div><!-- /.section -->
-
- <div class="cf-error-footer cf-wrapper">
- <p>
- <span class="cf-footer-item">CloudFlare Ray ID: <strong>2e6e057767f92eff</strong></span>
- <span class="cf-footer-separator">&bull;</span>
- <span class="cf-footer-item"><span data-translate="your_ip">Your IP</span>: 202.41.10.3</span>
- <span class="cf-footer-separator">&bull;</span>
- <span class="cf-footer-item"><span data-translate="performance_security_by">Performance &amp; security by</span> <a data-orig-proto="https" data-orig-ref="www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">CloudFlare</a></span>
-
- </p>
- </div><!-- /.error-footer -->
-
-
- </div><!-- /#cf-error-details -->
- </div><!-- /#cf-wrapper -->
-
- <script type="text/javascript">
- window._cf_translation = {};
-
-
- </script>
-
- </body>
- </html>
+ Human machine interface for lab abc computer applications
+ A survey of user opinion of computer system response time
+ The EPS user interface management system
+ System and human system engineering testing of EPS
+ Relation of user perceived response time to error measurement
+ The generation of random binary unordered trees
+ The intersection graph of paths in trees
+ Graph minors IV Widths of trees and well quasi ordering
+ Graph minors A survey
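
The restored `mycorpus.txt` holds one document per line. A minimal sketch of how such a file is commonly tokenised before building a bag-of-words model is shown below, using only the standard library. The stop-word list and the rule of dropping words that occur only once are assumptions made for illustration; the commit itself does not specify any preprocessing.

```python
from collections import Counter

# The nine lines added to mycorpus.txt in this commit.
documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey",
]

# Assumed stop-word list, not taken from the commit.
STOPLIST = set("for a of the and to in".split())


def tokenize(lines):
    """Lowercase, split on whitespace, drop stop words and words seen only once."""
    texts = [[w for w in line.lower().split() if w not in STOPLIST] for line in lines]
    freq = Counter(w for text in texts for w in text)
    return [[w for w in text if freq[w] > 1] for text in texts]


texts = tokenize(documents)
print(texts[0])  # ['human', 'interface', 'computer']
print(texts[5])  # ['trees']
```

In practice the same function could be fed the lines of the file directly, for example `tokenize(open(path))` with `path` pointing at a local copy of `mycorpus.txt`.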
