Error: nnz is too large during analysis of 80k papers #256

@olegs

Description

```
[2021-04-13 13:20:56,488: INFO/ForkPoolWorker-2] Searching 100000 most cited publications matching COVID-19, Coronavirus, "Corona virus", 2019-nCoV, SARS-CoV, MERS-CoV, "Severe Acute Respiratory Syndrome", "Middle East Respiratory Syndrome"
[2021-04-13 13:20:56,489: INFO/ForkPoolWorker-2] Preferring non review papers
[2021-04-13 13:26:43,831: INFO/ForkPoolWorker-2] Found 81190 publications in the local database
[2021-04-13 13:26:43,833: INFO/ForkPoolWorker-2] Loading publication data
[2021-04-13 13:27:28,473: INFO/ForkPoolWorker-2] Analyzing title and abstract texts
[2021-04-13 13:27:28,475: INFO/ForkPoolWorker-2] Building corpus from 81190 papers
[2021-04-13 13:46:34,962: INFO/ForkPoolWorker-2] Processing texts similarity
[2021-04-13 13:46:57,076: ERROR/ForkPoolWorker-2] Task analyze_search_terms[84c75f9e-c21c-4e93-9408-5b96c15d8eff] raised unexpected: RuntimeError('nnz of the result is too large')
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/user/pysrc/celery/tasks_main.py", line 38, in analyze_search_terms
    analyzer.analyze_papers(ids, query, task=current_task)
  File "/home/user/pysrc/papers/analyzer.py", line 143, in analyze_papers
    self.texts_similarity = analyze_texts_similarity(
  File "/home/user/pysrc/papers/analysis/text.py", line 62, in analyze_texts_similarity
    cos_similarities = cosine_similarity(corpus_vectors)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/metrics/pairwise.py", line 1175, in cosine_similarity
    K = safe_sparse_dot(X_normalized, Y_normalized.T,
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/sklearn/utils/extmath.py", line 151, in safe_sparse_dot
    ret = a @ b
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/scipy/sparse/base.py", line 560, in __matmul__
    return self.__mul__(other)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/scipy/sparse/base.py", line 480, in __mul__
    return self._mul_sparse_matrix(other)
  File "/home/user/miniconda3/envs/pubtrends/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 505, in _mul_sparse_matrix
    nnz = fn(M, N,
RuntimeError: nnz of the result is too large
```
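For context: scipy's sparse-times-sparse multiplication fails with this `RuntimeError` when the number of non-zeros in the product would overflow a 32-bit index, which is easy to hit when `cosine_similarity` multiplies an 81k-row corpus matrix by its own transpose. This is not a fix from the project, just a sketch of one common workaround, assuming `corpus_vectors` is a scipy CSR matrix (e.g. TF-IDF output): compute the similarity matrix in row chunks, so each intermediate sparse product stays far below the 32-bit nnz limit.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity


def chunked_cosine_similarity(corpus_vectors, chunk_size=1000):
    """Compute the full cosine-similarity matrix in row chunks.

    Multiplying one chunk of rows against the whole matrix at a time
    keeps each intermediate sparse product at (chunk_size x n_features)
    @ (n_features x n_docs), whose nnz fits comfortably in 32 bits,
    avoiding "nnz of the result is too large".
    """
    n = corpus_vectors.shape[0]
    result = np.empty((n, n), dtype=np.float32)
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # cosine_similarity returns a dense block for this slice of rows
        result[start:end] = cosine_similarity(
            corpus_vectors[start:end], corpus_vectors
        )
    return result
```

Note that the dense result for ~81k papers is still roughly 26 GB in float32, so in practice one would likely also sparsify each chunk (e.g. keep only the top-k similarities per row) before stacking, rather than materializing the full matrix.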

Metadata

Labels: bug (Something isn't working)