TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. #306

@Jacobsonradical

Describe the bug
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/run_scoring.py", line 294, in _run_scorer_parallelizable
    scoringResults = scorer.prescore(scoringArgs, preserveRatings=not runParallel)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/scorer.py", line 301, in prescore
    noteScores, userScores, metaScores = self._prescore_notes_and_users(
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 554, in _prescore_notes_and_users
    ) = self._run_stable_matrix_factorization(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 449, in _run_stable_matrix_factorization
    return self._run_regular_matrix_factorization(ratingsForTraining)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 424, in _run_regular_matrix_factorization
    return self._mfRanker.run_mf(ratingsForTraining)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py", line 560, in run_mf
    self._lossModule = NormalizedLoss(
                       ^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/matrix_factorization/normalized_loss.py", line 108, in __init__
    assert all(ratings[labelCol].values == targets.numpy())
                                           ^^^^^^^^^^^^^^^
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
"""

To Reproduce
I ran the following command in a shell:

python3 main.py \
  --enrollment /root/community-note/enrollment/2024-12-29_20-02.tsv \
  --notes /root/community-note/note/2024-12-29_20-02.tsv \
  --ratings /root/community-note/rating/ \
  --status /root/community-note/status/2024-12-29_20-02.tsv \
  --outdir /root/community-note/notescore \
  --parallel

Expected behavior
I believe this is caused by line 108 of normalized_loss.py:
assert all(ratings[labelCol].values == targets.numpy())

I am not sure whether I should change it to:
assert all(ratings[labelCol].values == targets.cpu().numpy())
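For context, Tensor.numpy() only works on tensors stored in CPU memory, which is why the assertion raises once targets lives on cuda:0. A minimal standalone sketch of the proposed change is below. It assumes targets is a 1-D tensor aligned element by element with ratings[labelCol]; the helper name targets_match_labels and the sample data are hypothetical and not part of the repository, and this is not necessarily how the maintainers would fix it.

```python
import numpy as np
import torch


def targets_match_labels(labels: np.ndarray, targets: torch.Tensor) -> bool:
    """Hypothetical helper: True if every label equals the matching target.

    Tensor.numpy() fails for CUDA tensors, so the tensor is first copied to
    host memory with .cpu(); for a tensor already on the CPU, .cpu() returns
    the same object without copying. .detach() avoids the error numpy()
    raises on tensors that require grad.
    """
    return bool(np.all(labels == targets.detach().cpu().numpy()))


# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
labels = np.array([0.0, 1.0, 0.5], dtype=np.float32)  # made-up sample labels
targets = torch.tensor(labels, device=device)
print(targets_match_labels(labels, targets))  # True
```

Because .cpu() is a no-op for CPU tensors, this change should not affect runs without a GPU. An alternative that avoids the host copy would be to move the labels onto targets.device and compare there, but for a one-off assertion the copy is unlikely to matter.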

Environment

  1. Same venv as specified in the requirements
  2. NVIDIA H100 80GB HBM3 x2
  3. CUDA 12.2
  4. Python 3.11.9
  5. Intel(R) Xeon(R) Platinum 8462Y+
  6. 516 GB RAM
