
[CUDA 11.2] Experiment on RTX 3090 with faiss-gpu 1.6.5 #36

@GaoKangYu

Description


After installing all dependencies according to the README, I ran into several errors.

Most of them have been solved by now.

If you hit the same errors, I hope you can find some reference here.

My experiment env-info:

GPU : RTX 3090
CUDA : 11.2
Python : 3.8
Pytorch : 1.8.0+cu111
Scikit-learn : 0.24.1

Error 1

Abnormal memory usage

When I used faiss-gpu 1.6.3 under CUDA 10.2, the process was sometimes killed while computing the Jaccard distance.

Solution

  • Upgrade scikit-learn to 0.20.2+.

  • Change n_jobs=-1 to a small fixed value such as 2 or 4 (n_jobs=-1 spawns one worker per CPU core, and each worker adds to memory usage).

#184

cluster = DBSCAN(eps=eps, min_samples=4, metric='precomputed', n_jobs=4)
cluster_tight = DBSCAN(eps=eps_tight, min_samples=4, metric='precomputed', n_jobs=4)
cluster_loose = DBSCAN(eps=eps_loose, min_samples=4, metric='precomputed', n_jobs=4)
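For reference, the precomputed matrix those DBSCAN calls expect is just the pairwise Jaccard distance. A minimal numpy-only sketch (with made-up binary features, not the repo's actual re-ID features) looks like this:

```python
import numpy as np

def jaccard_distance_matrix(x):
    """Pairwise Jaccard distance for a binary feature matrix of shape (n, d)."""
    x = x.astype(bool).astype(np.int64)
    inter = x @ x.T                          # |A & B| for every pair of rows
    sizes = x.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    with np.errstate(invalid='ignore'):      # guard empty/empty pairs (0/0)
        dist = 1.0 - inter / union
    return np.nan_to_num(dist)

# Hypothetical toy data: 3 samples, 4 binary features
feats = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 0, 1, 1]])
d = jaccard_distance_matrix(feats)
# d is symmetric with zeros on the diagonal; it can be passed to
# DBSCAN(eps=..., min_samples=4, metric='precomputed', n_jobs=4).fit_predict(d)
```

Holding the full n-by-n matrix like this is why memory grows quickly; limiting n_jobs keeps sklearn from replicating that cost across many workers.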

Error 2

Abnormal GPU usage

When I used faiss-gpu 1.6.3 under CUDA 11.2, model training started but soon hit a CUDA error.

That is because faiss-gpu 1.6.3 is not compatible with CUDA 11.2.

Solution

  • Upgrade faiss-gpu to 1.6.5 by using:
conda install -c conda-forge faiss=1.6.5=py38h60a57df_0_cuda
  • Then I got the traceback "module 'faiss' has no attribute 'cast_integer_to_long_ptr'", which I solved by:

#L15

# replace "cast_integer_to_long_ptr" with "cast_integer_to_idx_t_ptr"
def swig_ptr_from_LongTensor(x):
    assert x.is_contiguous()
    assert x.dtype == torch.int64, 'dtype=%s' % x.dtype
    # faiss 1.6.5 renamed cast_integer_to_long_ptr to cast_integer_to_idx_t_ptr;
    # the storage offset is multiplied by 8 because int64 elements are 8 bytes wide
    return faiss.cast_integer_to_idx_t_ptr(
        x.storage().data_ptr() + x.storage_offset() * 8)
  • In short, thanks to yxgeee for the great work. 👍
