cudaMalloc error out of memory #3642
Replies: 3 comments
-
I don't understand what the sample size means. What is the total size of the dataset that must be moved to the GPU? If the size exceeds GPU RAM, but by less than a factor of 2, it may be useful to encode the vectors as 16-bit floats, see the useFloat16 option in https://github.com/facebookresearch/faiss/blob/main/faiss/gpu/GpuClonerOptions.h
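In the Python API this would look roughly like the sketch below (untested; `d`, `xb`, and the flat index are placeholder examples, and the same cloner options can be passed for other index types):

```python
import numpy as np
import faiss

d = 128                                            # example vector dimension
xb = np.random.rand(100000, d).astype('float32')   # example dataset

res = faiss.StandardGpuResources()
co = faiss.GpuClonerOptions()
co.useFloat16 = True                               # store the vectors on the GPU as 16-bit floats

index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu, co)  # clone to GPU 0 with fp16 storage
index_gpu.add(xb)
```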
-
Can I use float16 in the Python version when indexing the dataset on the GPU?
-
@turovapolina, did you find a solution to get around the out-of-memory error?
-
Summary
When I am using the pytorch-metric-learning package, which relies on faiss, I am getting this error:
Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryBuffer dev 0 space Device stream 0x56211a0ab660 size 1610612736 bytes (cudaMalloc error out of memory [2])
Probably this question has been asked several times before, but I didn't find any working instructions or explanations for it. I found in the documentation that temporary memory never exceeds 1.5 GB (https://faiss.ai/cpp_api/class/classfaiss_1_1gpu_1_1StandardGpuResourcesImpl.html#_CPPv4N5faiss3gpu24StandardGpuResourcesImpl13setTempMemoryE6size_t), but I just want to be sure that nothing can be done to extend this limit or otherwise solve the problem.
Platform
OS: macOS, but calculations are performed on Google Colab Pro+
Faiss version: 1.7.1
Installed from: pip
Faiss compilation options:
Running on: GPU
Interface: Python
Reproduction instructions
I am using pytorch-metric-learning, and at the test stage I call the accuracy_calculator.get_accuracy function (https://kevinmusgrave.github.io/pytorch-metric-learning/accuracy_calculation/). As the architecture I am using ResNet18 with a modified last fc layer (from 1000 to 32). When this function is called I get the error shown above.
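For context, the embedding model is set up roughly like this (a sketch; the dataset and training code are omitted):

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18()
# replace the 1000-way classification head with a 32-dimensional embedding layer
model.fc = nn.Linear(model.fc.in_features, 32)
```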
The size of the samples in my dataset is 750x750, and I understand that this might be too large. When I compress them to 500x500, the code works well without this error.
However, these samples are not pictures; they are results of a chemical experiment (spectra), and I am highly interested in a procedure that does not compress them, because every point represents a specific measurement.
I am using the Google Colab Pro+ version, and my runtime has 54.8 gigabytes of available RAM. So it seems reasonable to me to extend the size of the temporary memory, if that is possible at all.
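If it is possible, I would expect it to look something like the sketch below (based on the setTempMemory method from the documentation; I have not verified whether pytorch-metric-learning lets me pass my own StandardGpuResources object to its faiss calls):

```python
import faiss

res = faiss.StandardGpuResources()
res.setTempMemory(4 * 1024 * 1024 * 1024)  # e.g. allow 4 GB of temporary GPU memory

# the resources object would then have to be used when the GPU index is built, e.g.
# index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)
```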
Or perhaps I misunderstand the whole situation; if so, please correct me. I will be glad to provide any other details.
Thank you very much in advance!
Best regards,
Polina