C++: ObjectAllocator_Destructor: Assertion `allocator->nb_inuse == 0' failed #263

@fortminors

Hello! I am trying to profile my CUDA program, but it fails with an assertion error.
I have created a minimal reproducing example below:

#include <iostream>

#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_fp16.h>

#include "utils/Remotery.h"

int main()
{
    CUcontext* context = nullptr;
    // cuCtxCreate(context, 0, 0);
    cuCtxGetCurrent(context);

    Remotery* rmt;
    rmt_CreateGlobalInstance(&rmt);
    rmtCUDABind bind;

    bind.context = (void*)context;
    bind.CtxSetCurrent = (void*)&cuCtxSetCurrent;
    bind.CtxGetCurrent = (void*)&cuCtxGetCurrent;
    bind.EventCreate = (void*)&cuEventCreate;
    bind.EventDestroy = (void*)&cuEventDestroy;
    bind.EventRecord = (void*)&cuEventRecord;
    bind.EventQuery = (void*)&cuEventQuery;
    bind.EventElapsedTime = (void*)&cuEventElapsedTime;
    rmt_BindCUDA(&bind);

    CUstream stream;

    std::cout << "before cpu scoped sample" << std::endl;
    {
        rmt_ScopedCPUSample(ScopedCPUSample, 0);
    }
    std::cout << "after cpu scoped sample" << std::endl;

    std::cout << "before cpu standard sample" << std::endl;
    rmt_BeginCPUSample(StandardCPUSample, 0);
    rmt_EndCPUSample();
    std::cout << "after cpu standard sample" << std::endl;

    std::cout << "before cuda scoped sample" << std::endl;
    {
        rmt_ScopedCUDASample(ScopedCUDASample, stream);
    }
    std::cout << "after cuda scoped sample" << std::endl;

    std::cout << "before cuda standard sample" << std::endl;
    rmt_BeginCUDASample(StandardCUDASample, stream);
    rmt_EndCUDASample(stream);
    std::cout << "after cuda standard sample" << std::endl;

    std::cout << "success" << std::endl;

    rmt_DestroyGlobalInstance(rmt);
}

Building, linking, and running the above program produces the following output:

before cpu scoped sample
after cpu scoped sample
before cpu standard sample
after cpu standard sample
before cuda scoped sample
test_program: utils/Remotery.c:2462: ObjectAllocator_Destructor: Assertion `allocator->nb_inuse == 0' failed.

The CPU sampling works perfectly. I would like to make CUDA sampling work as well; any help is appreciated.

I was able to build Remotery successfully after applying the changes suggested in #262.
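
For readers unfamiliar with the failing check: going only by the assertion text, `allocator->nb_inuse == 0` appears to require that every object taken from an allocator has been returned before that allocator is destroyed. The sketch below is a toy model of that invariant, not Remotery's code. `ToyAllocator`, `Alloc`, `Free`, `CanDestroy`, and `leak_is_detected` are hypothetical names made up for illustration; only the counter name `nb_inuse` comes from the assertion message. It does not identify why the CUDA sample path trips the check in the program above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a pooled object allocator (NOT Remotery's implementation).
struct ToyAllocator {
    std::size_t nb_inuse = 0;  // objects handed out and not yet returned

    // Hand out one object and count it as in use.
    void* Alloc() {
        ++nb_inuse;
        return ::operator new(16);
    }

    // Take an object back and drop the in-use count.
    void Free(void* p) {
        assert(nb_inuse > 0);
        ::operator delete(p);
        --nb_inuse;
    }

    // Mirrors the condition in the failing assertion: teardown is only
    // valid once nothing is outstanding.
    bool CanDestroy() const { return nb_inuse == 0; }
};

// Returns true if an outstanding object blocks teardown until it is freed.
bool leak_is_detected() {
    ToyAllocator a;
    void* p = a.Alloc();
    bool blocked = !a.CanDestroy();  // one object still in use
    a.Free(p);
    return blocked && a.CanDestroy();
}
```

In this model, destroying the allocator while `CanDestroy()` is false corresponds to the reported failure: something was allocated and never released before teardown.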
