[bug] Low-priority tasks will not be blocked. #132

Open
@chaunceyjiang

Description

Two Pods run on the same node: Pod A is set to nvidia.com/priority: "0" (high priority) and Pod B is set to nvidia.com/priority: "1" (low priority).

When I run the following program in Pod B, it still runs to completion; it is not blocked.

// Classic CUDA "hello world": the kernel rewrites "Hello " into "World!".
#include <stdio.h>
#include <unistd.h>
const int N = 16;
const int blocksize = 16;

__global__
void hello(char *a, int *b)
{
        a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
        char a[N] = "Hello \0\0\0\0\0\0";
        int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

        char *ad;
        int *bd;
        const int csize = N*sizeof(char);
        const int isize = N*sizeof(int);

        printf("%s", a);

        cudaMalloc( (void**)&ad, csize );
        cudaMalloc( (void**)&bd, isize );
        cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
        cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );

        dim3 dimBlock( blocksize, 1 );
        dim3 dimGrid( 1, 1 );
        hello<<<dimGrid, dimBlock>>>(ad, bd);
        cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost );
        cudaFree( ad );
        cudaFree( bd );

        printf("%s\n", a);
        sleep(10);  // keep the process alive so the shared cache can be inspected
        return 0;
}

I also printed the values in the shared cache, which confirm that the Pod should be blocked.

root@cuda-12-runtime-6d7cb75b56-7xs68:~# ./mmap_read --filename=/usr/local/vgpu/c11a0a04-ade9-461f-994e-e7f5a8e448b8.cache
cachestr= 
  initializedFlag 19920718
  smInitFlag 0
  ownerPid 0
  sem {[0 0 0 0 1 0 0 0 0 0 0 0 128 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]}
  num 1
  uuids [uuid=GPU-26a583dd-542e-09bb-5dd1-9cc5bd6eb552               ]
  limit [157286400 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  sm_limit [10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10]
  procnum 0
  utilizationSwitch 1
  recentKernel -1
  priority 1
  procs [
    pid=292, hostpid=624912, used=[               ], monitorused=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], status=1
    pid=216, hostpid=0, used=[               ], monitorused=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], status=1
root@cuda-12-runtime-6d7cb75b56-7xs68:~# time ./hello
[4pdvGPU Msg(292:139722028023808:libvgpu.c:869)]: Initializing.....
[4pdvGPU Warn(292:139722028023808:utils.c:228)]: get default cuda 1 from (null)
[4pdvGPU Msg(292:139722028023808:libvgpu.c:902)]: Initialized
Hello Hello
[4pdvGPU Msg(292:139722028023808:multiprocess_memory_limit.c:477)]: Calling exit handler 292

real	0m11.668s
user	0m0.205s
sys	0m0.330s
