
HAL server failing with large kernels #514

Open
@ikabadzhov

Description

Version

latest master / 4.0.0

What behaviour are you expecting?

I was reproducing the server-client setup via the HAL server, as in https://github.com/codeplaysoftware/oneapi-construction-kit/tree/main/examples/hal_cpu_remote_server, when I noticed that my large kernels error out on both of my RISC-V devices. I am sure both devices have sufficient memory, and the allocation itself does succeed as expected.

What actual behaviour are you seeing?

The local client prints the following (the first lines are as expected):

$ HAL_REMOTE_PORT=5906 ./test $((1<<25))
Running on ock cpu
Allocated 128 MB

$ HAL_REMOTE_PORT=5906 ./test $((1<<26))
Running on ock cpu
Allocated 256 MB
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Aborted (core dumped)
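The sizes printed above come from the reproducer's own formula, len * sizeof(float) / 1024 / 1024, so "MB" here means MiB. The sketch below makes that arithmetic explicit, assuming a 4-byte float as on typical hosts; the helper names are hypothetical. It shows that the last working run allocates 2^27 bytes and the first failing run 2^28 bytes.

```cpp
// Hypothetical helpers mirroring the reproducer's size computation.
// Assumes a 4-byte float, as on typical x86-64 and RISC-V hosts.
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

// Bytes requested by sycl::malloc_device<float>(len, queue).
constexpr unsigned long long allocation_bytes(unsigned long long len) {
  return len * sizeof(float);
}

// Same integer division the reproducer prints (MiB, labelled "MB").
constexpr unsigned long long allocation_mib(unsigned long long len) {
  return allocation_bytes(len) / 1024 / 1024;
}

static_assert(allocation_mib(1ULL << 25) == 128, "last size that worked");
static_assert(allocation_mib(1ULL << 26) == 256, "first size that failed");
static_assert(allocation_bytes(1ULL << 26) == (1ULL << 28), "2^28 bytes");
static_assert(allocation_mib(1ULL << 28) == 1024, "reproducer's default len");
```

In other words, the failure threshold lies somewhere between 2^27 and 2^28 bytes of device memory.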

On the RISC-V server, the process crashes shortly afterwards with "Segmentation fault (core dumped)". After that, attempting to restart the server on the same port fails with "Unable to start server on requested port 5906, node 127.0.0.1".

On the other hand, an empty kernel, or no kernel at all, works fine.
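For the restart failure on port 5906 reported above, one quick check is whether something on the host still holds the port after the crash. The following is a minimal POSIX sketch; port_is_free is a hypothetical helper, not part of the HAL server. It probes 127.0.0.1, the node named in the error message, and deliberately does not set SO_REUSEADDR, so a port still bound by another socket on the host is reported as unavailable.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if a TCP socket can currently be bound
// to `port` on 127.0.0.1. SO_REUSEADDR is intentionally left unset.
bool port_is_free(std::uint16_t port) {
  // Port 0 would ask the kernel for any ephemeral port, which proves nothing.
  assert(port != 0);

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return false;  // could not even create a probe socket
  }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  bool ok = ::bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0;
  ::close(fd);  // release the probe socket whether or not bind succeeded
  return ok;
}
```

If port_is_free(5906) returns false on the RISC-V host right after the crash, the port is still in use, and a tool such as ss -ltnp can show which process, if any, is listening on it.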

What steps are required to reproduce the bug?

To reproduce, on the client side:

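#include <sycl/sycl.hpp>

#include <iostream>
#include <string>

int main(int argc, char **argv) {
  // Number of floats to allocate; the default 1 << 28 floats is 1024 MB.
  unsigned long long len = 1ULL << 28;
  if (argc > 1) {
    len = std::stoull(argv[1]);
  }

  sycl::queue queue(sycl::accelerator_selector_v);
  std::cout << "Running on " << queue.get_device().get_info<sycl::info::device::name>() << std::endl;

  float *d_a = sycl::malloc_device<float>(len, queue);
  if (d_a == nullptr) {
    std::cerr << "Allocation failed" << std::endl;
    return 1;
  }
  queue.wait();
  std::cout << "Allocated " << len * sizeof(float) / 1024 / 1024 << " MB" << std::endl;

  // Write each element's index into the device buffer.
  queue.parallel_for(sycl::range<1>(len), [=](sycl::id<1> idx) {
    d_a[idx] = static_cast<float>(idx[0]);
  }).wait();

  sycl::free(d_a, queue);
  return 0;
}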

On the server, simply listen on a port as usual.
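Since 1 << 25 works and 1 << 26 fails, the exact failing size could be narrowed down by bisection. Below is a minimal, generic sketch; first_failing_len is a hypothetical helper, and it assumes the failure is monotonic in len (every size above the threshold also fails).

```cpp
#include <cassert>

// Hypothetical helper: returns the smallest len in (good, bad] for which
// works(len) is false. Assumes works(good) is true, works(bad) is false,
// and that every len above the threshold also fails.
template <typename Predicate>
unsigned long long first_failing_len(unsigned long long good,
                                     unsigned long long bad,
                                     Predicate works) {
  assert(good < bad);
  // Invariant: works(good) == true and works(bad) == false.
  while (bad - good > 1) {
    unsigned long long mid = good + (bad - good) / 2;
    if (works(mid)) {
      good = mid;
    } else {
      bad = mid;
    }
  }
  return bad;
}
```

For this report the bracket would be first_failing_len(1ULL << 25, 1ULL << 26, works), where works(len) is one run of HAL_REMOTE_PORT=5906 ./test len. Because the server crashes on every failing run and cannot be restarted on the same port, each probe would likely need a freshly started server, so in practice the predicate may be a manual step. Covering a range of 2^25 sizes takes 25 probes.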

Minimal test case

No response

Anything else we should know?

No response

Metadata

Labels: bug (Something isn't working)