Description:
I am trying to install llama-cpp-python with CUDA support, but I run into build errors. All the relevant information is attached below. Installing without GPU support works fine.
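For contrast, the plain CPU-only install that succeeds is just the default command (shown for reference):
```
# CPU-only build (no CUDA): installs without errors on this machine
pip install llama-cpp-python
```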
Environment:
- GPU: NVIDIA RTX 5080
- OS: Ubuntu 24.04.2 LTS
- Python: 3.10/3.12 (tried both)
- GCC: 13.3.0
- G++: 13.3.0
- CUDA: 12.9
- CUDA Toolkit: 12.9
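The versions above were checked with the usual tools (commands listed for reference; outputs omitted):
```
# Verify toolchain versions (standard commands)
nvcc --version        # CUDA Toolkit 12.9
gcc --version         # GCC 13.3.0
g++ --version         # G++ 13.3.0
python3 --version     # Python 3.12 (also tried 3.10)
nvidia-smi            # driver / GPU: RTX 5080
lsb_release -d        # Ubuntu 24.04.2 LTS
```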
Error Log:
```
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [161 lines of output]
*** scikit-build-core 0.11.2 using CMake 3.28.3 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmpw194gw2i/build/CMakeInit.txt
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/x86_64-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/x86_64-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.43.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.41")
-- CUDA Toolkit found
-- Using CUDA architectures: native
-- The CUDA compiler identification is NVIDIA 12.9.41
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 13.3.0
-- Including CUDA backend
CMake Warning at vendor/llama.cpp/ggml/CMakeLists.txt:298 (message):
GGML build version fixed at 1 likely due to a shallow clone.
CMake Warning (dev) at CMakeLists.txt:13 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
CMakeLists.txt:97 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:21 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
CMakeLists.txt:97 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:13 (install):
Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
CMakeLists.txt:98 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:21 (install):
Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
CMakeLists.txt:98 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring done (6.8s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmpw194gw2i/build
*** Building project with Ninja...
Change Dir: '/tmp/tmpw194gw2i/build'
Run Build Command(s): /usr/bin/ninja -v
[1/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-hbm.cpp
[2/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-threading.cpp
[3/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-alloc.c
[4/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-traits.cpp
[5/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp
[6/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/amx/amx.cpp
[7/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-backend.cpp
[8/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-opt.cpp
[9/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-quants.c
[10/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp
[11/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_SHARED -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU -DGGML_USE_CUDA -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-backend-reg.cpp
[12/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml.c
[13/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
FAILED: vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
/usr/include/c++/13/bits/basic_string.h(3163): error: default argument is not allowed
substr(size_type __pos = 0, size_type __n = npos) const
^
detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510
/usr/include/c++/13/bits/basic_string.h(3163): error: expected an expression
substr(size_type __pos = 0, size_type __n = npos) const
^
detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510
/usr/include/c++/13/bits/basic_string.h(3163): error: default argument is not allowed
substr(size_type __pos = 0, size_type __n = npos) const
^
detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510
/usr/include/c++/13/bits/basic_string.h(3163): error: expected an expression
substr(size_type __pos = 0, size_type __n = npos) const
^
detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510
4 errors detected in the compilation of "/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu".
[14/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
[15/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
FAILED: vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
/usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
during RTL pass: cprop
/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: In function ‘ggml_compute_forward_sub’:
/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:5426:1: internal compiler error: in try_forward_edges, at cfgcleanup.cc:580
5426 | }
| ^
0x108d8f4 internal_error(char const*, ...)
???:0
0x1083cf2 fancy_abort(char const*, int, char const*)
???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-13/README.Bugs> for instructions.
[16/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/diagmask.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o
[17/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/acc.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o
[18/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/arange.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o
[19/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o
[20/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argsort.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o
[21/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/count-equal.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o
[22/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/clamp.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o
[23/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/cross-entropy-loss.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
[24/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/conv-transpose-1d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o
[25/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/getrows.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o
[26/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/gla.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o
[27/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/concat.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o
[28/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o
[29/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/llamafile/sgemm.cpp
[30/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/out-prod.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o
[31/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/gguf.cpp
[32/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/opt-step-adamw.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o
[33/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/im2col.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o
[34/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o
[35/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f32.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o
[36/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/cpy.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o
[37/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/pad.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o
[38/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/pool2d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o
[39/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/convert.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o
[40/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/norm.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o
[41/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/binbcast.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o
[42/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-quants.c
[43/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmv.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o
[44/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
[45/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-wmma-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o
[46/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmvq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o
ninja: build stopped: subcommand failed.
*** CMake build failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
```
Reproduction Steps:
```
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```
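Two distinct failures show up in the log above: nvcc rejects GCC 13's `<bits/basic_string.h>` while compiling `argmax.cu`, and gcc-13 itself hits an internal compiler error on `ggml-cpu.c`. In case it helps triage, this is the kind of retry I would attempt with an older host compiler pinned for both the CPU objects and nvcc (a sketch only: `CMAKE_CUDA_HOST_COMPILER` is a standard CMake variable, but I have not verified that gcc-12 is installed here or that it fixes the build):
```
# Hypothetical retry with an older host compiler (assumes gcc-12/g++-12
# are installed); --no-cache-dir/--force-reinstall force a clean rebuild
CC=gcc-12 CXX=g++-12 \
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_HOST_COMPILER=g++-12" \
pip install llama-cpp-python --no-cache-dir --force-reinstall --verbose
```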