Description:
I am trying to install llama-cpp-python with CUDA support, but I run into build errors. All the relevant information is attached below. Installing without GPU support works fine.
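For contrast, the plain CPU-only install that succeeds is just the default command (shown for reference):
```
# CPU-only build (no CUDA): installs without errors on this machine
pip install llama-cpp-python
```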
Environment:
- GPU: NVIDIA RTX 5080
- OS: Ubuntu 24.04.2 LTS
- Python: 3.10/3.12 (tried both)
- GCC: 13.3.0
- G++: 13.3.0
- CUDA: 12.9
- CUDA Toolkit: 12.9
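The versions above were checked with the usual tools (commands listed for reference; outputs omitted):
```
# Verify toolchain versions (standard commands)
nvcc --version        # CUDA Toolkit 12.9
gcc --version         # GCC 13.3.0
g++ --version         # G++ 13.3.0
python3 --version     # Python 3.12 (also tried 3.10)
nvidia-smi            # driver / GPU: RTX 5080
lsb_release -d        # Ubuntu 24.04.2 LTS
```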
Error Log:
```
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [161 lines of output]
*** scikit-build-core 0.11.2 using CMake 3.28.3 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmpw194gw2i/build/CMakeInit.txt
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/x86_64-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/x86_64-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.43.0")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.41")
-- CUDA Toolkit found
-- Using CUDA architectures: native
-- The CUDA compiler identification is NVIDIA 12.9.41
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 13.3.0
-- Including CUDA backend
CMake Warning at vendor/llama.cpp/ggml/CMakeLists.txt:298 (message):
GGML build version fixed at 1 likely due to a shallow clone.
CMake Warning (dev) at CMakeLists.txt:13 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
CMakeLists.txt:97 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:21 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
CMakeLists.txt:97 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:13 (install):
Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
CMakeLists.txt:98 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:21 (install):
Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
CMakeLists.txt:98 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring done (6.8s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/tmpw194gw2i/build
*** Building project with Ninja...
Change Dir: '/tmp/tmpw194gw2i/build'
Run Build Command(s): /usr/bin/ninja -v
[1/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-hbm.cpp
[2/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-threading.cpp
[3/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-alloc.c
[4/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-traits.cpp
[5/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp
[6/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/amx/amx.cpp
[7/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-backend.cpp
[8/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-opt.cpp
[9/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-quants.c
[10/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp
[11/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_SHARED -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU -DGGML_USE_CUDA -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-backend-reg.cpp
[12/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml.c
[13/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
FAILED: vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
/usr/include/c++/13/bits/basic_string.h(3163): error: default argument is not allowed
substr(size_type __pos = 0, size_type __n = npos) const
^
detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510
/usr/include/c++/13/bits/basic_string.h(3163): error: expected an expression
substr(size_type __pos = 0, size_type __n = npos) const
^
detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510
/usr/include/c++/13/bits/basic_string.h(3163): error: default argument is not allowed
substr(size_type __pos = 0, size_type __n = npos) const
^
detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510
/usr/include/c++/13/bits/basic_string.h(3163): error: expected an expression
substr(size_type __pos = 0, size_type __n = npos) const
^
detected during instantiation of class "std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT=char32_t, _Traits=std::char_traits<char32_t>, _Alloc=std::allocator<char32_t>]" at line 4510
4 errors detected in the compilation of "/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu".
[14/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
[15/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
FAILED: vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
/usr/bin/x86_64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
during RTL pass: cprop
/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: In function ‘ggml_compute_forward_sub’:
/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:5426:1: internal compiler error: in try_forward_edges, at cfgcleanup.cc:580
5426 | }
| ^
0x108d8f4 internal_error(char const*, ...)
???:0
0x1083cf2 fancy_abort(char const*, int, char const*)
???:0
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-13/README.Bugs> for instructions.
[16/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/diagmask.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o
[17/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/acc.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o
[18/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/arange.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o
[19/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o
[20/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/argsort.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o
[21/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/count-equal.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o
[22/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/clamp.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o
[23/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/cross-entropy-loss.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
[24/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/conv-transpose-1d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o
[25/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/getrows.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o
[26/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/gla.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o
[27/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/concat.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o
[28/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o
[29/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -march=native -fopenmp -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cpu/llamafile/sgemm.cpp
[30/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/out-prod.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o
[31/150] /usr/bin/x86_64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/gguf.cpp
[32/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/opt-step-adamw.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o
[33/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/im2col.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o
[34/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o
[35/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f32.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o
[36/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/cpy.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o
[37/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/pad.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o
[38/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/pool2d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o
[39/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/convert.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o
[40/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/norm.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o
[41/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/binbcast.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o
[42/150] /usr/bin/x86_64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-quants.c
[43/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmv.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o
[44/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
[45/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-wmma-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o
[46/150] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 -arch=native -Xcompiler=-fPIC -use_fast_math -compress-mode=size -Xcompiler "-Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -Wno-pedantic" -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o.d -x cu -c /tmp/pip-install-vrjn0p5p/llama-cpp-python_1e2b688442cf40f69f87a94888e22427/vendor/llama.cpp/ggml/src/ggml-cuda/mmvq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o
ninja: build stopped: subcommand failed.
*** CMake build failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
```
Reproduction Steps:
```
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```
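Two distinct failures show up in the log above: nvcc rejects GCC 13's `<bits/basic_string.h>` while compiling `argmax.cu`, and gcc-13 itself hits an internal compiler error on `ggml-cpu.c`. In case it helps triage, this is the kind of retry I would attempt with an older host compiler pinned for both the CPU objects and nvcc (a sketch only: `CMAKE_CUDA_HOST_COMPILER` is a standard CMake variable, but I have not verified that gcc-12 is installed here or that it fixes the build):
```
# Hypothetical retry with an older host compiler (assumes gcc-12/g++-12
# are installed); --no-cache-dir/--force-reinstall force a clean rebuild
CC=gcc-12 CXX=g++-12 \
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_HOST_COMPILER=g++-12" \
pip install llama-cpp-python --no-cache-dir --force-reinstall --verbose
```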