@sammcj

Also, in case you want to use the @Thireus quants (you should), compile ik_llama.cpp with something along these lines:

#!/usr/bin/env bash
cd ik_llama.cpp
#ngpu=$(find /dev/ -name 'nvidia?' | wc -l)
#ngpu=$((ngpu+1))
#ngpu=16
#if [[ ! -z "${ngpu}" ]]; then
#  sed -Ei "s/^#define GGML_CUDA_MAX_DEVICES.+[0-9]+$/#define GGML_CUDA_MAX_DEVICES       ${ngpu}/" ggml/include/ggml-cuda.h
#fi
cmake -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_ARCHITECTURES="86" \
  -DGGML_CUDA=ON \
  -DGGML_CUDA_FA_ALL_QUANTS=1 \
  -DGGML_SCHED_MAX_COPIES=1 \
  -DGGML_CUDA_IQK_FORCE_BF16=1 \
  -DGGML_MAX_CONTEXTS=2048 \
  -DGGML_VULKAN=OFF \
  -DGGML_CUDA_F16=ON \
  -DGGML_AVX=ON \
  -…
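
The script above hard-codes `-DCMAKE_CUDA_ARCHITECTURES="86"`, which corresponds to compute capability 8.6. If your cards differ, here is a minimal sketch of deriving that value instead of typing it by hand. The helper names `caps_to_archs` and `detect_cuda_archs` are hypothetical (not part of ik_llama.cpp), and it assumes your driver's `nvidia-smi` supports the `compute_cap` query field, which older drivers may not.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of ik_llama.cpp.

# Turn compute capabilities read from stdin (one per line, e.g. "8.6")
# into a CMake architecture list (e.g. "86" or "86;89").
caps_to_archs() {
  tr -d ' .' | sort -un | paste -sd ';' -
}

# Assumption: nvidia-smi is installed and supports --query-gpu=compute_cap.
# Prints an empty string if the query fails.
detect_cuda_archs() {
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | caps_to_archs
}

# Possible usage, falling back to the value from the original script:
#   archs="$(detect_cuda_archs)"
#   cmake -B build -DCMAKE_CUDA_ARCHITECTURES="${archs:-86}" ...
```

On a mixed rig this would build for every architecture present, at the cost of a longer compile; on a single-generation rig it should give the same result as the hard-coded value.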

Answer selected by sammcj