Replies: 6 comments 12 replies
-
How can I tell it to use my AMD GPU?
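A hedged sketch of one approach, assuming Alpaca's bundled Ollama gets GPU support through separate Flatpak add-ons and that the ROCm add-on is published as com.jeffser.Alpaca.Plugins.AMD (that ID and the override value below are assumptions, not confirmed):
# Check which Alpaca add-ons actually exist on Flathub
flatpak remote-ls flathub | grep -i alpaca
# Assumed ROCm add-on ID
flatpak install flathub com.jeffser.Alpaca.Plugins.AMD -y
# Many ROCm setups also need a GPU target override; 10.3.0 is just an RDNA2 example
flatpak override --user --env=HSA_OVERRIDE_GFX_VERSION=10.3.0 com.jeffser.Alpaca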
-
Is there a way to change the number of workers and threads, enable optimizations, etc.? I'm running on a 32-core hyperthreaded system and wondering whether it can be tuned manually to make full use of every CPU core for all models.
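A hedged sketch of the knobs a stock Ollama server exposes, assuming Alpaca's bundled instance behaves the same (the model name and values below are illustrative): thread count can be set per request via options.num_thread, and request parallelism via environment variables.
# Per-request thread count against the default local API
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "qwen2.5-coder:14b",
  "prompt": "hello",
  "options": { "num_thread": 32 }
}'
# Server-side parallelism for a stock ollama serve
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 ollama serve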
-
In the model download window there are no quantized variants to be seen anymore, but there used to be. Is this a bug or a feature? It's as if .gguf download support is gone...
-
Why do I randomly see the app in the background apps list, even though I didn't enable the option for that? (There's no portal description in the entry either.)
-
How do I set the context size for a local Ollama instance? The default is only 2048, while the underlying models support 32K or 64K, but the larger value has to be passed as a request parameter.
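For reference, a minimal sketch of that request parameter, assuming a stock Ollama API on the default port (the model name and value are illustrative): options.num_ctx raises the context window per request, and PARAMETER num_ctx in a Modelfile makes it persistent.
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "qwen2.5-coder:14b",
  "messages": [{ "role": "user", "content": "hello" }],
  "options": { "num_ctx": 32768 }
}'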
-
How do I use the GPU in Alpaca, please? This isn't explained on https://flathub.org/apps/com.jeffser.Alpaca. I ran these commands: flatpak install flathub com.jeffser.Alpaca -y; flatpak install com.jeffser.Alpaca.Plugins.Ollama -y; and in Flatseal I enabled GPU acceleration for Alpaca. Then I ran Alpaca and installed the Qwen2.5 Coder model. When I ask it a question my CPU usage rises to 50% and it uses 10 GB of RAM, but the NVIDIA app shows my GPU usage at its normal 20%. Output is extremely slow, so I think it's not using the GPU. What should I do, please? Here's my Alpaca log for the instance:
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 14B
print_info: model params = 14.77 B
print_info: general.name = Qwen2.5 Coder 14B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 152064
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: CPU model buffer size = 8566.04 MiB
llama_init_from_model: n_seq_max = 4
llama_init_from_model: n_ctx = 8192
llama_init_from_model: n_ctx_per_seq = 2048
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 1000000.0
llama_init_from_model: freq_scale = 1
llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 48, can_shift = 1
llama_kv_cache_init: CPU KV buffer size = 1536.00 MiB
llama_init_from_model: KV self size = 1536.00 MiB, K (f16): 768.00 MiB, V (f16): 768.00 MiB
llama_init_from_model: CPU output buffer size = 2.40 MiB
llama_init_from_model: CPU compute buffer size = 696.01 MiB
llama_init_from_model: graph nodes = 1686
llama_init_from_model: graph splits = 1
time=2025-04-26T18:58:23.282+02:00 level=INFO source=server.go:619 msg="llama runner started in 7.03 seconds"
[GIN] 2025/04/26 - 18:58:40 | 200 | 25.040628849s | 127.0.0.1 | POST "/v1/chat/completions"
[GIN] 2025/04/26 - 18:58:47 | 200 | 31.493391109s | 127.0.0.1 | POST "/v1/chat/completions"
[GIN] 2025/04/26 - 19:00:50 | 200 | 822.202µs | 127.0.0.1 | GET "/api/tags"
And here's the nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:09:00.0 On | N/A |
| 0% 54C P8 16W / 170W | 827MiB / 12288MiB | 13% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3315 G /usr/lib/xorg/Xorg 241MiB |
| 0 N/A N/A 3539 G /usr/bin/gnome-shell 184MiB |
| 0 N/A N/A 3777 G /usr/bin/ckb-next 2MiB |
| 0 N/A N/A 4417 G /usr/libexec/xdg-desktop-portal-gnome 63MiB |
| 0 N/A N/A 4463 G ...erProcess --variations-seed-version 45MiB |
| 0 N/A N/A 8671 G /app/lib/firefox/firefox 161MiB |
| 0 N/A N/A 283017 G /usr/bin/gnome-system-monitor 25MiB |
| 0 N/A N/A 289563 C+G /usr/bin/gjs 24MiB |
| 0 N/A N/A 289762 G ...erProcess --variations-seed-version 47MiB |
| 0 N/A N/A 295618 G /usr/bin/nvidia-settings 0MiB |
+-----------------------------------------------------------------------------------------+
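For what it's worth, the log itself suggests no offload happened: load_tensors reports only a CPU model buffer and no CUDA device. A hedged sketch of checks from the host, assuming standard Flatpak tooling (these are illustrative checks, not a confirmed fix):
# See which device/GPU overrides are applied to the sandbox
flatpak override --user --show com.jeffser.Alpaca
# Check whether the sandbox can see the NVIDIA device nodes
flatpak run --command=sh com.jeffser.Alpaca -c 'ls /dev/nvidia*'
# Check that a matching NVIDIA userspace driver extension is installed
flatpak list | grep -i nvidia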
-
The app doesn't work the way it's supposed to 😠
Please report the problem on the issues page.
I want to suggest something! ✨
Open an issue or comment on this discussion.
I need help with something 🙏🏻
Comment on this discussion.