Bug description
I just pulled the image and spun up a container with default settings. I downloaded the Mistral-7B model and left everything at the defaults. I've asked a few short questions, and each time the answer gets stuck repeating its last line over and over until I stop the container.
Steps to reproduce
- Spin up a new container with default settings (from the repo); a rough command sketch follows this list
- Download Mistral-7B
- Start a new chat and ask "what is the square root of nine"
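For reference, the container was started roughly like this. The image name, tag, and host port below are placeholders standing in for the project's published defaults, not values taken from the repo:

# placeholder image name/tag and port; substitute the project's actual defaults
docker pull example/llm-webui:latest
docker run -d --name llm-webui -p 8080:8080 example/llm-webui:latest
# then open http://localhost:8080 in the browser, download Mistral-7B from the UI, and start a chat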
Environment Information
Docker version: 25.0.3
OS: Ubuntu 22.04.4 LTS on kernel 5.15.0-97
CPU: AMD Ryzen 5 2400G
Browser: Firefox version 123.0
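The versions above were read off with standard commands along these lines; the output shown in the comments is paraphrased rather than copied verbatim:

docker --version              # Docker version 25.0.3, build ...
lsb_release -ds; uname -r     # Ubuntu 22.04.4 LTS / 5.15.0-97-generic
lscpu | grep 'Model name'     # AMD Ryzen 5 2400G with Radeon Vega Graphics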
Screenshots

Relevant log output
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: CPU buffer size = 4165.37 MiB
...............................................................................................
llama_new_context_with_model: n_ctx = 2153
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 269.13 MiB
llama_new_context_with_model: KV self size = 269.12 MiB, K (f16): 134.56 MiB, V (f16): 134.56 MiB
llama_new_context_with_model: CPU input buffer size = 12.22 MiB
llama_new_context_with_model: CPU compute buffer size = 174.42 MiB
llama_new_context_with_model: graph splits (measure): 1
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'mistralai_mistral-7b-v0.1', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '15'}
18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...
Received termination signal!
++ _term
++ echo 'Received termination signal!'
++ kill -TERM 18
++ kill -TERM 19
18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...
18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...
Confirmations