
🐛 [Bug]: New install - response keeps repeating the last line #1182

@DeadEnded

Bug description

I just pulled the image and spun up a container with default settings. I downloaded the Mistral-7B model and left everything at the defaults. I've tried a few short questions, and each answer repeats its last line over and over until I stop the container.
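For context on the symptom: runaway repetition like this is usually mitigated at sampling time by a repeat penalty that dampens the logits of recently generated tokens. A minimal sketch of that mechanism (illustrative only, not this project's code — the function name and defaults here are my own):

```python
def apply_repeat_penalty(logits, recent_tokens, penalty=1.1):
    """Dampen logits of recently generated tokens (llama.cpp-style).

    Positive logits are divided by the penalty, negative logits are
    multiplied by it, so repeated tokens become less likely either way.
    """
    out = list(logits)
    for t in set(recent_tokens):
        if out[t] > 0:
            out[t] /= penalty
        else:
            out[t] *= penalty
    return out


# Token 0 (positive logit) and token 1 (negative logit) were just emitted:
adjusted = apply_repeat_penalty([2.0, -1.0, 0.5], recent_tokens=[0, 1])
```

If the UI exposes a repeat-penalty setting, raising it above 1.0 may be a workaround while the default-config bug is investigated.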

Steps to reproduce

  1. Spin up a new container with default settings (from the repo)
  2. Download Mistral-7B
  3. Start a new chat and ask "what is the square root of nine"

Environment Information

Docker version: 25.0.3
OS: Ubuntu 22.04.4 LTS on kernel 5.15.0-97
CPU: AMD Ryzen 5 2400G
Browser: Firefox 123.0

Screenshots

(screenshot attached to the original issue)

Relevant log output

llm_load_print_meta: BOS token        = 1 '<s>'

llm_load_print_meta: EOS token        = 2 '</s>'

llm_load_print_meta: UNK token        = 0 '<unk>'

llm_load_print_meta: LF token         = 13 '<0x0A>'

llm_load_tensors: ggml ctx size =    0.11 MiB

llm_load_tensors: offloading 0 repeating layers to GPU

llm_load_tensors: offloaded 0/33 layers to GPU

llm_load_tensors:        CPU buffer size =  4165.37 MiB

...............................................................................................

llama_new_context_with_model: n_ctx      = 2153

llama_new_context_with_model: freq_base  = 10000.0

llama_new_context_with_model: freq_scale = 1

llama_kv_cache_init:        CPU KV buffer size =   269.13 MiB

llama_new_context_with_model: KV self size  =  269.12 MiB, K (f16):  134.56 MiB, V (f16):  134.56 MiB

llama_new_context_with_model:        CPU input buffer size   =    12.22 MiB

llama_new_context_with_model:        CPU compute buffer size =   174.42 MiB

llama_new_context_with_model: graph splits (measure): 1

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 

Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'mistralai_mistral-7b-v0.1', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '15'}

18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...

Received termination signal!

++ _term

++ echo 'Received termination signal!'

++ kill -TERM 18

++ kill -TERM 19

18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...

18:signal-handler (1709671894) Received SIGTERM scheduling shutdown...
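As a sanity check on the log above, the reported KV-cache size follows directly from the model metadata, assuming the f16 K/V layout llama.cpp reports (2 bytes per value, K and V each sized n_ctx × layers × kv_heads × head_dim):

```python
# Reproduce "KV self size = 269.12 MiB" from the logged metadata.
n_ctx = 2153        # llama_new_context_with_model: n_ctx
n_layers = 32       # llama.block_count
n_kv_heads = 8      # llama.attention.head_count_kv
head_dim = 128      # llama.rope.dimension_count
bytes_f16 = 2

# Factor of 2 covers both the K and the V cache.
kv_bytes = 2 * n_ctx * n_layers * n_kv_heads * head_dim * bytes_f16
print(f"{kv_bytes / 2**20:.2f} MiB")  # → 269.12 MiB
```

This confirms the cache is sized for the unusual n_ctx of 2153, not the model's native 32768-token context from the metadata.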

Confirmations

  • I'm running the latest version of the main branch.
  • I checked existing issues to see if this has already been described.
