Description
FlutterView implements focusItemsInRect: - caching for linear focus movement is limited as long as this view is on screen.
flutter: The Dart VM service is listening on http://127.0.0.1:50570/Sif_eIL0TQU=/
Failed to associate thumbnails for picked URL file:///private/var/mobile/Containers/Data/Application/EA977386-3E11-46CD-AC80-5AB86E488188/Documents/qwen2.5-0.5b-instruct-q4_k_m.gguf with the Inbox copy file:///private/var/mobile/Containers/Data/Application/3A05364A-3C99-4DA2-8D2D-F7845D900EB8/tmp/com.example.lcpp-Inbox/qwen2.5-0.5b-instruct-q4_k_m.gguf: Error Domain=QLThumbnailErrorDomain Code=102 "(null)" UserInfo={NSUnderlyingError=0x303208c60 {Error Domain=GSLibraryErrorDomain Code=3 "Generation not found" UserInfo={NSDescription=Generation not found}}}
Can't find or decode reasons
Failed to get or decode unavailable reasons
Can't find or decode disabled use cases
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple A12 GPU)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (CPU)
llama_model_load_from_file_impl: using device Metal (Apple A12 GPU) - 1967 MiB free
llama_model_loader: loaded meta data with 26 key-value pairs and 291 tensors from /private/var/mobile/Containers/Data/Application/3A05364A-3C99-4DA2-8D2D-F7845D900EB8/tmp/qwen2.5-0.5b-instruct-q4_k_m.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = qwen2.5-0.5b-instruct
llama_model_loader: - kv 3: general.version str = v0.1
llama_model_loader: - kv 4: general.finetune str = qwen2.5-0.5b-instruct
llama_model_loader: - kv 5: general.size_label str = 630M
llama_model_loader: - kv 6: qwen2.block_count u32 = 24
llama_model_loader: - kv 7: qwen2.context_length u32 = 32768
llama_model_loader: - kv 8: qwen2.embedding_length u32 = 896
llama_model_loader: - kv 9: qwen2.feed_forward_length u32 = 4864
llama_model_loader: - kv 10: qwen2.attention.head_count u32 = 14
llama_model_loader: - kv 11: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 12: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 14: general.file_type u32 = 15
llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 24: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 25: general.quantization_version u32 = 2
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type q5_0: 133 tensors
llama_model_loader: - type q8_0: 13 tensors
llama_model_loader: - type q4_K: 12 tensors
llama_model_loader: - type q6_K: 12 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 462.96 MiB (6.16 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 32768
print_info: n_embd = 896
print_info: n_layer = 24
print_info: n_head = 14
print_info: n_head_kv = 2
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 7
print_info: n_embd_k_gqa = 128
print_info: n_embd_v_gqa = 128
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 4864
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 32768
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 1B
print_info: model params = 630.17 M
print_info: general.name = qwen2.5-0.5b-instruct
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
make_cpu_buft_list: disabling extra buffer types (i.e. repacking) since a GPU device is available
load_tensors: layer 0 assigned to device Metal, is_swa = 0
load_tensors: layer 1 assigned to device Metal, is_swa = 0
load_tensors: layer 2 assigned to device Metal, is_swa = 0
load_tensors: layer 3 assigned to device Metal, is_swa = 0
load_tensors: layer 4 assigned to device Metal, is_swa = 0
load_tensors: layer 5 assigned to device Metal, is_swa = 0
load_tensors: layer 6 assigned to device Metal, is_swa = 0
load_tensors: layer 7 assigned to device Metal, is_swa = 0
load_tensors: layer 8 assigned to device Metal, is_swa = 0
load_tensors: layer 9 assigned to device Metal, is_swa = 0
load_tensors: layer 10 assigned to device Metal, is_swa = 0
load_tensors: layer 11 assigned to device Metal, is_swa = 0
load_tensors: layer 12 assigned to device Metal, is_swa = 0
load_tensors: layer 13 assigned to device Metal, is_swa = 0
load_tensors: layer 14 assigned to device Metal, is_swa = 0
load_tensors: layer 15 assigned to device Metal, is_swa = 0
load_tensors: layer 16 assigned to device Metal, is_swa = 0
load_tensors: layer 17 assigned to device Metal, is_swa = 0
load_tensors: layer 18 assigned to device Metal, is_swa = 0
load_tensors: layer 19 assigned to device Metal, is_swa = 0
load_tensors: layer 20 assigned to device Metal, is_swa = 0
load_tensors: layer 21 assigned to device Metal, is_swa = 0
load_tensors: layer 22 assigned to device Metal, is_swa = 0
load_tensors: layer 23 assigned to device Metal, is_swa = 0
load_tensors: layer 24 assigned to device Metal, is_swa = 0
load_tensors: tensor 'output.weight' (q8_0) (and 168 others) cannot be used with preferred buffer type Metal, using CPU instead
ggml_backend_metal_log_allocated_size: allocated buffer, size = 235.78 MiB, ( 316.64 / 2048.02)
load_tensors: offloading 24 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 25/25 layers to GPU
load_tensors: CPU_Mapped model buffer size = 462.96 MiB
load_tensors: Metal_Mapped model buffer size = 235.78 MiB
.....................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 2048
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
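(The block above is just the context configuration, not an error in itself; the last line only notes that the 2048-token context is smaller than the model's 32768-token training context. For reference, a minimal sketch of the corresponding llama.cpp call, assuming the plain C++ API rather than lcpp's actual wiring:)

```cpp
// Sketch only: context settings matching the values printed in the log above.
llama_context_params cparams = llama_context_default_params();
cparams.n_ctx     = 2048;  // n_ctx / n_ctx_per_seq in the log
cparams.n_batch   = 2048;  // n_batch
cparams.n_ubatch  = 512;   // n_ubatch
cparams.n_seq_max = 1;     // n_seq_max

// Raising n_ctx (up to the model's n_ctx_train of 32768) would silence the
// "full capacity of the model will not be utilized" notice, at the cost of
// a larger KV cache.
llama_context * ctx = llama_init_from_model(model, cparams);
```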
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A12 GPU
ggml_metal_load_library: loading '/var/containers/Bundle/Application/3DE56B4D-3076-45F6-A5F7-30068A5F3403/Runner.app/Frameworks/lcpp.framework/default.metallib'
ggml_metal_init: GPU name: Apple A12 GPU
ggml_metal_init: GPU family: MTLGPUFamilyApple5 (1005)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: simdgroup reduction = false
ggml_metal_init: simdgroup matrix mul. = false
ggml_metal_init: has residency sets = true
ggml_metal_init: has bfloat = false
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 2147.50 MB
ggml_metal_init: loaded kernel_add 0x3018844e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_add_row 0x301881b00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub 0x301884a20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub_row 0x3018850e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul 0x301885aa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_row 0x3018819e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div 0x30188f7e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div_row 0x30188e4c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f32 0x30188fea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f16 0x301898060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i32 0x3018986c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i16 0x301886460 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale 0x301886e20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale_4 0x301887480 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_clamp 0x3018874e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_tanh 0x301887540 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_relu 0x301898780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sigmoid 0x301898de0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu 0x301898e40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_4 0x301898ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick 0x301898f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick_4 0x301898f60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu 0x301898fc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu_4 0x301899020 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_elu 0x3018875a0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_soft_max_f16 (not supported)
ggml_metal_init: skipping kernel_soft_max_f16_4 (not supported)
ggml_metal_init: skipping kernel_soft_max_f32 (not supported)
ggml_metal_init: skipping kernel_soft_max_f32_4 (not supported)
ggml_metal_init: loaded kernel_diag_mask_inf 0x301887600 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf_8 0x301899080 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f32 0x3018990e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f16 0x301887660 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: loaded kernel_get_rows_q4_0 0x3018877e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_1 0x30189c060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_0 0x30189c3c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_1 0x30189c720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q8_0 0x30189c7e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q2_K 0x30189cb40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q3_K 0x30189cea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_K 0x30189d200 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_K 0x30189d560 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q6_K 0x30189d8c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xxs 0x30189dc20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xs 0x301899860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_xxs 0x3018997a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_s 0x301899980 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_s 0x30189df80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_s 0x30189e2e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_m 0x30189e640 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_nl 0x30189e9a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_xs 0x301899ce0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_i32 0x30189a040 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_rms_norm (not supported)
ggml_metal_init: skipping kernel_l2_norm (not supported)
ggml_metal_init: skipping kernel_group_norm (not supported)
Compiler failed to build request
ggml_metal_init: loaded kernel_norm 0x0 | th_max = 0 | th_width = 0
ggml_metal_init: error: load pipeline error: Error Domain=AGXMetalA12 Code=3 "Encountered unlowered function call to air.simd_sum.f32" UserInfo={NSLocalizedDescription=Encountered unlowered function call to air.simd_sum.f32}
ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize the context: failed to initialize Metal backend
Assertion failed: (ctx != nullptr), function llama_prompt, file llm.cpp, line 111.
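What the tail of the log shows: on this Apple A12 the Metal compiler rejects the norm kernel ("Encountered unlowered function call to air.simd_sum.f32", with simdgroup reduction reported as false earlier in the init output), so ggml_backend_metal_device_init fails, llama_init_from_model returns a null context, and the plugin then aborts on the assertion in llm.cpp line 111 instead of reporting an error. A minimal sketch, assuming the plain llama.cpp C API rather than lcpp's internals, of the two mitigations I would try: keep all layers on the CPU backend so the failing Metal pipeline is never built, and treat a null context as a recoverable error (the model path below is just a placeholder):

```cpp
#include "llama.h"
#include <cstdio>

int main() {
    // Assumption: disabling GPU offload keeps inference on the CPU backend and
    // avoids the failing Metal pipeline build; whether this llama.cpp revision
    // still touches Metal for other reasons is untested here.
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 0;

    llama_model * model = llama_model_load_from_file(
        "qwen2.5-0.5b-instruct-q4_k_m.gguf", mparams);  // placeholder path
    if (model == nullptr) {
        fprintf(stderr, "model load failed\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;

    llama_context * ctx = llama_init_from_model(model, cparams);
    if (ctx == nullptr) {
        // This is the case llm.cpp:111 currently turns into an abort; surfacing
        // it as an error would keep the Flutter app alive when Metal init fails.
        fprintf(stderr, "failed to create llama_context\n");
        llama_model_free(model);
        return 1;
    }

    // ... tokenize, decode, sample ...

    llama_free(ctx);
    llama_model_free(model);
    return 0;
}
```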