[2026-05-14 19:15:18,925] [WARNING] [torchao] Skipping import of cpp extensions due to incompatible torch version. Please upgrade to torch >= 2.11.0 (found 2.9.1+cu128).
#@@ #@@ @@# @@#
@@ @@ @@ @@ =@@# @@ #@ =@@#.
@@ #@@@@@@@@@ @@ #@#@= @@ #@ .=@@
#@@@@@@@@@@@@@@@@@ =@# @# ##= ## =####=+ @@ =#####+ =#@@###. @@
@@@@@@@@@@/ +@@/ +@@ #@ =@= #@= @@ =@#+ +#@# @@ =@#+ +#@# #@. @@
@@@@@@@@@@ ##@@ ##@@ =@# @# =@# @# @@ @@ @@ @@ #@ #@ @@
@@@@@@@@@@@@@@@@@@@@ #@=+++#@= =@@# @@ @@ @@ @@ #@ #@ @@
=@#=====@@ =@# @# @@ @@ @@ @@ #@ #@ @@
@@@@@@@@@@@@@@@@ @@@@ #@ #@= #@= +@@ #@# =@# @@. =@# =@# #@. @@
=@# @# #@= #@ =#@@@@#= +#@@= +#@@@@#= .##@@+ @@
@@@@ @@@@@@@@@@@@@@@@
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2026-05-14 19:15:25,538] [WARNING] [torchao] Skipping import of cpp extensions due to incompatible torch version. Please upgrade to torch >= 2.11.0 (found 2.9.1+cu128).
[2026-05-14 19:15:27,098] [INFO] [axolotl.integrations.base] Attempting to load plugin: axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
[2026-05-14 19:15:27,591] [INFO] [axolotl.integrations.base] Plugin loaded successfully: axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
[2026-05-14 19:15:27,620] [WARNING] [axolotl.utils.schemas.config] Auto-enabling LoRA kernel optimizations for faster training. Please explicitly set `lora_*_kernel` config values to `false` to disable. See https://docs.axolotl.ai/docs/lora_optims.html for more info.
[2026-05-14 19:15:27,620] [INFO] [axolotl.utils.schemas.validation] explicitly setting `eval_sample_packing` to match `sample_packing`
[2026-05-14 19:15:27,620] [WARNING] [axolotl.utils.schemas.validation] `pad_to_sequence_len: true` is recommended when using sample_packing
[2026-05-14 19:15:27,620] [WARNING] [axolotl.utils.schemas.validation] `sample_packing` with `attn_implementation='sdpa'` does not handle cross-sample decontamination. Use a varlen-capable backend (e.g. flash_attention_2, flex_attention, xformers, sage) to isolate samples.
[2026-05-14 19:15:27,620] [WARNING] [axolotl.utils.schemas.config] sample_packing & torch sdpa with bf16 is unsupported may results in 0.0 loss. This may work on H100s.
[2026-05-14 19:15:27,939] [INFO] [axolotl.cli.config] config:
{
"activation_offloading": false,
"adapter": "qlora",
"attn_implementation": "sdpa",
"attn_needs_dtype_cast": false,
"attn_supports_packing": false,
"attn_uses_flash_lib": false,
"axolotl_config_path": "./configs/jacob/mre.yaml",
"base_model": "google/gemma-4-E2B-it",
"base_model_config": "google/gemma-4-E2B-it",
"batch_size": 8,
"bf16": true,
"capabilities": {
"bf16": true,
"compute_capability": "sm_89",
"fp8": false,
"n_gpu": 1,
"n_node": 1,
"tf32": true
},
"chat_template": "gemma4",
"context_parallel_size": 1,
"cut_cross_entropy": true,
"dataloader_num_workers": 1,
"dataloader_pin_memory": true,
"dataloader_prefetch_factor": 256,
"dataset_num_proc": 16,
"datasets": [
{
"chat_template": "tokenizer_default",
"message_property_mappings": {
"content": "content",
"role": "role"
},
"path": "mags0ft/gsm8k-chatml",
"split": "train",
"trust_remote_code": false,
"type": "chat_template"
}
],
"ddp": false,
"device": "cuda:0",
"dion_rank_fraction": 1.0,
"dion_rank_multiple_of": 1,
"eaft_alpha": 1.0,
"eaft_k": 20,
"env_capabilities": {
"torch_version": "2.9.1"
},
"eval_batch_size": 1,
"eval_causal_lm_metrics": [
"sacrebleu",
"comet",
"ter",
"chrf"
],
"eval_max_new_tokens": 128,
"eval_sample_packing": true,
"eval_table_size": 0,
"experimental_skip_move_to_device": true,
"fp16": false,
"freeze_mm_modules": true,
"generate_samples": false,
"generation_do_sample": true,
"generation_max_new_tokens": 50,
"generation_prompt_ratio": 0.5,
"generation_temperature": 0.7,
"gradient_accumulation_steps": 8,
"gradient_checkpointing": true,
"gradient_checkpointing_kwargs": {
"use_reentrant": false
},
"include_tkps": true,
"is_multimodal": true,
"layer_offloading": false,
"learning_rate": 0.0002,
"lisa_layers_attribute": "model.layers",
"load_best_model_at_end": false,
"load_in_4bit": true,
"load_in_8bit": false,
"local_rank": 0,
"logging_steps": 1,
"lora_alpha": 16,
"lora_dropout": 0.0,
"lora_embedding_kernel": true,
"lora_mlp_kernel": true,
"lora_o_kernel": true,
"lora_qkv_kernel": true,
"lora_r": 8,
"lora_target_modules": "model.language_model.layers.[\\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj",
"loraplus_lr_embedding": 1e-06,
"lr_scheduler": "cosine",
"mean_resizing_embeddings": false,
"merge_method": "memory_efficient",
"micro_batch_size": 1,
"model_config_type": "gemma4",
"model_config_type_text": "gemma4_text",
"num_epochs": 1.0,
"num_generation_samples": 3,
"optimizer": "adamw_torch_8bit",
"otel_metrics_host": "localhost",
"otel_metrics_port": 8000,
"output_dir": "./outputs/mre",
"pad_to_sequence_len": false,
"plugins": [
"axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin"
],
"pretrain_multipack_attn": true,
"processor_config": "google/gemma-4-E2B-it",
"processor_type": "AutoProcessor",
"profiler_steps_start": 0,
"qlora_sharded_model_loading": false,
"quantize_moe_experts": false,
"ray_num_workers": 1,
"relora_prune_method": "magnitude",
"remove_unused_columns": true,
"resources_per_worker": {
"GPU": 1
},
"sample_packing": true,
"sample_packing_bin_size": 200,
"sample_packing_group_size": 100000,
"save_only_model": false,
"save_safetensors": true,
"sequence_len": 2048,
"shuffle_before_merging_datasets": false,
"shuffle_merged_datasets": true,
"skip_prepare_dataset": false,
"streaming_multipack_buffer_size": 10000,
"strict": false,
"tensor_parallel_size": 1,
"tf32": false,
"tiled_mlp_use_original_mlp": true,
"tokenizer_config": "google/gemma-4-E2B-it",
"tokenizer_save_jinja_files": true,
"torch_dtype": "torch.bfloat16",
"train_on_inputs": false,
"trl": {
"async_prefetch": false,
"log_completions": false,
"mask_truncated_completions": false,
"ref_model_mixup_alpha": 0.9,
"ref_model_sync_steps": 64,
"replay_buffer_size": 0,
"replay_recompute_logps": true,
"reroll_max_groups": 1,
"reroll_start_fraction": 1.0,
"reward_num_workers": 1,
"scale_rewards": true,
"skip_zero_advantage_batches": true,
"sync_ref_model": false,
"use_data_producer": false,
"use_vllm": false,
"vllm_lora_sync": false,
"vllm_server_host": "0.0.0.0",
"vllm_server_port": 8000
},
"use_otel_metrics": false,
"use_ray": false,
"val_set_size": 0.0,
"vllm": {
"device": "auto",
"dtype": "auto",
"gpu_memory_utilization": 0.9,
"host": "0.0.0.0",
"port": 8000
},
"warmup_ratio": 0.1,
"weight_decay": 0.0,
"world_size": 1
}
[2026-05-14 19:15:35,875] [INFO] [axolotl.utils.data.shared] Unable to find prepared dataset in last_run_prepared/c0dccb9bbef493730af01c13797fb326
[2026-05-14 19:15:35,875] [INFO] [axolotl.utils.data.sft] Loading raw datasets...
[2026-05-14 19:15:35,875] [WARNING] [axolotl.utils.data.sft] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`.
Fetching 0 files: 0it [00:00, ?it/s].00B [00:00, ?B/s]
Download complete: : 0.00B [00:00, ?B/s]
[2026-05-14 19:15:37,572] [INFO] [axolotl.utils.data.wrappers] Loading dataset: mags0ft/gsm8k-chatml with base_type: chat_template and prompt_style: None
[2026-05-14 19:15:37,573] [INFO] [axolotl.prompt_strategies.chat_template] Using chat template:
---
{%- macro format_parameters(properties, required) -%}
{%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
{%- set ns = namespace(found_first=false) -%}
{%- for key, value in properties | dictsort -%}
{%- set add_comma = false -%}
{%- if key not in standard_keys -%}
{%- if ns.found_first %},{% endif -%}
{%- set ns.found_first = true -%}
{{ key }}:{
{%- if value['description'] -%}
description:<|"|>{{ value['description'] }}<|"|>
{%- set add_comma = true -%}
{%- endif -%}
{%- if value['nullable'] %}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
nullable:true
{%- endif -%}
{%- if value['type'] | upper == 'STRING' -%}
{%- if value['enum'] -%}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
enum:{{ format_argument(value['enum']) }}
{%- endif -%}
{%- elif value['type'] | upper == 'OBJECT' -%}
,properties:{
{%- if value['properties'] is defined and value['properties'] is mapping -%}
{{- format_parameters(value['properties'], value['required'] | default([])) -}}
{%- elif value is mapping -%}
{{- format_parameters(value, value['required'] | default([])) -}}
{%- endif -%}
}
{%- if value['required'] -%}
,required:[
{%- for item in value['required'] | default([]) -%}
<|"|>{{- item -}}<|"|>
{%- if not loop.last %},{% endif -%}
{%- endfor -%}
]
{%- endif -%}
{%- elif value['type'] | upper == 'ARRAY' -%}
{%- if value['items'] is mapping and value['items'] -%}
,items:{
{%- set ns_items = namespace(found_first=false) -%}
{%- for item_key, item_value in value['items'] | dictsort -%}
{%- if item_value is not none -%}
{%- if ns_items.found_first %},{% endif -%}
{%- set ns_items.found_first = true -%}
{%- if item_key == 'properties' -%}
properties:{
{%- if item_value is mapping -%}
{{- format_parameters(item_value, value['items']['required'] | default([])) -}}
{%- endif -%}
}
{%- elif item_key == 'required' -%}
required:[
{%- for req_item in item_value -%}
<|"|>{{- req_item -}}<|"|>
{%- if not loop.last %},{% endif -%}
{%- endfor -%}
]
{%- elif item_key == 'type' -%}
{%- if item_value is string -%}
type:{{ format_argument(item_value | upper) }}
{%- else -%}
type:{{ format_argument(item_value | map('upper') | list) }}
{%- endif -%}
{%- else -%}
{{ item_key }}:{{ format_argument(item_value) }}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
}
{%- endif -%}
{%- endif -%}
{%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
type:<|"|>{{ value['type'] | upper }}<|"|>}
{%- endif -%}
{%- endfor -%}
{%- endmacro -%}
{%- macro format_function_declaration(tool_data) -%}
declaration:{{- tool_data['function']['name'] -}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|>
{%- set params = tool_data['function']['parameters'] -%}
{%- if params -%}
,parameters:{
{%- if params['properties'] -%}
properties:{ {{- format_parameters(params['properties'], params['required']) -}} },
{%- endif -%}
{%- if params['required'] -%}
required:[
{%- for item in params['required'] -%}
<|"|>{{- item -}}<|"|>
{{- ',' if not loop.last -}}
{%- endfor -%}
],
{%- endif -%}
{%- if params['type'] -%}
type:<|"|>{{- params['type'] | upper -}}<|"|>}
{%- endif -%}
{%- endif -%}
{%- if 'response' in tool_data['function'] -%}
{%- set response_declaration = tool_data['function']['response'] -%}
,response:{
{%- if response_declaration['description'] -%}
description:<|"|>{{- response_declaration['description'] -}}<|"|>,
{%- endif -%}
{%- if response_declaration['type'] | upper == 'OBJECT' -%}
type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>}
{%- endif -%}
{%- endif -%}
}
{%- endmacro -%}
{%- macro format_argument(argument, escape_keys=True) -%}
{%- if argument is string -%}
{{- '<|"|>' + argument + '<|"|>' -}}
{%- elif argument is boolean -%}
{{- 'true' if argument else 'false' -}}
{%- elif argument is mapping -%}
{{- '{' -}}
{%- set ns = namespace(found_first=false) -%}
{%- for key, value in argument | dictsort -%}
{%- if ns.found_first %},{% endif -%}
{%- set ns.found_first = true -%}
{%- if escape_keys -%}
{{- '<|"|>' + key + '<|"|>' -}}
{%- else -%}
{{- key -}}
{%- endif -%}
:{{- format_argument(value, escape_keys=escape_keys) -}}
{%- endfor -%}
{{- '}' -}}
{%- elif argument is sequence -%}
{{- '[' -}}
{%- for item in argument -%}
{{- format_argument(item, escape_keys=escape_keys) -}}
{%- if not loop.last %},{% endif -%}
{%- endfor -%}
{{- ']' -}}
{%- else -%}
{{- argument -}}
{%- endif -%}
{%- endmacro -%}
{#- Removes '<|channel>...<channel|>' thinking blocks from model output.
Splits on the end token '<channel|>', then checks each part for the start
token '<|channel>' and keeps only the text before it. -#}
{%- macro strip_thinking(text) -%}
{%- set ns = namespace(cleaned='') -%}
{%- for part in text.split('<channel|>') -%}
{%- if '<|channel>' in part -%}
{%- set ns.cleaned = ns.cleaned + part.split('<|channel>')[0] -%}
{%- else -%}
{%- set ns.cleaned = ns.cleaned + part -%}
{%- endif -%}
{%- endfor -%}
{{- ns.cleaned | trim -}}
{%- endmacro -%}
{%- set ns = namespace(prev_message_type=None) -%}
{%- set loop_messages = messages -%}
{{ bos_token }}
{#- Handle System/Tool Definitions Block -#}
{%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
{{- '<|turn>system\n' -}}
{#- Inject Thinking token at the very top of the FIRST system turn -#}
{%- if enable_thinking is defined and enable_thinking -%}
{{- '<|think|>' -}}
{%- set ns.prev_message_type = 'think' -%}
{%- endif -%}
{%- if messages[0]['role'] in ['system', 'developer'] -%}
{{- messages[0]['content'] | trim -}}
{%- set loop_messages = messages[1:] -%}
{%- endif -%}
{%- if tools -%}
{%- for tool in tools %}
{{- '<|tool>' -}}
{{- format_function_declaration(tool) | trim -}}
{{- '<tool|>' -}}
{%- endfor %}
{%- set ns.prev_message_type = 'tool' -%}
{%- endif -%}
{{- '<turn|>\n' -}}
{%- endif %}
{#- Loop through messages -#}
{%- for message in loop_messages -%}
{#- Reset so only special message types (tool_call, image, etc.) influence
the generation prompt formatting below. Plain text leaves it as None. -#}
{%- set ns.prev_message_type = None -%}
{%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
{{- '<|turn>' + role + '\n' }}
{%- if message['tool_calls'] -%}
{%- for tool_call in message['tool_calls'] -%}
{%- set function = tool_call['function'] -%}
{{- '<|tool_call>call:' + function['name'] + '{' -}}
{%- if function['arguments'] is mapping -%}
{%- set ns_args = namespace(found_first=false) -%}
{%- for key, value in function['arguments'] | dictsort -%}
{%- if ns_args.found_first %},{% endif -%}
{%- set ns_args.found_first = true -%}
{{- key -}}:{{- format_argument(value, escape_keys=False) -}}
{%- endfor -%}
{%- elif function['arguments'] is string -%}
{{- function['arguments'] -}}
{%- endif -%}
{{- '}<tool_call|>' -}}
{%- endfor -%}
{%- set ns.prev_message_type = 'tool_call' -%}
{%- endif -%}
{%- if message['tool_responses'] -%}
{#- Tool Response handling -#}
{%- for tool_response in message['tool_responses'] -%}
{{- '<|tool_response>' -}}
{%- if tool_response['response'] is mapping -%}
{{- 'response:' + tool_response['name'] | default('unknown') + '{' -}}
{%- for key, value in tool_response['response'] | dictsort -%}
{{- key -}}:{{- format_argument(value, escape_keys=False) -}}
{%- if not loop.last %},{% endif -%}
{%- endfor -%}
{{- '}' -}}
{%- else -%}
{{- 'response:' + tool_response['name'] | default('unknown') + '{value:' + format_argument(tool_response['response'], escape_keys=False) + '}' -}}
{%- endif -%}
{{- '<tool_response|>' -}}
{%- endfor -%}
{%- set ns.prev_message_type = 'tool_response' -%}
{%- endif -%}
{%- if message['content'] is string -%}
{%- if role == 'model' -%}
{{- strip_thinking(message['content']) -}}
{%- else -%}
{{- message['content'] | trim -}}
{%- endif -%}
{%- elif message['content'] is sequence -%}
{%- for item in message['content'] -%}
{%- if item['type'] == 'text' -%}
{%- if role == 'model' -%}
{{- strip_thinking(item['text']) -}}
{%- else -%}
{{- item['text'] | trim -}}
{%- endif -%}
{%- elif item['type'] == 'image' -%}
{{- '\n\n<|image|>\n\n' -}}
{%- set ns.prev_message_type = 'image' -%}
{%- elif item['type'] == 'audio' -%}
{{- '<|audio|>' -}}
{%- set ns.prev_message_type = 'audio' -%}
{%- elif item['type'] == 'video' -%}
{{- '\n\n<|video|>\n\n' -}}
{%- set ns.prev_message_type = 'video' -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- if not (message['tool_responses'] and not message['content']) -%}
{{- '<turn|>\n' -}}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
{%- if ns.prev_message_type != 'tool_response' -%}
{{- '<|turn>model\n' -}}
{%- endif -%}
{%- if not enable_thinking | default(false) -%}
{{- '<|channel>thought\n<channel|>' -}}
{%- endif -%}
{%- endif -%}
---
[2026-05-14 19:15:37,615] [WARNING] [axolotl.prompt_strategies.chat_template] EOS token '<eos>' not found in chat_template. Please check if your template/EOS token is correct.
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:00<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:03<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:05<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:06<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:08<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:10<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:12<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:14<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:16<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:18<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:20<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:22<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:24<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:26<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:28<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:30<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16): 0%| | 0/7473 [00:32<?, ? examples/s]
[2026-05-14 19:16:12,319] [ERROR] [axolotl.telemetry.errors] Error captured in telemetry. Run ID: 4cf03c54-f159-47a4-8564-45720f596c09
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/workspace/axolotl-venv/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 585, in _write_generator_to_queue
for i, result in enumerate(func(**kwargs)):
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3998, in _map_single
for i, batch in iter_outputs(shard_iterable):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3951, in iter_outputs
yield i, apply_function(example, i, offset=offset)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3872, in apply_function
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 418, in tokenize_prompt
tokenized_prompt = self._tokenize_single_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 520, in _tokenize_single_prompt
turn_start_idx, turn_end_idx = self.find_turn(
^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 725, in find_turn
if dummy_ids[i] != full_ids[i]:
~~~~~~~~~^^^
KeyError: 0
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspace/axolotl/src/axolotl/cli/train.py", line 145, in <module>
fire.Fire(do_cli)
File "/workspace/axolotl-venv/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/cli/train.py", line 96, in do_cli
do_train(parsed_cfg, parsed_cli_args)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 48, in do_train
dataset_meta = load_datasets(cfg=cfg, cli_args=cli_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/telemetry/errors.py", line 127, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/common/datasets.py", line 61, in load_datasets
train_dataset, eval_dataset, total_num_steps, prompters = prepare_datasets(
^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/utils.py", line 50, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 65, in prepare_datasets
return _prepare_standard_dataset(cfg, tokenizer, processor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 98, in _prepare_standard_dataset
train_dataset, eval_dataset, prompters = loader.load(_load_datasets)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/lock.py", line 38, in load
result = load_fn()
^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 77, in _load_datasets
train_dataset, eval_dataset, prompters = _load_and_prepare_datasets(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 496, in _load_and_prepare_datasets
dataset, prompters = _load_tokenized_prepared_datasets(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 299, in _load_tokenized_prepared_datasets
dataset, prompters = _load_raw_datasets(
^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 331, in _load_raw_datasets
dataset_wrapper, dataset_prompter = _load_and_process_single_dataset(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 411, in _load_and_process_single_dataset
dataset_wrapper, dataset_prompter = get_dataset_wrapper(
^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/wrappers.py", line 123, in get_dataset_wrapper
return _handle_loaded_strategy(dataset_strategy, dataset, dataset_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/utils/data/wrappers.py", line 223, in _handle_loaded_strategy
dataset_wrapper = wrap_dataset_for_tokenized_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/datasets.py", line 87, in wrap_dataset_for_tokenized_prompt
return TokenizedPromptDataset(prompt_tokenizer, dataset, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/datasets.py", line 40, in __init__
self.process(dataset).data,
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl/src/axolotl/datasets.py", line 62, in process
return dataset.map(
^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 575, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3624, in map
for rank, done, content in iflatmap_unordered(
^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 624, in iflatmap_unordered
[async_result.get(timeout=0.05) for async_result in async_results]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/multiprocess/pool.py", line 774, in get
raise self._value
KeyError: 0
Traceback (most recent call last):
File "/workspace/axolotl-venv/bin/accelerate", line 12, in <module>
sys.exit(main())
^^^^^^
File "/workspace/axolotl-venv/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
args.func(args)
File "/workspace/axolotl-venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1405, in launch_command
simple_launcher(args)
File "/workspace/axolotl-venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 993, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/axolotl-venv/bin/python', '-m', 'axolotl.cli.train', './configs/jacob/mre.yaml', '--debug=False', '--debug-text-only=False', '--debug-num-examples=0', '--shard=False']' returned non-zero exit status 1.
Please check that this issue hasn't been reported before.
Expected Behavior
Using the Axolotl Docker image
axolotlai/axolotl-uv:main-latest, training a model with the attached config is expected to work; however, there seems to be an error with the dataset tokenization / pre-processing. For testing purposes, I made this conversational dataset on Hugging Face to train on.What I thought would happen is that the pre-processing completes without any problems and the training process starts. Unfortunately, this does not happen.
Current behaviour
Axolotl promptly crashes before starting the training at the dataset pre-processing state. It's throwing a KeyError. I don't quite understand where it's coming from, but I believe it is dataset-related...
This is the most relevant part of the error message in short:
Here is the full log.
Steps to reproduce
Config yaml
Possible solution
I have no clue, even after digging through the code from the traceback, unfortunately :(
Which Operating Systems are you using?
Python Version
3.12.13
axolotl branch-commit
main/d7cb1c9d03dace6e3db34ed5b8ee908eae671a7a
Acknowledgements