"KeyError: 0" when trying to fine-tune on text-only conversational dataset during pre-processing step #3655

Description

@mags0ft

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Training a model with the attached config in the Axolotl Docker image axolotlai/axolotl-uv:main-latest is expected to work; however, dataset tokenization / pre-processing fails with an error. For testing purposes, I made a conversational dataset on Hugging Face (mags0ft/gsm8k-chatml) to train on.

I expected the pre-processing to complete without any problems and the training process to start. Unfortunately, this does not happen.

Current behaviour

Axolotl crashes at the dataset pre-processing stage, before training starts, with a KeyError. I don't quite understand where it comes from, but I believe it is dataset-related.

This is the most relevant part of the error message:

  File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 418, in tokenize_prompt
    tokenized_prompt = self._tokenize_single_prompt(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 520, in _tokenize_single_prompt
    turn_start_idx, turn_end_idx = self.find_turn(
                                   ^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 725, in find_turn
    if dummy_ids[i] != full_ids[i]:
       ~~~~~~~~~^^^
KeyError: 0
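
A KeyError: 0 on an expression like dummy_ids[i] usually means the object being indexed is a mapping rather than a list, since integer-indexing a list out of range would raise IndexError instead. The sketch below is only an illustration of that Python behaviour, not Axolotl's actual code: it assumes (unconfirmed) that the tokenization step returned a dict-like object with an "input_ids" key. The helper names to_id_list and first_mismatch and the token ids are hypothetical.

```python
from collections.abc import Mapping


def to_id_list(encoded):
    """Return a flat list of token ids from a list or a dict-like with "input_ids".

    Hypothetical helper for illustration; not part of Axolotl or transformers.
    """
    if isinstance(encoded, Mapping):
        encoded = encoded["input_ids"]
    ids = list(encoded)
    # Unwrap a single-item batch such as [[1, 2, 3]].
    if len(ids) == 1 and isinstance(ids[0], (list, tuple)):
        ids = list(ids[0])
    return ids


def first_mismatch(dummy_ids, full_ids):
    """Index of the first differing position, or None if the common prefix matches."""
    for i in range(min(len(dummy_ids), len(full_ids))):
        if dummy_ids[i] != full_ids[i]:
            return i
    return None


# Made-up token ids; a plain dict stands in for a dict-like tokenizer output.
dict_like = {"input_ids": [2, 105, 2364, 107], "attention_mask": [1, 1, 1, 1]}
plain = [2, 105, 9999, 107]

try:
    first_mismatch(dict_like, plain)  # dict_like[0] looks up the key 0
    error = None
except KeyError as exc:
    error = repr(exc)

# Normalizing to a list first makes the same comparison work.
mismatch = first_mismatch(to_id_list(dict_like), plain)

print(error)     # KeyError(0)
print(mismatch)  # 2
```

If this is what happens here, the dataset itself may not be at fault; checking the type of the object that find_turn receives would confirm or rule it out.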

Here is the full log.

[2026-05-14 19:15:18,925] [WARNING] [torchao] Skipping import of cpp extensions due to incompatible torch version. Please upgrade to torch >= 2.11.0 (found 2.9.1+cu128).

     #@@ #@@      @@# @@#
    @@  @@          @@  @@           =@@#                               @@                 #@    =@@#.
    @@    #@@@@@@@@@    @@           #@#@=                              @@                 #@     .=@@
      #@@@@@@@@@@@@@@@@@            =@# @#     ##=     ##    =####=+    @@      =#####+  =#@@###.   @@
    @@@@@@@@@@/  +@@/  +@@          #@  =@=     #@=   @@   =@#+  +#@#   @@    =@#+  +#@#   #@.      @@
    @@@@@@@@@@  ##@@  ##@@         =@#   @#      =@# @#    @@      @@   @@    @@      #@   #@       @@
     @@@@@@@@@@@@@@@@@@@@          #@=+++#@=      =@@#     @@      @@   @@    @@      #@   #@       @@
                                  =@#=====@@     =@# @#    @@      @@   @@    @@      #@   #@       @@
    @@@@@@@@@@@@@@@@  @@@@        #@      #@=   #@=  +@@   #@#    =@#   @@.   =@#    =@#   #@.      @@
                                 =@#       @#  #@=     #@   =#@@@@#=    +#@@=  +#@@@@#=    .##@@+   @@
    @@@@  @@@@@@@@@@@@@@@@

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2026-05-14 19:15:25,538] [WARNING] [torchao] Skipping import of cpp extensions due to incompatible torch version. Please upgrade to torch >= 2.11.0 (found 2.9.1+cu128).
[2026-05-14 19:15:27,098] [INFO] [axolotl.integrations.base] Attempting to load plugin: axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
[2026-05-14 19:15:27,591] [INFO] [axolotl.integrations.base] Plugin loaded successfully: axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
[2026-05-14 19:15:27,620] [WARNING] [axolotl.utils.schemas.config] Auto-enabling LoRA kernel optimizations for faster training. Please explicitly set `lora_*_kernel` config values to `false` to disable. See https://docs.axolotl.ai/docs/lora_optims.html for more info.
[2026-05-14 19:15:27,620] [INFO] [axolotl.utils.schemas.validation] explicitly setting `eval_sample_packing` to match `sample_packing`
[2026-05-14 19:15:27,620] [WARNING] [axolotl.utils.schemas.validation] `pad_to_sequence_len: true` is recommended when using sample_packing
[2026-05-14 19:15:27,620] [WARNING] [axolotl.utils.schemas.validation] `sample_packing` with `attn_implementation='sdpa'` does not handle cross-sample decontamination. Use a varlen-capable backend (e.g. flash_attention_2, flex_attention, xformers, sage) to isolate samples.
[2026-05-14 19:15:27,620] [WARNING] [axolotl.utils.schemas.config] sample_packing & torch sdpa with bf16 is unsupported may results in 0.0 loss. This may work on H100s.
[2026-05-14 19:15:27,939] [INFO] [axolotl.cli.config] config:
{
  "activation_offloading": false,
  "adapter": "qlora",
  "attn_implementation": "sdpa",
  "attn_needs_dtype_cast": false,
  "attn_supports_packing": false,
  "attn_uses_flash_lib": false,
  "axolotl_config_path": "./configs/jacob/mre.yaml",
  "base_model": "google/gemma-4-E2B-it",
  "base_model_config": "google/gemma-4-E2B-it",
  "batch_size": 8,
  "bf16": true,
  "capabilities": {
    "bf16": true,
    "compute_capability": "sm_89",
    "fp8": false,
    "n_gpu": 1,
    "n_node": 1,
    "tf32": true
  },
  "chat_template": "gemma4",
  "context_parallel_size": 1,
  "cut_cross_entropy": true,
  "dataloader_num_workers": 1,
  "dataloader_pin_memory": true,
  "dataloader_prefetch_factor": 256,
  "dataset_num_proc": 16,
  "datasets": [
    {
      "chat_template": "tokenizer_default",
      "message_property_mappings": {
        "content": "content",
        "role": "role"
      },
      "path": "mags0ft/gsm8k-chatml",
      "split": "train",
      "trust_remote_code": false,
      "type": "chat_template"
    }
  ],
  "ddp": false,
  "device": "cuda:0",
  "dion_rank_fraction": 1.0,
  "dion_rank_multiple_of": 1,
  "eaft_alpha": 1.0,
  "eaft_k": 20,
  "env_capabilities": {
    "torch_version": "2.9.1"
  },
  "eval_batch_size": 1,
  "eval_causal_lm_metrics": [
    "sacrebleu",
    "comet",
    "ter",
    "chrf"
  ],
  "eval_max_new_tokens": 128,
  "eval_sample_packing": true,
  "eval_table_size": 0,
  "experimental_skip_move_to_device": true,
  "fp16": false,
  "freeze_mm_modules": true,
  "generate_samples": false,
  "generation_do_sample": true,
  "generation_max_new_tokens": 50,
  "generation_prompt_ratio": 0.5,
  "generation_temperature": 0.7,
  "gradient_accumulation_steps": 8,
  "gradient_checkpointing": true,
  "gradient_checkpointing_kwargs": {
    "use_reentrant": false
  },
  "include_tkps": true,
  "is_multimodal": true,
  "layer_offloading": false,
  "learning_rate": 0.0002,
  "lisa_layers_attribute": "model.layers",
  "load_best_model_at_end": false,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "local_rank": 0,
  "logging_steps": 1,
  "lora_alpha": 16,
  "lora_dropout": 0.0,
  "lora_embedding_kernel": true,
  "lora_mlp_kernel": true,
  "lora_o_kernel": true,
  "lora_qkv_kernel": true,
  "lora_r": 8,
  "lora_target_modules": "model.language_model.layers.[\\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj",
  "loraplus_lr_embedding": 1e-06,
  "lr_scheduler": "cosine",
  "mean_resizing_embeddings": false,
  "merge_method": "memory_efficient",
  "micro_batch_size": 1,
  "model_config_type": "gemma4",
  "model_config_type_text": "gemma4_text",
  "num_epochs": 1.0,
  "num_generation_samples": 3,
  "optimizer": "adamw_torch_8bit",
  "otel_metrics_host": "localhost",
  "otel_metrics_port": 8000,
  "output_dir": "./outputs/mre",
  "pad_to_sequence_len": false,
  "plugins": [
    "axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin"
  ],
  "pretrain_multipack_attn": true,
  "processor_config": "google/gemma-4-E2B-it",
  "processor_type": "AutoProcessor",
  "profiler_steps_start": 0,
  "qlora_sharded_model_loading": false,
  "quantize_moe_experts": false,
  "ray_num_workers": 1,
  "relora_prune_method": "magnitude",
  "remove_unused_columns": true,
  "resources_per_worker": {
    "GPU": 1
  },
  "sample_packing": true,
  "sample_packing_bin_size": 200,
  "sample_packing_group_size": 100000,
  "save_only_model": false,
  "save_safetensors": true,
  "sequence_len": 2048,
  "shuffle_before_merging_datasets": false,
  "shuffle_merged_datasets": true,
  "skip_prepare_dataset": false,
  "streaming_multipack_buffer_size": 10000,
  "strict": false,
  "tensor_parallel_size": 1,
  "tf32": false,
  "tiled_mlp_use_original_mlp": true,
  "tokenizer_config": "google/gemma-4-E2B-it",
  "tokenizer_save_jinja_files": true,
  "torch_dtype": "torch.bfloat16",
  "train_on_inputs": false,
  "trl": {
    "async_prefetch": false,
    "log_completions": false,
    "mask_truncated_completions": false,
    "ref_model_mixup_alpha": 0.9,
    "ref_model_sync_steps": 64,
    "replay_buffer_size": 0,
    "replay_recompute_logps": true,
    "reroll_max_groups": 1,
    "reroll_start_fraction": 1.0,
    "reward_num_workers": 1,
    "scale_rewards": true,
    "skip_zero_advantage_batches": true,
    "sync_ref_model": false,
    "use_data_producer": false,
    "use_vllm": false,
    "vllm_lora_sync": false,
    "vllm_server_host": "0.0.0.0",
    "vllm_server_port": 8000
  },
  "use_otel_metrics": false,
  "use_ray": false,
  "val_set_size": 0.0,
  "vllm": {
    "device": "auto",
    "dtype": "auto",
    "gpu_memory_utilization": 0.9,
    "host": "0.0.0.0",
    "port": 8000
  },
  "warmup_ratio": 0.1,
  "weight_decay": 0.0,
  "world_size": 1
}
[2026-05-14 19:15:35,875] [INFO] [axolotl.utils.data.shared] Unable to find prepared dataset in last_run_prepared/c0dccb9bbef493730af01c13797fb326
[2026-05-14 19:15:35,875] [INFO] [axolotl.utils.data.sft] Loading raw datasets...
[2026-05-14 19:15:35,875] [WARNING] [axolotl.utils.data.sft] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`.
Fetching 0 files: 0it [00:00, ?it/s].00B [00:00, ?B/s]
Download complete: : 0.00B [00:00, ?B/s]              
[2026-05-14 19:15:37,572] [INFO] [axolotl.utils.data.wrappers] Loading dataset: mags0ft/gsm8k-chatml with base_type: chat_template and prompt_style: None
[2026-05-14 19:15:37,573] [INFO] [axolotl.prompt_strategies.chat_template] Using chat template:
---
{%- macro format_parameters(properties, required) -%}
    {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
    {%- set ns = namespace(found_first=false) -%}
    {%- for key, value in properties | dictsort -%}
        {%- set add_comma = false -%}
        {%- if key not in standard_keys -%}
            {%- if ns.found_first %},{% endif -%}
            {%- set ns.found_first = true -%}
            {{ key }}:{
            {%- if value['description'] -%}
                description:<|"|>{{ value['description'] }}<|"|>
                {%- set add_comma = true -%}
            {%- endif -%}
            {%- if value['nullable'] %}
                {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                nullable:true
            {%- endif -%}
            {%- if value['type'] | upper == 'STRING' -%}
                {%- if value['enum'] -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    enum:{{ format_argument(value['enum']) }}
                {%- endif -%}
            {%- elif value['type'] | upper == 'OBJECT' -%}
                ,properties:{
                {%- if value['properties'] is defined and value['properties'] is mapping -%}
                    {{- format_parameters(value['properties'], value['required'] | default([])) -}}
                {%- elif value is mapping -%}
                    {{- format_parameters(value, value['required'] | default([])) -}}
                {%- endif -%}
                }
                {%- if value['required'] -%}
                    ,required:[
                    {%- for item in value['required'] | default([]) -%}
                        <|"|>{{- item -}}<|"|>
                        {%- if not loop.last %},{% endif -%}
                    {%- endfor -%}
                    ]
                {%- endif -%}
            {%- elif value['type'] | upper == 'ARRAY' -%}
                {%- if value['items'] is mapping and value['items'] -%}
                    ,items:{
                    {%- set ns_items = namespace(found_first=false) -%}
                    {%- for item_key, item_value in value['items'] | dictsort -%}
                        {%- if item_value is not none -%}
                            {%- if ns_items.found_first %},{% endif -%}
                            {%- set ns_items.found_first = true -%}
                            {%- if item_key == 'properties' -%}
                                properties:{
                                {%- if item_value is mapping -%}
                                    {{- format_parameters(item_value, value['items']['required'] | default([])) -}}
                                {%- endif -%}
                                }
                            {%- elif item_key == 'required' -%}
                                required:[
                                {%- for req_item in item_value -%}
                                    <|"|>{{- req_item -}}<|"|>
                                    {%- if not loop.last %},{% endif -%}
                                {%- endfor -%}
                                ]
                            {%- elif item_key == 'type' -%}
                                {%- if item_value is string -%}
                                    type:{{ format_argument(item_value | upper) }}
                                {%- else -%}
                                    type:{{ format_argument(item_value | map('upper') | list) }}
                                {%- endif -%}
                            {%- else -%}
                                {{ item_key }}:{{ format_argument(item_value) }}
                            {%- endif -%}
                        {%- endif -%}
                    {%- endfor -%}
                    }
                {%- endif -%}
            {%- endif -%}
            {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
            type:<|"|>{{ value['type'] | upper }}<|"|>}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}
{%- macro format_function_declaration(tool_data) -%}
    declaration:{{- tool_data['function']['name'] -}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|>
    {%- set params = tool_data['function']['parameters'] -%}
    {%- if params -%}
        ,parameters:{
        {%- if params['properties'] -%}
            properties:{ {{- format_parameters(params['properties'], params['required']) -}} },
        {%- endif -%}
        {%- if params['required'] -%}
            required:[
            {%- for item in params['required'] -%}
                <|"|>{{- item -}}<|"|>
                {{- ',' if not loop.last -}}
            {%- endfor -%}
            ],
        {%- endif -%}
        {%- if params['type'] -%}
            type:<|"|>{{- params['type'] | upper -}}<|"|>}
        {%- endif -%}
    {%- endif -%}
    {%- if 'response' in tool_data['function'] -%}
        {%- set response_declaration = tool_data['function']['response'] -%}
        ,response:{
        {%- if response_declaration['description'] -%}
            description:<|"|>{{- response_declaration['description'] -}}<|"|>,
        {%- endif -%}
        {%- if response_declaration['type'] | upper == 'OBJECT' -%}
            type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>}
        {%- endif -%}
    {%- endif -%}
    }
{%- endmacro -%}
{%- macro format_argument(argument, escape_keys=True) -%}
    {%- if argument is string -%}
        {{- '<|"|>' + argument + '<|"|>' -}}
    {%- elif argument is boolean -%}
        {{- 'true' if argument else 'false' -}}
    {%- elif argument is mapping -%}
        {{- '{' -}}
        {%- set ns = namespace(found_first=false) -%}
        {%- for key, value in argument | dictsort -%}
            {%- if ns.found_first %},{% endif -%}
            {%- set ns.found_first = true -%}
            {%- if escape_keys -%}
                {{- '<|"|>' + key + '<|"|>' -}}
            {%- else -%}
                {{- key -}}
            {%- endif -%}
            :{{- format_argument(value, escape_keys=escape_keys) -}}
        {%- endfor -%}
        {{- '}' -}}
    {%- elif argument is sequence -%}
        {{- '[' -}}
        {%- for item in argument -%}
            {{- format_argument(item, escape_keys=escape_keys) -}}
            {%- if not loop.last %},{% endif -%}
        {%- endfor -%}
        {{- ']' -}}
    {%- else -%}
        {{- argument -}}
    {%- endif -%}
{%- endmacro -%}
{#- Removes '<|channel>...<channel|>' thinking blocks from model output.
    Splits on the end token '<channel|>', then checks each part for the start
    token '<|channel>' and keeps only the text before it. -#}
{%- macro strip_thinking(text) -%}
    {%- set ns = namespace(cleaned='') -%}
    {%- for part in text.split('<channel|>') -%}
        {%- if '<|channel>' in part -%}
            {%- set ns.cleaned = ns.cleaned + part.split('<|channel>')[0] -%}
        {%- else -%}
            {%- set ns.cleaned = ns.cleaned + part -%}
        {%- endif -%}
    {%- endfor -%}
    {{- ns.cleaned | trim -}}
{%- endmacro -%}

{%- set ns = namespace(prev_message_type=None) -%}
{%- set loop_messages = messages -%}
{{ bos_token }}
{#- Handle System/Tool Definitions Block -#}
{%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
    {{- '<|turn>system\n' -}}

    {#- Inject Thinking token at the very top of the FIRST system turn -#}
    {%- if enable_thinking is defined and enable_thinking -%}
        {{- '<|think|>' -}}
        {%- set ns.prev_message_type = 'think' -%}
    {%- endif -%}

    {%- if messages[0]['role'] in ['system', 'developer'] -%}
        {{- messages[0]['content'] | trim -}}
        {%- set loop_messages = messages[1:] -%}
    {%- endif -%}

    {%- if tools -%}
        {%- for tool in tools %}
            {{- '<|tool>' -}}
            {{- format_function_declaration(tool) | trim -}}
            {{- '<tool|>' -}}
        {%- endfor %}
        {%- set ns.prev_message_type = 'tool' -%}
    {%- endif -%}

    {{- '<turn|>\n' -}}
{%- endif %}

{#- Loop through messages -#}
{%- for message in loop_messages -%}
    {#- Reset so only special message types (tool_call, image, etc.) influence
        the generation prompt formatting below. Plain text leaves it as None. -#}
    {%- set ns.prev_message_type = None -%}
    {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
        {{- '<|turn>' + role + '\n' }}

            {%- if message['tool_calls'] -%}
                {%- for tool_call in message['tool_calls'] -%}
                    {%- set function = tool_call['function'] -%}
                    {{- '<|tool_call>call:' + function['name'] + '{' -}}
                    {%- if function['arguments'] is mapping -%}
                        {%- set ns_args = namespace(found_first=false) -%}
                        {%- for key, value in function['arguments'] | dictsort -%}
                            {%- if ns_args.found_first %},{% endif -%}
                            {%- set ns_args.found_first = true -%}
                            {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
                        {%- endfor -%}
                    {%- elif function['arguments'] is string -%}
                        {{- function['arguments'] -}}
                    {%- endif -%}
                    {{- '}<tool_call|>' -}}
                {%- endfor -%}
                {%- set ns.prev_message_type = 'tool_call' -%}
            {%- endif -%}

            {%- if message['tool_responses'] -%}
                {#- Tool Response handling -#}
                {%- for tool_response in message['tool_responses'] -%}
                    {{- '<|tool_response>' -}}
                    {%- if tool_response['response'] is mapping -%}
                        {{- 'response:' + tool_response['name'] | default('unknown') + '{' -}}
                        {%- for key, value in tool_response['response'] | dictsort -%}
                            {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
                            {%- if not loop.last %},{% endif -%}
                        {%- endfor -%}
                        {{- '}' -}}
                    {%- else -%}
                        {{- 'response:' + tool_response['name'] | default('unknown') + '{value:' + format_argument(tool_response['response'], escape_keys=False) + '}' -}}
                    {%- endif -%}
                    {{- '<tool_response|>' -}}
                {%- endfor -%}
                {%- set ns.prev_message_type = 'tool_response' -%}
            {%- endif -%}

            {%- if message['content'] is string -%}
                {%- if role == 'model' -%}
                    {{- strip_thinking(message['content']) -}}
                {%- else -%}
                    {{- message['content'] | trim -}}
                {%- endif -%}
            {%- elif message['content'] is sequence -%}
                {%- for item in message['content'] -%}
                    {%- if item['type'] == 'text' -%}
                        {%- if role == 'model' -%}
                            {{- strip_thinking(item['text']) -}}
                        {%- else -%}
                            {{- item['text'] | trim -}}
                        {%- endif -%}
                    {%- elif item['type'] == 'image' -%}
                        {{- '\n\n<|image|>\n\n' -}}
                        {%- set ns.prev_message_type = 'image' -%}
                    {%- elif item['type'] == 'audio' -%}
                        {{- '<|audio|>' -}}
                        {%- set ns.prev_message_type = 'audio' -%}
                    {%- elif item['type'] == 'video' -%}
                        {{- '\n\n<|video|>\n\n' -}}
                        {%- set ns.prev_message_type = 'video' -%}
                    {%- endif -%}
                {%- endfor -%}
            {%- endif -%}

        {%- if not (message['tool_responses'] and not message['content']) -%}
            {{- '<turn|>\n' -}}
        {%- endif -%}
{%- endfor -%}

{%- if add_generation_prompt -%}
    {%- if ns.prev_message_type != 'tool_response' -%}
        {{- '<|turn>model\n' -}}
    {%- endif -%}
    {%- if not enable_thinking | default(false) -%}
        {{- '<|channel>thought\n<channel|>' -}}
    {%- endif -%}
{%- endif -%}

---
[2026-05-14 19:15:37,615] [WARNING] [axolotl.prompt_strategies.chat_template] EOS token '<eos>' not found in chat_template. Please check if your template/EOS token is correct.
Tokenizing Prompts (num_proc=16):   0%|                                                                                                                               | 0/7473 [00:00<?, ? examples/s]Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Kwargs passed to `processor.__call__` have to be in `processor_kwargs` dict, not in `**kwargs`
Tokenizing Prompts (num_proc=16):   0%|                                                                                                                               | 0/7473 [00:32<?, ? examples/s]
[2026-05-14 19:16:12,319] [ERROR] [axolotl.telemetry.errors] Error captured in telemetry. Run ID: 4cf03c54-f159-47a4-8564-45720f596c09
multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 585, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3998, in _map_single
    for i, batch in iter_outputs(shard_iterable):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3951, in iter_outputs
    yield i, apply_function(example, i, offset=offset)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3872, in apply_function
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 418, in tokenize_prompt
    tokenized_prompt = self._tokenize_single_prompt(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 520, in _tokenize_single_prompt
    turn_start_idx, turn_end_idx = self.find_turn(
                                   ^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/prompt_strategies/chat_template.py", line 725, in find_turn
    if dummy_ids[i] != full_ids[i]:
       ~~~~~~~~~^^^
KeyError: 0
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 145, in <module>
    fire.Fire(do_cli)
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 96, in do_cli
    do_train(parsed_cfg, parsed_cli_args)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 48, in do_train
    dataset_meta = load_datasets(cfg=cfg, cli_args=cli_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/telemetry/errors.py", line 127, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/common/datasets.py", line 61, in load_datasets
    train_dataset, eval_dataset, total_num_steps, prompters = prepare_datasets(
                                                              ^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/utils.py", line 50, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 65, in prepare_datasets
    return _prepare_standard_dataset(cfg, tokenizer, processor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 98, in _prepare_standard_dataset
    train_dataset, eval_dataset, prompters = loader.load(_load_datasets)
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/lock.py", line 38, in load
    result = load_fn()
             ^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 77, in _load_datasets
    train_dataset, eval_dataset, prompters = _load_and_prepare_datasets(
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 496, in _load_and_prepare_datasets
    dataset, prompters = _load_tokenized_prepared_datasets(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 299, in _load_tokenized_prepared_datasets
    dataset, prompters = _load_raw_datasets(
                         ^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 331, in _load_raw_datasets
    dataset_wrapper, dataset_prompter = _load_and_process_single_dataset(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/sft.py", line 411, in _load_and_process_single_dataset
    dataset_wrapper, dataset_prompter = get_dataset_wrapper(
                                        ^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/wrappers.py", line 123, in get_dataset_wrapper
    return _handle_loaded_strategy(dataset_strategy, dataset, dataset_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data/wrappers.py", line 223, in _handle_loaded_strategy
    dataset_wrapper = wrap_dataset_for_tokenized_prompt(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/datasets.py", line 87, in wrap_dataset_for_tokenized_prompt
    return TokenizedPromptDataset(prompt_tokenizer, dataset, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/datasets.py", line 40, in __init__
    self.process(dataset).data,
    ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/datasets.py", line 62, in process
    return dataset.map(
           ^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 575, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 3624, in map
    for rank, done, content in iflatmap_unordered(
                               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 624, in iflatmap_unordered
    [async_result.get(timeout=0.05) for async_result in async_results]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/multiprocess/pool.py", line 774, in get
    raise self._value
KeyError: 0
Traceback (most recent call last):
  File "/workspace/axolotl-venv/bin/accelerate", line 12, in <module>
    sys.exit(main())
             ^^^^^^
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
    args.func(args)
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1405, in launch_command
    simple_launcher(args)
  File "/workspace/axolotl-venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 993, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/axolotl-venv/bin/python', '-m', 'axolotl.cli.train', './configs/jacob/mre.yaml', '--debug=False', '--debug-text-only=False', '--debug-num-examples=0', '--shard=False']' returned non-zero exit status 1.

Steps to reproduce

  1. Paste the attached config into a file and run:
     axolotl train ./PATH_TO_CONFIG.yaml
  2. The error appears.

Config yaml

base_model: google/gemma-4-E2B-it
load_in_4bit: true
sequence_len: 2048
freeze_mm_modules: true

processor_type: AutoProcessor

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: false

skip_prepare_dataset: false
remove_unused_columns: true
sample_packing: true
pad_to_sequence_len: false

chat_template: gemma4

datasets:
  - path: mags0ft/gsm8k-chatml
    type: chat_template
    split: train

output_dir: ./outputs/mre

# r = 8, alpha = 16
adapter: qlora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0

lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

gradient_accumulation_steps: 8
micro_batch_size: 1

num_epochs: 1
optimizer: adamw_torch_8bit

lr_scheduler: cosine
learning_rate: 0.0002
warmup_ratio: 0.1
weight_decay: 0.0

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

logging_steps: 1
attn_implementation: sdpa

Possible solution

I have no clue, even after digging through the code from the traceback, unfortunately :(
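
One observation that may help narrow it down: in Python, `KeyError: 0` on `dummy_ids[i]` is what a mapping (for example a plain dict) raises when indexed with the integer 0. A list or tuple that is too short would raise `IndexError` instead. So the traceback is consistent with `dummy_ids` being a dict-like tokenizer or processor output rather than a flat list of token ids. This is unconfirmed, and I have not checked what Axolotl actually passes there. The sketch below only illustrates the difference and one possible way to normalize both shapes. `as_token_ids`, `first_mismatch`, and `index_error_name` are hypothetical helpers and not part of Axolotl, and the `"input_ids"` key is an assumption about the output layout.

```python
from collections.abc import Mapping


def as_token_ids(encoded):
    """Hypothetical helper (not part of Axolotl): return a flat list of token ids.

    Accepts either a plain sequence of ids or a mapping with an "input_ids"
    entry (the key name is an assumption).
    """
    if isinstance(encoded, Mapping):
        encoded = encoded["input_ids"]
    ids = list(encoded)
    # Unwrap a single leading batch dimension, e.g. [[1, 2, 3]] -> [1, 2, 3].
    if len(ids) == 1 and isinstance(ids[0], (list, tuple)):
        ids = list(ids[0])
    return ids


def first_mismatch(dummy, full):
    """Index of the first differing token, or the shorter length if none differ."""
    dummy_ids, full_ids = as_token_ids(dummy), as_token_ids(full)
    shared = min(len(dummy_ids), len(full_ids))
    for i in range(shared):
        if dummy_ids[i] != full_ids[i]:
            return i
    return shared


def index_error_name(obj):
    """Name of the exception raised by obj[0], or None if indexing succeeds."""
    try:
        obj[0]
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # A dict indexed with 0 fails differently from an empty list.
    print(index_error_name({"input_ids": [5, 6]}))
    print(index_error_name([]))
    # Comparing a dict-shaped result against a flat list works after normalizing.
    print(first_mismatch({"input_ids": [1, 2, 9]}, [1, 2, 3, 4]))
```

If the hypothesis holds, printing `type(dummy_ids)` just before the failing comparison in `find_turn` should show a dict-like type instead of `list`.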

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.12.13

axolotl branch-commit

main/d7cb1c9d03dace6e3db34ed5b8ee908eae671a7a

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working
