docs/faq.qmd

---
title: FAQ
description: Frequently asked questions
---

### General

**Q: The trainer stopped and hasn't progressed in several minutes.**

> A: Usually an issue with the GPUs communicating with each other. See the [NCCL doc](nccl.qmd)

**Q: exitcode: -9**

> A: This usually happens when you run out of system RAM.

**Q: exitcode: -7 while using deepspeed**

> A: Try upgrading deepspeed w: `pip install -U deepspeed`

**Q: AttributeError: 'DummyOptim' object has no attribute 'step'**

**Q: ModuleNotFoundError: No module named 'mpi4py' using single GPU with deepspeed**

> A: You may be using deepspeed with single gpu. Please remove the `deepspeed:` section in the yaml file or `--deepspeed` CLI flag.

**Q: The codes is stuck on saving preprocessed datasets.**

> A: This is usually an issue with the GPU. This can be resolved through setting the os environment variable `CUDA_VISIBLE_DEVICES=0`. If you are on runpod, this is usually a pod issue. Starting a new pod should take care of it.

**Q: Received mismatch error on merge adapters / loading adapters between torch.Size of checkpoint and model.**

> A: This is likely due to vocab size mismatch. By default, Axolotl expands the model's embeddings if the tokenizer has more tokens than the model. Please use the `axolotl merge-lora` command to merge the adapters instead of using your own scripts.

> On the other hand, if the model has more tokens than the tokenizer, Axolotl does not shrink the model's embeddings unless `shrink_embeddings: true` is set in the config.

**Q: How to call Axolotl via custom python scripts?**

> A: Since Axolotl is just Python, please see `src/axolotl/cli/main.py` on how each command is called.

**Q: How to know the value to use for `fsdp_transformer_layer_cls_to_wrap`?**

> A: This is the class name of the transformer layer to wrap with FSDP. For example, for `LlamaForCausalLM`, the value is `LlamaDecoderLayer`. To find this for a specific model, check the model's `PreTrainedModel` definition and look for `_no_split_modules` variable in the `modeling_<model_name>.py` file within `transformers` library.

**Q: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token**

> A: This is because the tokenizer does not have a padding token. Please add a padding token to the tokenizer via:

> ```yaml
> special_tokens:
>   # str. If you're not sure, set to same as `eos_token`.
>   pad_token: "..."
> ```

**Q: `IterableDataset error` or `KeyError: 'input_ids'` when using `preprocess` CLI**

> A: This is because you may be using `preprocess` CLI with `pretraining_dataset:` or `skip_prepare_dataset: true` respectively. Please use `axolotl train` CLI directly instead as these datasets are prepared on demand.

**Q: vLLM is not working with Axolotl**

> A: We currently recommend torch 2.6.0 for use with `vllm`. Please ensure you use the right version. For Docker, please use the `main-py3.11-cu124-2.6.0` tag.

**Q: FA2 2.8.0 `undefined symbol` runtime error on CUDA 12.4**

> A: There seems to be a wheel issue with FA2 2.8.0 on CUDA 12.4. Try CUDA 12.6 instead or downgrade to FA2 2.7.4. Please refer to the upstream issue: https://github.com/Dao-AILab/flash-attention/issues/1717.

### Chat templates

**Q: `jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____`**

> A: This means that the property mapping for the stated attribute does not exist when building `chat_template` prompt. For example, if `no attribute 'content'`, please check you have added the correct mapping for `content` under `message_property_mappings`.

**Q: `Empty template generated for turn ___`**

> A: The `content` is empty for that turn.

**Q: `Could not find content start/end boundary for turn __`**

> A: The specific turn's start/end could not be detected. Please ensure you have set the `eos_token` following your `chat_template`. Otherwise, this could be a `chat_template` which doesn't use proper boundaries for each turn (like system). On the rare occurrence, make sure your content is not `[[dummy_message]]`. Please let us know about this.

**Q: `Content end boundary is before start boundary for turn ___`**

> A: This is an edge case which should not occur. Please create an Issue if this happens.

**Q: `Content end boundary is the same as start boundary for turn ___. This is likely an empty turn.`**

> A: This is likely an empty turn.

**Q: The EOS token is incorrectly being masked or not being masked / `EOS token __ not found in chat template`.**

> A: There can be two reasons:

> 1. This is because of the mismatch between `tokenizer.eos_token` and EOS token in template. Please make sure to set `eos_token: ` under `special_tokens: ` to the same EOS token as in template.

> 2. The EOS token is not in the template. Please check if your template is correct. As an example, `phi_35` template does not use its dedicated EOS token `<|endoftext|>` at the end.

**Q: "`chat_template` choice is `tokenizer_default` but tokenizer's `chat_template` is null. Please add a `chat_template` in tokenizer config"**

> A: This is because the tokenizer does not have a chat template. Please add a chat template in the tokenizer config. See [chat_template](dataset-formats/conversation.qmd#chat-template) for more details.

**Q: The EOT token(s) are incorrectly being masked or not being masked / `EOT token __ not found in chat template`.**

> A: There can be two reasons:

> 1. The EOT token is different from the EOS token and was not specified under `eot_tokens: `. Please set `eot_tokens: ` to the same EOT token(s) as in template.

> 2. There is more than one EOT token per turn in the template. Please raise an issue with examples as we recognize this as an edge case.

**Q: `EOT token encoding failed. Please check if the token is valid and can be encoded.`**

> A: There could be some issue with the tokenizer or unicode encoding. Please raise an issue with examples with the EOT token & tokenizer causing the issue.

**Q: `EOT token __ is encoded as multiple tokens.`**

> A: This is because the EOT token is encoded as multiple tokens which can cause unexpected behavior. Please add it under `tokens: ` or (recommended) override unused added_tokens via `added_tokens_overrides: `.

**Q: `Conflict between train_on_eos and train_on_eot. eos_token is in eot_tokens and train_on_eos != train_on_eot`**

> A: This is because the EOS token is in the `eot_tokens: ` while mismatch between `train_on_eos: ` and `train_on_eot: `. This will cause one to override the other. Please ensure that `train_on_eos: ` and `train_on_eot: ` are the same or remove the EOS token from `eot_tokens: `.

**Q: If `eot_tokens: ` is not provided, what happens?**

> A: If `eot_tokens: ` is not provided, the default behavior is the same as before. EOS tokens used to delimit turns are masked/unmasked depending on whether the turn is trainable.

> Internally, `eot_tokens: tokenizer.eos_token` and `train_on_eot: train_on_eos` (which defaults to `turn`). This transition helps clarify the naming and behavior of EOT/EOS tokens.

**Q: `Data processing error: CAS service error`**

> A: Try disabling XET with `export HF_HUB_DISABLE_XET=1`

**Q: `torch._inductor.exc.LoweringException: NoValidChoicesError: No choices to select, please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. `**

> A: Depending on the version of torch, you may need to include this in your YAML:

> ```yaml
> flex_attn_compile_kwargs:
>   dynamic: false
>   mode: max-autotune-no-cudagraphs
> ```