Merged
4 changes: 2 additions & 2 deletions skills/mlops/training/axolotl/references/api.md
@@ -3240,7 +3240,7 @@ Prompt Strategy for finetuning Llama2 chat models see also https://github.com/fa

This implementation is based on the Vicuna PR and the fastchat repo, see also: https://github.com/lm-sys/FastChat/blob/cdd7730686cb1bf9ae2b768ee171bdf7d1ff04f3/fastchat/conversation.py#L847

-Use dataset type: “llama2_chat” in conig.yml to use this prompt style.
+Use dataset type: “llama2_chat” in config.yml to use this prompt style.

E.g. in the config.yml:
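(The concrete example is elided in this diff view; as a hedged sketch, an axolotl config entry for this dataset type would look like the following, with a hypothetical dataset path:)

```yaml
datasets:
  - path: ./data/my_llama2_chat_data.jsonl  # hypothetical path
    type: llama2_chat
```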

@@ -4991,7 +4991,7 @@ prompt_strategies.orcamini

Prompt Strategy for finetuning Orca Mini (v2) models see also https://huggingface.co/psmathur/orca_mini_v2_7b for more information

-Use dataset type: orcamini in conig.yml to use this prompt style.
+Use dataset type: orcamini in config.yml to use this prompt style.

Compared to the alpaca_w_system.open_orca dataset type, this one specifies the system prompt with “### System:”.
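(A hedged sketch of the corresponding config entry, with a hypothetical dataset path:)

```yaml
datasets:
  - path: ./data/my_orcamini_data.jsonl  # hypothetical path
    type: orcamini
```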

24 changes: 18 additions & 6 deletions skills/mlops/training/pytorch-fsdp/references/other.md
@@ -2290,7 +2290,7 @@ This call gives the AsyncStager the opportunity to ‘stage’ the state_dict. T

for serializing the state_dict and writing it to storage.

-the serialization thread starts and before returning from dcp.async_save. If this is set to False, the assumption is the user has defined a custom synchronization point for the the purpose of further optimizing save latency in the training loop (for example, by overlapping staging with the forward/backward pass), and it is the respondsibility of the user to call AsyncStager.synchronize_staging at the appropriate time.
+the serialization thread starts and before returning from dcp.async_save. If this is set to False, the assumption is the user has defined a custom synchronization point for the purpose of further optimizing save latency in the training loop (for example, by overlapping staging with the forward/backward pass), and it is the respondsibility of the user to call AsyncStager.synchronize_staging at the appropriate time.

Clean up all resources used by the stager.

@@ -2430,7 +2430,7 @@ Read the checkpoint metadata.

The metadata object associated with the checkpoint being loaded.

-Calls to indicates a brand new checkpoint read is going to happen. A checkpoint_id may be present if users set the checkpoint_id for this checkpoint read. The meaning of the checkpiont_id is storage-dependent. It can be a path to a folder/file or a key for a key-value storage.
+Calls to indicates a brand new checkpoint read is going to happen. A checkpoint_id may be present if users set the checkpoint_id for this checkpoint read. The meaning of the checkpoint_id is storage-dependent. It can be a path to a folder/file or a key for a key-value storage.

checkpoint_id (Union[str, os.PathLike, None]) – The ID of this checkpoint instance. The meaning of the checkpoint_id depends on the storage. It can be a path to a folder or to a file. It can also be a key if the storage is more like a key-value store. (Default: None)

@@ -2488,7 +2488,7 @@ plan (SavePlan) – The local plan from the SavePlanner in use.

A transformed SavePlan after storage local planning

-Calls to indicates a brand new checkpoint write is going to happen. A checkpoint_id may be present if users set the checkpoint_id for this checkpoint write. The meaning of the checkpiont_id is storage-dependent. It can be a path to a folder/file or a key for a key-value storage.
+Calls to indicates a brand new checkpoint write is going to happen. A checkpoint_id may be present if users set the checkpoint_id for this checkpoint write. The meaning of the checkpoint_id is storage-dependent. It can be a path to a folder/file or a key for a key-value storage.

checkpoint_id (Union[str, os.PathLike, None]) – The ID of this checkpoint instance. The meaning of the checkpoint_id depends on the storage. It can be a path to a folder or to a file. It can also be a key if the storage is a key-value store. (Default: None)

@@ -2498,7 +2498,19 @@ is_coordinator (bool) – Whether this instance is responsible for coordinating

Return the storage-specific metadata. This is used to store additional information in a checkpoint that can be useful for providing request-level observability. StorageMeta is passed to the SavePlanner during save calls. Returns None by default.

-TODO: provide an example
+Example:
+
+```python
+from torch.distributed.checkpoint.storage import StorageMeta
+
+class CustomStorageBackend:
+    def get_storage_metadata(self):
+        # Return storage-specific metadata that will be stored with the checkpoint
+        return StorageMeta()
+```
+
+This example shows how a storage backend can return `StorageMeta`
+to attach additional metadata to a checkpoint.

Optional[StorageMeta]

@@ -3441,7 +3453,7 @@ The target module does not have to be an FSDP module.

A StateDictSettings containing the state_dict_type and state_dict / optim_state_dict configs that are currently set.

-AssertionError` if the StateDictSettings for differen
+AssertionError` if the StateDictSettings for different

FSDP submodules differ. –

@@ -3766,7 +3778,7 @@ The sharing is done as described by ZeRO.

The local optimizer instance in each rank is only responsible for updating approximately 1 / world_size parameters and hence only needs to keep 1 / world_size optimizer states. After parameters are updated locally, each rank will broadcast its parameters to all other peers to keep all model replicas in the same state. ZeroRedundancyOptimizer can be used in conjunction with torch.nn.parallel.DistributedDataParallel to reduce per-rank peak memory consumption.

-ZeroRedundancyOptimizer uses a sorted-greedy algorithm to pack a number of parameters at each rank. Each parameter belongs to a single rank and is not divided among ranks. The partition is arbitrary and might not match the the parameter registration or usage order.
+ZeroRedundancyOptimizer uses a sorted-greedy algorithm to pack a number of parameters at each rank. Each parameter belongs to a single rank and is not divided among ranks. The partition is arbitrary and might not match the parameter registration or usage order.

params (Iterable) – an Iterable of torch.Tensor s or dict s giving all parameters, which will be sharded across ranks.
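(A minimal single-rank sketch of wrapping an optimizer this way; in practice this runs across multiple ranks, typically together with DistributedDataParallel:)

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer

# Single-process gloo group so the sketch is runnable on one machine.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(8, 2)
# Each rank keeps optimizer state only for its shard of the parameters.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.SGD,
    lr=0.1,
)

loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()

dist.destroy_process_group()
```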

8 changes: 4 additions & 4 deletions skills/mlops/training/unsloth/references/llms-full.md
@@ -6348,7 +6348,7 @@ Our chat templates for the GGUF, our BnB and BF16 uploads and all versions are f

### :1234: Precision issues

-We found multiple precision issues in Tesla T4 and float16 machines primarily since the model was trained using BF16, and so outliers and overflows existed. MXFP4 is not actually supported on Ampere and older GPUs, so Triton provides `tl.dot_scaled` for MXFP4 matrix multiplication. It upcasts the matrices to BF16 internaly on the fly.
+We found multiple precision issues in Tesla T4 and float16 machines primarily since the model was trained using BF16, and so outliers and overflows existed. MXFP4 is not actually supported on Ampere and older GPUs, so Triton provides `tl.dot_scaled` for MXFP4 matrix multiplication. It upcasts the matrices to BF16 internally on the fly.

We made a [MXFP4 inference notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/GPT_OSS_MXFP4_\(20B\)-Inference.ipynb) as well in Tesla T4 Colab!

@@ -14877,7 +14877,7 @@ curl -X POST http://localhost:8000/v1/unload_lora_adapter \

# Text-to-Speech (TTS) Fine-tuning

-Learn how to to fine-tune TTS & STT voice models with Unsloth.
+Learn how to fine-tune TTS & STT voice models with Unsloth.

Fine-tuning TTS models allows them to adapt to your specific dataset, use case, or desired style and tone. The goal is to customize these models to clone voices, adapt speaking styles and tones, support new languages, handle specific tasks and more. We also support **Speech-to-Text (STT)** models like OpenAI's Whisper.

@@ -15306,7 +15306,7 @@ snapshot_download(
)
```

-And and let's do inference!
+And let's do inference!

{% code overflow="wrap" %}

@@ -16036,7 +16036,7 @@ Then train the model as usual via `trainer.train() .`

Tips to solve issues, and frequently asked questions.

-If you're still encountering any issues with versions or depencies, please use our [Docker image](https://docs.unsloth.ai/get-started/install-and-update/docker) which will have everything pre-installed.
+If you're still encountering any issues with versions or dependencies, please use our [Docker image](https://docs.unsloth.ai/get-started/install-and-update/docker) which will have everything pre-installed.

{% hint style="success" %}
**Try always to update Unsloth if you find any issues.**
8 changes: 4 additions & 4 deletions skills/mlops/training/unsloth/references/llms-txt.md
@@ -40,7 +40,7 @@ Read more on running Llama 4 here: <https://docs.unsloth.ai/basics/tutorial-how-

Example 1 (unknown):
```unknown
-And and let's do inference!
+And let's do inference!

{% code overflow="wrap" %}
```
@@ -4272,7 +4272,7 @@ Read our full DeepSeek-R1 blogpost here: [unsloth.ai/blog/deepseekr1-dynamic](ht

Tips to solve issues, and frequently asked questions.

-If you're still encountering any issues with versions or depencies, please use our [Docker image](https://docs.unsloth.ai/get-started/install-and-update/docker) which will have everything pre-installed.
+If you're still encountering any issues with versions or dependencies, please use our [Docker image](https://docs.unsloth.ai/get-started/install-and-update/docker) which will have everything pre-installed.

{% hint style="success" %}
**Try always to update Unsloth if you find any issues.**
@@ -6638,7 +6638,7 @@ Our chat templates for the GGUF, our BnB and BF16 uploads and all versions are f

### :1234: Precision issues

-We found multiple precision issues in Tesla T4 and float16 machines primarily since the model was trained using BF16, and so outliers and overflows existed. MXFP4 is not actually supported on Ampere and older GPUs, so Triton provides `tl.dot_scaled` for MXFP4 matrix multiplication. It upcasts the matrices to BF16 internaly on the fly.
+We found multiple precision issues in Tesla T4 and float16 machines primarily since the model was trained using BF16, and so outliers and overflows existed. MXFP4 is not actually supported on Ampere and older GPUs, so Triton provides `tl.dot_scaled` for MXFP4 matrix multiplication. It upcasts the matrices to BF16 internally on the fly.

We made a [MXFP4 inference notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/GPT_OSS_MXFP4_\(20B\)-Inference.ipynb) as well in Tesla T4 Colab!

@@ -10259,7 +10259,7 @@ training_args = GRPOConfig(
- Choosing and Loading a TTS Model
- Preparing Your Dataset

-Learn how to to fine-tune TTS & STT voice models with Unsloth.
+Learn how to fine-tune TTS & STT voice models with Unsloth.

Fine-tuning TTS models allows them to adapt to your specific dataset, use case, or desired style and tone. The goal is to customize these models to clone voices, adapt speaking styles and tones, support new languages, handle specific tasks and more. We also support **Speech-to-Text (STT)** models like OpenAI's Whisper.

2 changes: 1 addition & 1 deletion skills/mlops/training/unsloth/references/llms.md
@@ -67,7 +67,7 @@
- [Troubleshooting Inference](/basics/running-and-saving-models/troubleshooting-inference.md): If you're experiencing issues when running or saving your model.
- [vLLM Engine Arguments](/basics/running-and-saving-models/vllm-engine-arguments.md)
- [LoRA Hot Swapping Guide](/basics/running-and-saving-models/lora-hot-swapping-guide.md)
-- [Text-to-Speech (TTS) Fine-tuning](/basics/text-to-speech-tts-fine-tuning.md): Learn how to to fine-tune TTS & STT voice models with Unsloth.
+- [Text-to-Speech (TTS) Fine-tuning](/basics/text-to-speech-tts-fine-tuning.md): Learn how to fine-tune TTS & STT voice models with Unsloth.
- [Unsloth Dynamic 2.0 GGUFs](/basics/unsloth-dynamic-2.0-ggufs.md): A big new upgrade to our Dynamic Quants!
- [Vision Fine-tuning](/basics/vision-fine-tuning.md): Learn how to fine-tune vision/multimodal LLMs with Unsloth
- [Fine-tuning LLMs with NVIDIA DGX Spark and Unsloth](/basics/fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth.md): Tutorial on how to fine-tune and do reinforcement learning (RL) with OpenAI gpt-oss on NVIDIA DGX Spark.