diff --git a/skills/mlops/training/axolotl/references/api.md b/skills/mlops/training/axolotl/references/api.md index f00b6eb6a..2f94b5394 100644 --- a/skills/mlops/training/axolotl/references/api.md +++ b/skills/mlops/training/axolotl/references/api.md @@ -3240,7 +3240,7 @@ Prompt Strategy for finetuning Llama2 chat models see also https://github.com/fa This implementation is based on the Vicuna PR and the fastchat repo, see also: https://github.com/lm-sys/FastChat/blob/cdd7730686cb1bf9ae2b768ee171bdf7d1ff04f3/fastchat/conversation.py#L847 -Use dataset type: “llama2_chat” in conig.yml to use this prompt style. +Use dataset type: “llama2_chat” in config.yml to use this prompt style. E.g. in the config.yml: @@ -4991,7 +4991,7 @@ prompt_strategies.orcamini Prompt Strategy for finetuning Orca Mini (v2) models see also https://huggingface.co/psmathur/orca_mini_v2_7b for more information -Use dataset type: orcamini in conig.yml to use this prompt style. +Use dataset type: orcamini in config.yml to use this prompt style. Compared to the alpaca_w_system.open_orca dataset type, this one specifies the system prompt with “### System:”. diff --git a/skills/mlops/training/pytorch-fsdp/references/other.md b/skills/mlops/training/pytorch-fsdp/references/other.md index d5b6cae6f..2b544dc98 100644 --- a/skills/mlops/training/pytorch-fsdp/references/other.md +++ b/skills/mlops/training/pytorch-fsdp/references/other.md @@ -2290,7 +2290,7 @@ This call gives the AsyncStager the opportunity to ‘stage’ the state_dict. T for serializing the state_dict and writing it to storage. -the serialization thread starts and before returning from dcp.async_save. 
If this is set to False, the assumption is the user has defined a custom synchronization point for the the purpose of further optimizing save latency in the training loop (for example, by overlapping staging with the forward/backward pass), and it is the respondsibility of the user to call AsyncStager.synchronize_staging at the appropriate time. +the serialization thread starts and before returning from dcp.async_save. If this is set to False, the assumption is the user has defined a custom synchronization point for the purpose of further optimizing save latency in the training loop (for example, by overlapping staging with the forward/backward pass), and it is the responsibility of the user to call AsyncStager.synchronize_staging at the appropriate time. Clean up all resources used by the stager. @@ -2430,7 +2430,7 @@ Read the checkpoint metadata. The metadata object associated with the checkpoint being loaded. -Calls to indicates a brand new checkpoint read is going to happen. A checkpoint_id may be present if users set the checkpoint_id for this checkpoint read. The meaning of the checkpiont_id is storage-dependent. It can be a path to a folder/file or a key for a key-value storage. +Calls to indicates a brand new checkpoint read is going to happen. A checkpoint_id may be present if users set the checkpoint_id for this checkpoint read. The meaning of the checkpoint_id is storage-dependent. It can be a path to a folder/file or a key for a key-value storage. checkpoint_id (Union[str, os.PathLike, None]) – The ID of this checkpoint instance. The meaning of the checkpoint_id depends on the storage. It can be a path to a folder or to a file. It can also be a key if the storage is more like a key-value store. (Default: None) @@ -2488,7 +2488,7 @@ plan (SavePlan) – The local plan from the SavePlanner in use. A transformed SavePlan after storage local planning -Calls to indicates a brand new checkpoint write is going to happen. 
A checkpoint_id may be present if users set the checkpoint_id for this checkpoint write. The meaning of the checkpiont_id is storage-dependent. It can be a path to a folder/file or a key for a key-value storage. +Calls to indicates a brand new checkpoint write is going to happen. A checkpoint_id may be present if users set the checkpoint_id for this checkpoint write. The meaning of the checkpoint_id is storage-dependent. It can be a path to a folder/file or a key for a key-value storage. checkpoint_id (Union[str, os.PathLike, None]) – The ID of this checkpoint instance. The meaning of the checkpoint_id depends on the storage. It can be a path to a folder or to a file. It can also be a key if the storage is a key-value store. (Default: None) @@ -2498,7 +2498,19 @@ is_coordinator (bool) – Whether this instance is responsible for coordinating Return the storage-specific metadata. This is used to store additional information in a checkpoint that can be useful for providing request-level observability. StorageMeta is passed to the SavePlanner during save calls. Returns None by default. -TODO: provide an example +Example: + +```python +from torch.distributed.checkpoint.metadata import StorageMeta +from torch.distributed.checkpoint.storage import StorageWriter +class CustomStorageWriter(StorageWriter): + def storage_meta(self): + # Return storage-specific metadata that will be stored with the checkpoint + return StorageMeta() +``` + +This example shows how a `StorageWriter` subclass can override `storage_meta` +to attach additional metadata to a checkpoint. Optional[StorageMeta] @@ -3441,7 +3453,7 @@ The target module does not have to be an FSDP module. A StateDictSettings containing the state_dict_type and state_dict / optim_state_dict configs that are currently set. -AssertionError` if the StateDictSettings for differen – +AssertionError if the StateDictSettings for different – FSDP submodules differ. – @@ -3766,7 +3778,7 @@ The sharing is done as described by ZeRO. 
The local optimizer instance in each rank is only responsible for updating approximately 1 / world_size parameters and hence only needs to keep 1 / world_size optimizer states. After parameters are updated locally, each rank will broadcast its parameters to all other peers to keep all model replicas in the same state. ZeroRedundancyOptimizer can be used in conjunction with torch.nn.parallel.DistributedDataParallel to reduce per-rank peak memory consumption. -ZeroRedundancyOptimizer uses a sorted-greedy algorithm to pack a number of parameters at each rank. Each parameter belongs to a single rank and is not divided among ranks. The partition is arbitrary and might not match the the parameter registration or usage order. +ZeroRedundancyOptimizer uses a sorted-greedy algorithm to pack a number of parameters at each rank. Each parameter belongs to a single rank and is not divided among ranks. The partition is arbitrary and might not match the parameter registration or usage order. params (Iterable) – an Iterable of torch.Tensor s or dict s giving all parameters, which will be sharded across ranks. diff --git a/skills/mlops/training/unsloth/references/llms-full.md b/skills/mlops/training/unsloth/references/llms-full.md index b0b6b24d9..df3d2eebb 100644 --- a/skills/mlops/training/unsloth/references/llms-full.md +++ b/skills/mlops/training/unsloth/references/llms-full.md @@ -6348,7 +6348,7 @@ Our chat templates for the GGUF, our BnB and BF16 uploads and all versions are f ### :1234: Precision issues -We found multiple precision issues in Tesla T4 and float16 machines primarily since the model was trained using BF16, and so outliers and overflows existed. MXFP4 is not actually supported on Ampere and older GPUs, so Triton provides `tl.dot_scaled` for MXFP4 matrix multiplication. It upcasts the matrices to BF16 internaly on the fly. 
+We found multiple precision issues in Tesla T4 and float16 machines primarily since the model was trained using BF16, and so outliers and overflows existed. MXFP4 is not actually supported on Ampere and older GPUs, so Triton provides `tl.dot_scaled` for MXFP4 matrix multiplication. It upcasts the matrices to BF16 internally on the fly. We made a [MXFP4 inference notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/GPT_OSS_MXFP4_\(20B\)-Inference.ipynb) as well in Tesla T4 Colab! @@ -14877,7 +14877,7 @@ curl -X POST http://localhost:8000/v1/unload_lora_adapter \ # Text-to-Speech (TTS) Fine-tuning -Learn how to to fine-tune TTS & STT voice models with Unsloth. +Learn how to fine-tune TTS & STT voice models with Unsloth. Fine-tuning TTS models allows them to adapt to your specific dataset, use case, or desired style and tone. The goal is to customize these models to clone voices, adapt speaking styles and tones, support new languages, handle specific tasks and more. We also support **Speech-to-Text (STT)** models like OpenAI's Whisper. @@ -15306,7 +15306,7 @@ snapshot_download( ) ``` -And and let's do inference! +And let's do inference! {% code overflow="wrap" %} @@ -16036,7 +16036,7 @@ Then train the model as usual via `trainer.train() .` Tips to solve issues, and frequently asked questions. -If you're still encountering any issues with versions or depencies, please use our [Docker image](https://docs.unsloth.ai/get-started/install-and-update/docker) which will have everything pre-installed. +If you're still encountering any issues with versions or dependencies, please use our [Docker image](https://docs.unsloth.ai/get-started/install-and-update/docker) which will have everything pre-installed. 
{% hint style="success" %} **Try always to update Unsloth if you find any issues.** diff --git a/skills/mlops/training/unsloth/references/llms-txt.md b/skills/mlops/training/unsloth/references/llms-txt.md index c5895c7cd..22f651e41 100644 --- a/skills/mlops/training/unsloth/references/llms-txt.md +++ b/skills/mlops/training/unsloth/references/llms-txt.md @@ -40,7 +40,7 @@ Read more on running Llama 4 here: