Commit 50d12de

Merge branch 'main' into n8n-playbook
2 parents 272e061 + 2c99c08 commit 50d12de

10 files changed
Lines changed: 694 additions & 15 deletions

Lines changed: 132 additions & 2 deletions

## Overview

Want to run powerful AI language models on your own STX Halo™? This guide shows you how.
This tutorial uses PyTorch with AMD's ROCm software to run models that can summarize documents, answer questions, generate text, and more, all running locally.

## What You'll Learn

- Run LLMs like gpt-oss-20b and Mistral-7B-Instruct locally using PyTorch and ROCm
- Create a document summarization tool using LLMs

## Setting Up Your Environment

### Create a Virtual Environment

<!-- @os:windows -->
On Windows, open Command Prompt and run:
```cmd
python -m venv llm-env
llm-env\Scripts\activate.bat
```
<!-- @os:end -->

<!-- @os:linux -->
```bash
sudo apt update
sudo apt install -y python3-venv
python3 -m venv llm-env
source llm-env/bin/activate
```
<!-- @os:end -->

### Installing Basic Dependencies
<!-- @require:pytorch -->

### Additional Dependencies

```bash
pip install transformers accelerate sentencepiece protobuf
```
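
Before moving on, it's worth confirming that PyTorch can actually see your GPU. On ROCm builds, PyTorch reports the AMD GPU through the familiar `torch.cuda` API, so a quick sanity check looks like this:

```python
import torch

# ROCm builds of PyTorch expose the AMD GPU through the torch.cuda API
print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device: {torch.cuda.get_device_name(0)}")
```

If this prints `False`, revisit the PyTorch/ROCm installation step before continuing.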

## Quick Start with Example Scripts

This playbook includes ready-to-use scripts in the `assets/` folder (click to preview):

| Script | Description | Usage |
|--------|-------------|-------|
| [run_llm.py](assets/run_llm.py) | Basic LLM text generation | `python run_llm.py` |
| [summarizer.py](assets/summarizer.py) | Document summarizer with Harmony support | `python summarizer.py --file document.txt` |

Both scripts support:
- Model selection: `--model gptoss` (default) or `--model mistral`
- Chat template formatting for proper model prompting, which is especially useful for document summarization

## Loading and Running Your First LLM

The included [run_llm.py](assets/run_llm.py) script shows how to load and generate text with LLMs using PyTorch and AMD ROCm. On the first run, model weights are automatically downloaded.

Take a look at how prompts are tokenized and sent to the model. Understanding this process lets you adapt LLMs for any text generation or summarization task. Here's a minimal example from the script:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half-precision weights to fit in GPU memory
    device_map="auto"            # place the model on the ROCm GPU automatically
)
```
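
Generation follows the same pattern used in the full script: tokenize the prompt, call `model.generate`, and decode the result. Condensed from run_llm.py:

```python
prompt = "Explain what a large language model is in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,  # cap the length of the generated text
    temperature=0.7,     # moderate randomness
    do_sample=True,
    top_p=0.9,           # nucleus sampling
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```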

To try it out:

```bash
python run_llm.py
```

## Building a Document Summarizer

Build on your LLM setup by turning it into a practical document summarizer. In this section, you will use the [summarizer.py](assets/summarizer.py) script to feed in a .txt file and automatically generate a concise summary, all running locally on your GPU.

The script is designed to work out of the box: point it at a text file, pick a model, and it returns a clear 2–3 sentence overview. As you explore the code, you can customize prompts, tweak parameters like length and temperature, and see how different models behave.
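
The summarizer's internals aren't reproduced in this commit, but the core idea is the chat-template formatting mentioned above. Here is a minimal sketch of how such a summarizer might build its prompt with Hugging Face's `apply_chat_template`, reusing the tokenizer and model loaded earlier; the helper name and prompt wording are illustrative, not the script's exact code:

```python
def summarize(text: str, tokenizer, model, max_new_tokens: int = 150) -> str:
    # Wrap the document in a chat-style instruction so instruction-tuned
    # models (gpt-oss, Mistral-Instruct) receive a properly formatted prompt
    messages = [
        {"role": "user",
         "content": f"Summarize the following document in 2-3 sentences:\n\n{text}"},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=0.3,  # low temperature keeps summaries focused
        do_sample=True,
    )
    # Decode only the newly generated tokens, skipping the echoed prompt
    return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
```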

### Usage Examples

```bash
# Summarize the default document
python summarizer.py

# Summarize a text file
python summarizer.py --file example_document.txt

# Adjust creativity with temperature
python summarizer.py --file document.txt --temperature 0.5

# Try a different model
python summarizer.py --file document.txt --model mistral

# Longer summaries with more tokens
python summarizer.py --file document.txt --max-length 200
```
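
Flags like these typically map to a small argparse interface. A hypothetical sketch follows; the flag names match the examples above, but the defaults and help strings are assumptions, not read from summarizer.py:

```python
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Summarize a text file with a local LLM")
    parser.add_argument("--file", default=None, help="Path to a .txt file to summarize")
    parser.add_argument("--model", choices=["gptoss", "mistral"], default="gptoss",
                        help="Which model to load")
    parser.add_argument("--temperature", type=float, default=0.3,
                        help="Sampling temperature (lower = more focused)")
    parser.add_argument("--max-length", type=int, default=150, dest="max_length",
                        help="Maximum number of new tokens in the summary")
    return parser.parse_args()
```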

## Generation Parameters

| Parameter | What It Controls | Typical Values |
|-----------|------------------|----------------|
| `max_new_tokens` | Length of output | 50–500 for summaries |
| `temperature` | Randomness/creativity | 0.2–0.3 for summaries, 0.7–0.9 for creative tasks |
| `top_p` | Nucleus sampling | 0.9 (standard) |

**Temperature Guide**:
- 0.1–0.3: Focused, deterministic (good for summaries)
- 0.5–0.7: Balanced (general use)
- 0.8–1.0: Creative, varied (brainstorming)

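All of these knobs flow into `model.generate`. For example, the same loaded model can act as a focused summarizer or a looser brainstorming aid just by changing the sampling settings (reusing `model` and `inputs` from earlier):

```python
# Focused, near-deterministic output (summaries, extraction)
summary_ids = model.generate(**inputs, max_new_tokens=150,
                             temperature=0.2, top_p=0.9, do_sample=True)

# Looser, more varied output (brainstorming, creative tasks)
ideas_ids = model.generate(**inputs, max_new_tokens=300,
                           temperature=0.9, top_p=0.9, do_sample=True)
```
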
## Real-World Applications

- **Research Paper Analysis**: Extract key findings from complex publications for quick review
- **News Aggregation**: Summarize news articles into brief daily digests or highlights
- **Meeting Notes**: Condense transcripts into actionable items and concise summaries
- **Legal Document Review**: Extract relevant clauses or obligations from long legal texts quickly
- **Code Documentation**: Generate concise repository overviews and function explanations

## Next Steps

- **Fine-tuning**: Adapt models to your specific field or jargon for better accuracy (see the PyTorch Fine-tuning Playbook)
- **RAG Systems**: Combine LLMs with document retrieval for context-aware answers and search
- **Model Exploration**: Experiment with newer models like Llama 3, Phi-3, or Qwen for better results
- **Production Deployment**: Use tools like vLLM or TGI for scalable LLM serving in organizations

Your STX Halo gives you the power to run sophisticated language models locally. Experiment with different models, prompts, and parameters to discover what works best for your applications.

Lines changed: 11 additions & 0 deletions

Large language models (LLMs) are neural networks with billions of parameters trained on massive text datasets. They learn to predict the next word in a sequence, developing an understanding of language patterns, facts, and reasoning. Modern LLMs like GPT-4, Claude, and Llama can perform diverse tasks including translation, question answering, code generation, and creative writing. The key breakthrough was the transformer architecture, which uses attention mechanisms to process sequences in parallel. Training these models requires enormous computational resources, but once trained, they can run on consumer hardware for inference tasks. Recent advances include instruction tuning, where models are fine-tuned to follow user instructions more accurately, and reinforcement learning from human feedback (RLHF), which aligns model outputs with human preferences. The field continues to evolve rapidly with new architectures, training techniques, and applications emerging regularly.
Lines changed: 85 additions & 0 deletions

```python
"""
Basic LLM Loading and Text Generation
======================================

This script demonstrates how to:
- Load a language model with ROCm acceleration
- Generate text from a prompt
- Use different generation parameters

Usage:
    python run_llm.py
"""

import logging
import os
import warnings

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Silence noisy but harmless warnings from transformers and tokenizers
logging.getLogger("transformers").setLevel(logging.ERROR)
warnings.filterwarnings("ignore", category=UserWarning)
os.environ["TOKENIZERS_PARALLELISM"] = "false"


def main():
    # Verify ROCm is available (PyTorch exposes the AMD GPU via the CUDA API)
    print("=" * 10 + " ROCm Configuration " + "=" * 10)
    print(f"ROCm available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"GPU: {torch.cuda.get_device_name(0)}")
        print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print()

    # Load model and tokenizer
    model_name = "openai/gpt-oss-20b"
    # To use Mistral-7B instead of GPT-OSS-20B, uncomment the following line
    # model_name = "mistralai/Mistral-7B-Instruct-v0.3"

    print(f"Loading {model_name}...")
    print("First run will download ~14GB, this may take a few minutes")
    print("For AMD Halo Developer Platforms, the model will be pre-installed.")

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )

    print("✓ Model loaded successfully!\n")

    # Create a simple prompt
    prompt = "Explain what a large language model is in simple terms:"
    print(f"Prompt: {prompt}\n")

    # Tokenize input and move it to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate response
    print("Generating... (this may take 10-30 seconds)")
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        do_sample=True,
        top_p=0.9
    )

    # Decode, then strip the echoed prompt if the model repeated it
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    print()
    print("Model Output:\n")
    response_text = response[len(prompt):].strip() if response.startswith(prompt) else response.strip()
    print(response_text)
    print("\nDone. Try changing the prompt or generation settings for different explanations.")

    # Cleanup GPU memory and exit cleanly
    del model
    del tokenizer
    torch.cuda.empty_cache()


if __name__ == "__main__":
    main()
```
