# Running LLMs on PyTorch with ROCm

## Overview

Want to run powerful AI language models on your own STX Halo™ machine? This guide shows you how.
This tutorial uses PyTorch powered by AMD's ROCm to run models that can summarize documents, answer questions, generate text, and more, all running locally.

## What You'll Learn

- Run LLMs like gpt-oss-20b and Mistral-7B-Instruct locally using PyTorch and ROCm
- Create a document summarization tool using LLMs

## Setting Up Your Environment

### Create a Virtual Environment

<!-- @os:windows -->
On Windows, open Command Prompt and run:
```cmd
python -m venv llm-env
llm-env\Scripts\activate.bat
```
<!-- @os:end -->

<!-- @os:linux -->
On Linux, open a terminal and run:
```bash
sudo apt update
sudo apt install -y python3-venv
python3 -m venv llm-env
source llm-env/bin/activate
```
<!-- @os:end -->

### Installing Basic Dependencies
<!-- @require:pytorch -->

### Additional Dependencies

```bash
pip install transformers accelerate sentencepiece protobuf
```
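Once these are installed, you can sanity-check the setup before downloading any models. The snippet below is a quick check, assuming the ROCm build of PyTorch from the previous step; ROCm builds report the GPU through PyTorch's CUDA-compatible API:

```python
import torch

print("PyTorch version:", torch.__version__)
# On ROCm, the GPU is exposed through torch.cuda:
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```

If `GPU available` prints `False`, revisit the PyTorch/ROCm installation step before continuing.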

## Quick Start with Example Scripts

This playbook includes ready-to-use scripts in the `assets/` folder (click to preview):

| Script | Description | Usage |
|--------|-------------|-------|
| [run_llm.py](assets/run_llm.py) | Basic LLM text generation | `python run_llm.py` |
| [summarizer.py](assets/summarizer.py) | Document summarizer with Harmony support | `python summarizer.py --file document.txt` |

Both scripts support:
- Model selection: `--model gptoss` (default) or `--model mistral`
- Chat template formatting for proper model prompting, which is especially useful for document summarization
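Chat templates wrap your text in the role-tagged format an instruction-tuned model was trained on. A minimal sketch of the idea (the helper names and the plain-text fallback format are illustrative, not from the scripts; with `transformers` you would normally call `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`):

```python
def build_messages(document_text: str) -> list:
    """Wrap a document in a chat-style summarization request."""
    return [
        {"role": "system", "content": "You are a concise summarizer."},
        {"role": "user", "content": f"Summarize in 2-3 sentences:\n\n{document_text}"},
    ]

def plain_format(messages: list) -> str:
    """Illustrative fallback formatting for models without a chat template."""
    return "\n\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)

msgs = build_messages("ROCm lets PyTorch run on AMD GPUs.")
print(plain_format(msgs))
```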

## Loading and Running Your First LLM

The included [run_llm.py](assets/run_llm.py) script shows how to load and generate text with LLMs using PyTorch and AMD ROCm. On the first run, the model weights are downloaded automatically.

Take a look at how prompts are tokenized and sent to the model. Understanding this process lets you adapt LLMs for any text generation or summarization task. Here's a minimal example from the script:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```
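Under the hood, generation is an autoregressive loop: tokenize the prompt into ids, repeatedly predict the next id, then decode the ids back to text. The toy sketch below mimics that loop with a hypothetical six-token vocabulary and a fixed lookup table standing in for the neural network, just to make the tokenize, generate, decode flow concrete:

```python
vocab = ["<eos>", "ROCm", "runs", "PyTorch", "on", "AMD"]
tok2id = {t: i for i, t in enumerate(vocab)}

# Stand-in "model": a fixed next-token table instead of learned weights.
next_token = {1: 2, 2: 3, 3: 4, 4: 5, 5: 0}

def generate(prompt_tokens, max_new_tokens=10):
    ids = [tok2id[t] for t in prompt_tokens]      # 1. tokenize
    for _ in range(max_new_tokens):               # 2. autoregressive loop
        nxt = next_token.get(ids[-1], 0)          #    "predict" the next id
        ids.append(nxt)
        if nxt == tok2id["<eos>"]:                #    stop at end-of-sequence
            break
    return " ".join(vocab[i] for i in ids)        # 3. decode

print(generate(["ROCm", "runs"]))
```

With the real model, the same three steps are `inputs = tokenizer(prompt, return_tensors="pt")`, `outputs = model.generate(**inputs, max_new_tokens=...)`, and `tokenizer.decode(outputs[0], skip_special_tokens=True)`.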

To try it out:

```bash
python run_llm.py
```

## Building a Document Summarizer

Build on your LLM setup by turning it into a practical document summarizer. In this section, you will use the [summarizer.py](assets/summarizer.py) script to feed in a `.txt` file and automatically generate a concise summary, all running locally on your GPU.

The script is designed to work out of the box: point it at a text file, pick a model, and it returns a clear 2–3 sentence overview. As you explore the code, you can customize prompts, tweak parameters like length and temperature, and see how different models behave.

### Usage Examples

```bash
# Summarize the default document
python summarizer.py

# Summarize a text file
python summarizer.py --file example_document.txt

# Adjust creativity with temperature
python summarizer.py --file document.txt --temperature 0.5

# Try a different model
python summarizer.py --file document.txt --model mistral

# Longer summaries with more tokens
python summarizer.py --file document.txt --max-length 200
```

## Generation Parameters

| Parameter | What It Controls | Typical Values |
|-----------|------------------|----------------|
| `max_new_tokens` | Length of output | 50–500 for summaries |
| `temperature` | Randomness/creativity | 0.2–0.3 for summaries, 0.7–0.9 for creative tasks |
| `top_p` | Nucleus sampling | 0.9 (standard) |

**Temperature Guide**:
- 0.1–0.3: Focused, deterministic (good for summaries)
- 0.5–0.7: Balanced (general use)
- 0.8–1.0: Creative, varied (brainstorming)
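The temperature guide above comes down to simple arithmetic: logits are divided by the temperature before the softmax, so values below 1 sharpen the next-token distribution and values above 1 flatten it. A self-contained illustration with made-up logits (no model required):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # focused: top token dominates
hot = softmax_with_temperature(logits, 1.0)   # flatter: more variety

print(f"T=0.2 top-token probability: {cold[0]:.3f}")
print(f"T=1.0 top-token probability: {hot[0]:.3f}")
```

At T=0.2 the top token takes nearly all the probability mass, which is why low temperatures produce focused, repeatable summaries.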

## Real-World Applications

- **Research Paper Analysis**: Extract key findings from complex publications for quick review
- **News Aggregation**: Summarize news articles into brief daily digests or highlights
- **Meeting Notes**: Condense transcripts into actionable items and concise summaries
- **Legal Document Review**: Extract relevant clauses or obligations from long legal texts quickly
- **Code Documentation**: Generate concise repository overviews and function explanations

## Next Steps

- **Fine-tuning**: Adapt models to your specific field or jargon for better accuracy (see the PyTorch Fine-tuning Playbook)
- **RAG Systems**: Combine LLMs with document retrieval for context-aware answers and search
- **Model Exploration**: Experiment with newer models like Llama 3, Phi-3, or Qwen
- **Production Deployment**: Use tools like vLLM or TGI for scalable LLM serving

Your STX Halo gives you the power to run sophisticated language models locally. Experiment with different models, prompts, and parameters to discover what works best for your applications.