Commit 5c816ba

feat: add local Qwen3-Omni API and fine-tune regression

* swap Qwen3-Omni dataset generation to load Hugging Face checkpoints locally and wire up tests/docs
* ensure AGENTS.md/CLAUDE.md stay untracked
* add a lightweight GPT-2 fine-tuning test that asserts weights actually change

1 parent 71ef9e5 · commit 5c816ba
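The commit message mentions a GPT-2 fine-tuning regression test that "asserts weights actually change", but that test file is not among the diffs shown below. The pattern it describes can be sketched with a toy model instead of GPT-2 (everything here — `TinyModel`, `weights_changed` — is a hypothetical stand-in, not the committed test): snapshot the parameters, run one training step, and assert at least one parameter moved.

```python
# Illustration of the "weights actually change" regression pattern: a real
# test would snapshot model.state_dict(), call finetune(), and compare.
# Here a hand-rolled one-parameter linear model stands in for GPT-2.
import copy


class TinyModel:
    """Stand-in for a Hugging Face model: one weight, one bias."""

    def __init__(self):
        self.params = {"w": 0.5, "b": 0.0}

    def train_step(self, x, y, lr=0.1):
        # One gradient-descent step on squared error (w*x + b - y)^2.
        pred = self.params["w"] * x + self.params["b"]
        grad = 2 * (pred - y)
        self.params["w"] -= lr * grad * x
        self.params["b"] -= lr * grad


def weights_changed(before, after, tol=1e-12):
    """Return True if any parameter moved by more than `tol`."""
    return any(abs(before[k] - after[k]) > tol for k in before)


model = TinyModel()
before = copy.deepcopy(model.params)
model.train_step(x=1.0, y=2.0)  # fine-tuning stand-in
assert weights_changed(before, model.params)
```

The key design point is comparing against a deep copy of the parameters; comparing against a live reference to the same dict (or the same tensors) would trivially pass.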

File tree

7 files changed: +436 −3 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -148,3 +148,6 @@ dmypy.json
 .cache/
 
 .DS_Store
+
+AGENTS.md
+CLAUDE.md
```
README.md

Lines changed: 138 additions & 3 deletions

````diff
@@ -36,6 +36,24 @@ Why xTuring:
 pip install xturing
 ```
 
+### Development Installation
+
+If you want to contribute to xTuring or run from source:
+
+```bash
+# Clone the repository
+git clone https://github.com/stochasticai/xturing.git
+cd xturing
+
+# Install in editable mode with development dependencies
+pip install -e .
+pip install -r requirements-dev.txt
+
+# Set up pre-commit hooks (required before contributing)
+pre-commit install
+pre-commit install --hook-type commit-msg
+```
+
 <br>
 
 ## 🚀 Quickstart
@@ -158,7 +176,7 @@ dataset = InstructionDataset('../llama/alpaca_data')
 model = GenericLoraKbitModel('tiiuae/falcon-7b')
 
 # Generate outputs on desired prompts
-outputs = model.generate(dataset = dataset, batch_size=10)
+outputs = model.generate(dataset = dataset, batch_size=10)
 
 ```
@@ -173,6 +191,16 @@ model.finetune(dataset=dataset)
 ```
 > See `examples/models/qwen3/qwen3_lora_finetune.py` for a runnable script.
 
+8. __Qwen3-Omni dataset generation__ – Run the multimodal checkpoint locally (download from Hugging Face) to bootstrap instruction corpora without leaving your machine.
+```python
+from xturing.datasets import InstructionDataset
+from xturing.model_apis.qwen import Qwen3OmniTextGenerationAPI
+
+# Download `Qwen/Qwen2.5-Omni` (or another HF variant) ahead of time
+engine = Qwen3OmniTextGenerationAPI(model_name_or_path="Qwen/Qwen2.5-Omni")
+dataset = InstructionDataset.generate_dataset("./tasks.jsonl", engine=engine)
+```
+
 An exploration of the [Llama LoRA INT4 working example](examples/features/int4_finetuning/LLaMA_lora_int4.ipynb) is recommended for an understanding of its application.
 
 For an extended insight, consider examining the [GenericModel working example](examples/features/generic/generic_model.py) available in the repository.
@@ -182,9 +210,17 @@ For an extended insight, consider examining the [GenericModel working example](e
 ## CLI playground
 <img src=".github/cli-playground.gif" width="80%" style="margin: 0 1%;"/>
 
+The `xturing` CLI provides interactive tools for working with fine-tuned models:
+
 ```bash
-$ xturing chat -m "<path-to-model-folder>"
+# Chat with a fine-tuned model
+xturing chat -m "<path-to-model-folder>"
+
+# Launch the UI playground (alternative to programmatic Playground)
+xturing ui
 
+# Get help and see all available commands
+xturing --help
 ```
 
 ## UI playground
@@ -250,13 +286,27 @@ Contribute to this by submitting your performance results on other GPUs by creat
 
 ## 📎 Fine‑tuned model checkpoints
 We have already fine-tuned some models that you can use as your base or start playing with.
-Here is how you would load them:
 
+### Loading Models
+
+**Load from xTuring hub:**
 ```python
 from xturing.models import BaseModel
 model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")
 ```
 
+**Load from local directory:**
+```python
+model = BaseModel.load("/path/to/saved/model")
+```
+
+**Create a new model for fine-tuning:**
+```python
+model = BaseModel.create("llama_lora")
+```
+
+### Available Pre-trained Models
+
 | model | dataset | Path |
 |---------------------|--------|---------------|
 | DistilGPT-2 LoRA | alpaca | `x/distilgpt2_lora_finetuned_alpaca` |
@@ -281,6 +331,7 @@ Below is a list of all the supported models via `BaseModel` class of `xTuring` a
 |LLaMA2 | llama2|
 |MiniMaxM2 | minimax_m2|
 |OPT-1.3B | opt|
+|Qwen3-0.6B | qwen3_0_6b|
 
 The above are the base variants. Use these templates for `LoRA`, `INT8`, and `INT8 + LoRA` versions:
 
@@ -314,17 +365,101 @@ Replace `<model_path>` with a local directory or a Hugging Face model like `face
 
 <br>
 
+## 🧪 Running Tests
+
+The project uses pytest for testing. Test files are located in the `tests/` directory.
+
+Run all tests:
+```bash
+pytest
+```
+
+Run a specific test file:
+```bash
+pytest tests/xturing/models/test_qwen_model.py
+```
+
+Skip slow tests:
+```bash
+pytest -m "not slow"
+```
+
+Skip GPU tests (for CPU-only environments):
+```bash
+pytest -m "not gpu"
+```
+
+Test markers used in this project:
+- `@pytest.mark.slow` - Tests that take significant time to run
+- `@pytest.mark.gpu` - Tests requiring GPU hardware
+
+<br>
+
 ## 🤝 Help and Support
 If you have any questions, you can create an issue on this repository.
 
 You can also join our [Discord server](https://discord.gg/TgHXuSJEk6) and start a discussion in the `#xturing` channel.
 
 <br>
 
+## 🏗️ Project Structure
+
+Understanding the codebase organization:
+
+```
+src/xturing/
+├── models/          # Model classes and registry (BaseModel, LLaMA, GPT-2, etc.)
+├── engines/         # Low-level model loading, tokenization, and operations
+├── datasets/        # Dataset loaders (InstructionDataset, TextDataset)
+├── trainers/        # Training loops (LightningTrainer with DeepSpeed support)
+├── preprocessors/   # Data preprocessing and tokenization
+├── config/          # YAML configurations for finetuning and generation
+├── cli/             # CLI commands (chat, ui, api)
+├── ui/              # Gradio UI playground
+├── self_instruct/   # Dataset generation utilities
+└── utils/           # Shared utilities
+
+tests/xturing/       # Test suite mirroring src structure
+examples/            # Example scripts organized by model and feature
+```
+
+**Key architectural patterns:**
+- **Registry Pattern**: Models and engines use a registry-based factory pattern via `BaseModel.create()` and `BaseEngine.create()`
+- **Model Variants**: Each model family has multiple variants following the naming template `<base>_[lora]_[int8|kbit]`
+  - Example: `llama`, `llama_lora`, `llama_int8`, `llama_lora_int8`
+- **Configuration**: Training and generation parameters are defined in YAML files per model in `src/xturing/config/`
+- **Engines**: Handle the low-level operations (loading weights, tokenization, DeepSpeed integration)
+- **Models**: Provide high-level API (`finetune()`, `generate()`, `evaluate()`, `save()`, `load()`)
+
+<br>
+
 ## 📝 License
 This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
 
 <br>
 
 ## 🌎 Contributing
 As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our [contributing guide](CONTRIBUTING.md) to learn how you can get involved.
+
+### Quick Contribution Guidelines
+
+**Important:** All pull requests should target the `dev` branch, not `main`.
+
+The project uses pre-commit hooks to enforce code quality:
+- **black** - Code formatting
+- **isort** - Import sorting (black profile)
+- **autoflake** - Remove unused imports
+- **absolufy-imports** - Convert relative to absolute imports
+- **gitlint** - Commit message linting
+
+You can manually format code:
+```bash
+black src/ tests/
+isort src/ tests/
+```
+
+Pre-commit hooks will automatically run these checks when you commit. Make sure to install them:
+```bash
+pre-commit install
+pre-commit install --hook-type commit-msg
+```
````
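The README hunks above describe the variant naming template `<base>_[lora]_[int8|kbit]`, where each base model key expands into its LoRA/quantized combinations. A tiny sketch of that expansion (the helper below is purely illustrative, not part of xTuring):

```python
# Expand a base model key into the four variant names the README's
# template <base>_[lora]_[int8|kbit] gives as its example set.
def variant_names(base: str) -> list:
    suffixes = ["", "_lora", "_int8", "_lora_int8"]
    return [base + suffix for suffix in suffixes]


print(variant_names("llama"))
# → ['llama', 'llama_lora', 'llama_int8', 'llama_lora_int8']
```

Any of these strings can then be passed to `BaseModel.create()`, which is what makes the flat naming scheme convenient: the variant is selected by key rather than by importing a different class.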

docs/docs/advanced/generate.md

Lines changed: 10 additions & 0 deletions

````diff
@@ -41,6 +41,16 @@ engine = Davinci("your-api-key")
 engine = ClaudeSonnet("your-api-key")
 ```
 
+</TabItem>
+<TabItem value="qwen" label="Qwen3-Omni (local)">
+
+Download the desired checkpoint from [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Omni) (or point to a local directory) and load it directly.
+
+```python
+from xturing.model_apis.qwen import Qwen3OmniTextGenerationAPI
+engine = Qwen3OmniTextGenerationAPI(model_name_or_path="Qwen/Qwen2.5-Omni")
+```
+
 </TabItem>
 </Tabs>
````

src/xturing/model_apis/__init__.py

Lines changed: 4 additions & 0 deletions

```diff
@@ -5,6 +5,7 @@
 from xturing.model_apis.openai import ChatGPT as OpenAIChatGPT
 from xturing.model_apis.openai import Davinci as OpenAIDavinci
 from xturing.model_apis.openai import OpenAITextGenerationAPI
+from xturing.model_apis.qwen import Qwen3OmniTextGenerationAPI
 
 BaseApi.add_to_registry(OpenAITextGenerationAPI.config_name, OpenAITextGenerationAPI)
 BaseApi.add_to_registry(CohereTextGenerationAPI.config_name, CohereTextGenerationAPI)
@@ -13,3 +14,6 @@
 BaseApi.add_to_registry(OpenAIChatGPT.config_name, OpenAIChatGPT)
 BaseApi.add_to_registry(CohereMedium.config_name, CohereMedium)
 BaseApi.add_to_registry(ClaudeSonnet.config_name, ClaudeSonnet)
+BaseApi.add_to_registry(
+    Qwen3OmniTextGenerationAPI.config_name, Qwen3OmniTextGenerationAPI
+)
```
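The `add_to_registry` calls above follow the registry-based factory pattern used throughout xTuring: each API class registers under its `config_name` and is later instantiated by key. A minimal stand-in for that mechanism (simplified sketch — the real `BaseApi` lives in `xturing.model_apis.base` and does more):

```python
# Simplified registry/factory sketch: classes register under a string key,
# and a classmethod factory instantiates them by that key.
class BaseApi:
    registry = {}

    @classmethod
    def add_to_registry(cls, name, api_cls):
        # Map a config name to the class that implements it.
        cls.registry[name] = api_cls

    @classmethod
    def create(cls, name, *args, **kwargs):
        # Look up the registered class and instantiate it.
        return cls.registry[name](*args, **kwargs)


class DummyApi:
    config_name = "dummy"


BaseApi.add_to_registry(DummyApi.config_name, DummyApi)
api = BaseApi.create("dummy")
assert isinstance(api, DummyApi)
```

The payoff is that adding a new engine, as this commit does, only requires one import and one `add_to_registry` call; no call sites need to change.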

src/xturing/model_apis/qwen.py

Lines changed: 144 additions & 0 deletions

```python
from datetime import datetime
from typing import Dict, List, Optional, Sequence

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from xturing.model_apis.base import TextGenerationAPI


class Qwen3OmniTextGenerationAPI(TextGenerationAPI):
    """Text generation API wrapper for running Qwen3-Omni locally via Hugging Face."""

    config_name = "qwen3_omni"

    def __init__(
        self,
        model_name_or_path: str = "Qwen/Qwen2.5-Omni",
        device: Optional[str] = None,
        tokenizer_kwargs: Optional[Dict] = None,
        model_kwargs: Optional[Dict] = None,
        default_generate_kwargs: Optional[Dict] = None,
    ):
        super().__init__(
            engine=model_name_or_path,
            api_key=None,
            request_batch_size=1,
        )
        tokenizer_kwargs = tokenizer_kwargs or {}
        model_kwargs = model_kwargs or {}
        self.default_generate_kwargs = default_generate_kwargs or {}

        self.tokenizer = AutoTokenizer.from_pretrained(
            model_name_or_path, trust_remote_code=True, **tokenizer_kwargs
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name_or_path, trust_remote_code=True, **model_kwargs
        )

        if device is None:
            device = "cuda" if torch.cuda.is_available() else "cpu"
        self.device = torch.device(device)
        self.model.to(self.device)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
            self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

    def _trim_stop_sequences(
        self, text: str, stop_sequences: Optional[Sequence[str]]
    ) -> str:
        if not stop_sequences:
            return text
        cut_index = len(text)
        for stop in stop_sequences:
            if not stop:
                continue
            idx = text.find(stop)
            if idx != -1 and idx < cut_index:
                cut_index = idx
        return text[:cut_index].rstrip()

    def _generate_single(
        self,
        prompt: str,
        max_tokens: int,
        temperature: float,
        top_p: Optional[float],
        stop_sequences: Optional[Sequence[str]],
        n: int,
        generation_overrides: Dict,
    ) -> List[Dict[str, str]]:
        inputs = self.tokenizer(
            prompt,
            return_tensors="pt",
        )
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        do_sample = temperature is not None and temperature > 0
        generate_kwargs = {
            "max_new_tokens": max_tokens,
            "do_sample": do_sample,
            "num_return_sequences": n,
            "eos_token_id": self.tokenizer.eos_token_id,
            "pad_token_id": self.tokenizer.pad_token_id,
        }
        if temperature is not None:
            generate_kwargs["temperature"] = temperature
        if top_p is not None:
            generate_kwargs["top_p"] = top_p
        generate_kwargs.update(self.default_generate_kwargs)
        generate_kwargs.update(generation_overrides)
        outputs = self.model.generate(**inputs, **generate_kwargs)
        if n == 1:
            outputs = outputs.unsqueeze(0) if outputs.dim() == 1 else outputs
        generated_sequences: List[Dict[str, str]] = []
        prompt_length = inputs["input_ids"].shape[-1]
        for sequence in outputs:
            completion_tokens = sequence[prompt_length:]
            text = self.tokenizer.decode(
                completion_tokens,
                skip_special_tokens=True,
            ).strip()
            text = self._trim_stop_sequences(text, stop_sequences)
            generated_sequences.append(
                {
                    "text": text,
                    "finish_reason": "stop",
                }
            )
        return generated_sequences

    def generate_text(
        self,
        prompts,
        max_tokens,
        temperature,
        top_p=None,
        frequency_penalty=None,
        presence_penalty=None,
        stop_sequences=None,
        logprobs=None,
        n=1,
        best_of=1,
        retries=0,
        **generation_overrides,
    ):
        if not isinstance(prompts, list):
            prompts = [prompts]

        results = []
        for prompt in prompts:
            choices = self._generate_single(
                prompt=prompt,
                max_tokens=max_tokens,
                temperature=temperature,
                top_p=top_p,
                stop_sequences=stop_sequences,
                n=n,
                generation_overrides=generation_overrides,
            )
            data = {
                "prompt": prompt,
                "response": {"choices": choices},
                "created_at": str(datetime.now()),
            }
            results.append(data)

        return results
```
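The `_trim_stop_sequences` helper in the new file is pure string logic and can be exercised on its own. Here it is reproduced as a standalone function (same algorithm, no class) with a worked example:

```python
def trim_stop_sequences(text, stop_sequences):
    """Cut `text` at the earliest occurrence of any stop sequence, then
    strip trailing whitespace. Mirrors _trim_stop_sequences above."""
    if not stop_sequences:
        return text
    cut_index = len(text)
    for stop in stop_sequences:
        if not stop:  # ignore empty stop strings
            continue
        idx = text.find(stop)
        if idx != -1 and idx < cut_index:
            cut_index = idx
    return text[:cut_index].rstrip()


print(trim_stop_sequences("Answer: 42\n###\nextra text", ["###"]))
# → Answer: 42
```

Note the earliest-match semantics: with several stop sequences, the one that appears first in the text wins, regardless of its order in the list.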
