3 changes: 3 additions & 0 deletions .gitignore
@@ -148,3 +148,6 @@ dmypy.json
.cache/

.DS_Store

AGENTS.md
CLAUDE.md
141 changes: 138 additions & 3 deletions README.md
@@ -36,6 +36,24 @@ Why xTuring:
pip install xturing
```

### Development Installation

If you want to contribute to xTuring or run from source:

```bash
# Clone the repository
git clone https://github.com/stochasticai/xturing.git
cd xturing

# Install in editable mode with development dependencies
pip install -e .
pip install -r requirements-dev.txt

# Set up pre-commit hooks (required before contributing)
pre-commit install
pre-commit install --hook-type commit-msg
```

<br>

## 🚀 Quickstart
@@ -158,7 +176,7 @@ dataset = InstructionDataset('../llama/alpaca_data')
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Generate outputs on desired prompts
outputs = model.generate(dataset=dataset, batch_size=10)

```

@@ -173,6 +191,16 @@ model.finetune(dataset=dataset)
```
> See `examples/models/qwen3/qwen3_lora_finetune.py` for a runnable script.

8. __Qwen3-Omni dataset generation__ – Run the multimodal checkpoint locally (download from Hugging Face) to bootstrap instruction corpora without leaving your machine.
```python
from xturing.datasets import InstructionDataset
from xturing.model_apis.qwen import Qwen3OmniTextGenerationAPI

# Download `Qwen/Qwen2.5-Omni` (or another HF variant) ahead of time
engine = Qwen3OmniTextGenerationAPI(model_name_or_path="Qwen/Qwen2.5-Omni")
dataset = InstructionDataset.generate_dataset("./tasks.jsonl", engine=engine)
```

To see INT4 fine-tuning in practice, work through the [Llama LoRA INT4 working example](examples/features/int4_finetuning/LLaMA_lora_int4.ipynb).

For a broader overview, see the [GenericModel working example](examples/features/generic/generic_model.py) in the repository.
@@ -182,9 +210,17 @@ For an extended insight, consider examining the [GenericModel working example](e
## CLI playground
<img src=".github/cli-playground.gif" width="80%" style="margin: 0 1%;"/>

The `xturing` CLI provides interactive tools for working with fine-tuned models:

```bash
$ xturing chat -m "<path-to-model-folder>"
# Chat with a fine-tuned model
xturing chat -m "<path-to-model-folder>"

# Launch the UI playground (alternative to programmatic Playground)
xturing ui

# Get help and see all available commands
xturing --help
```

## UI playground
@@ -250,13 +286,27 @@ Contribute to this by submitting your performance results on other GPUs by creat

## 📎 Fine‑tuned model checkpoints
We have already fine-tuned some models that you can use as a base or start playing with.
Here is how to load them:

### Loading Models

**Load from xTuring hub:**
```python
from xturing.models import BaseModel
model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")
```

**Load from local directory:**
```python
model = BaseModel.load("/path/to/saved/model")
```

**Create a new model for fine-tuning:**
```python
model = BaseModel.create("llama_lora")
```

### Available Pre-trained Models

| Model | Dataset | Path |
|---------------------|--------|---------------|
| DistilGPT-2 LoRA | alpaca | `x/distilgpt2_lora_finetuned_alpaca` |
@@ -281,6 +331,7 @@ Below is a list of all the supported models via `BaseModel` class of `xTuring` a
|LLaMA2 | llama2|
|MiniMaxM2 | minimax_m2|
|OPT-1.3B | opt|
|Qwen3-0.6B | qwen3_0_6b|

The above are the base variants. Use these templates for `LoRA`, `INT8`, and `INT8 + LoRA` versions:
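As an illustration, the variant keys follow the template mechanically; a tiny helper (not part of xTuring, shown purely for illustration) makes the expansion explicit:

```python
def variant_keys(base: str) -> list:
    # Expand a base model key into its LoRA / INT8 / INT8+LoRA variant keys,
    # following the `<base>[_lora][_int8]` naming template
    return [base, f"{base}_lora", f"{base}_int8", f"{base}_lora_int8"]

print(variant_keys("llama"))
# → ['llama', 'llama_lora', 'llama_int8', 'llama_lora_int8']
```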

@@ -314,17 +365,101 @@ Replace `<model_path>` with a local directory or a Hugging Face model like `face

<br>

## 🧪 Running Tests

The project uses pytest for testing. Test files are located in the `tests/` directory.

Run all tests:
```bash
pytest
```

Run a specific test file:
```bash
pytest tests/xturing/models/test_qwen_model.py
```

Skip slow tests:
```bash
pytest -m "not slow"
```

Skip GPU tests (for CPU-only environments):
```bash
pytest -m "not gpu"
```

Test markers used in this project:
- `@pytest.mark.slow` - Tests that take significant time to run
- `@pytest.mark.gpu` - Tests requiring GPU hardware
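A sketch of how these markers appear in a test file (the test names and bodies here are illustrative, not actual tests from the suite):

```python
import pytest

@pytest.mark.slow
def test_lora_finetune_one_step():
    # Placeholder body; a real slow test would run a training step
    assert True

@pytest.mark.gpu
def test_int8_load_on_gpu():
    # Placeholder body; a real GPU test would load an INT8 model
    assert True
```

Running `pytest -m "not slow and not gpu"` then skips both categories at once.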

<br>

## 🤝 Help and Support
If you have any questions, you can create an issue on this repository.

You can also join our [Discord server](https://discord.gg/TgHXuSJEk6) and start a discussion in the `#xturing` channel.

<br>

## 🏗️ Project Structure

Understanding the codebase organization:

```
src/xturing/
├── models/ # Model classes and registry (BaseModel, LLaMA, GPT-2, etc.)
├── engines/ # Low-level model loading, tokenization, and operations
├── datasets/ # Dataset loaders (InstructionDataset, TextDataset)
├── trainers/ # Training loops (LightningTrainer with DeepSpeed support)
├── preprocessors/ # Data preprocessing and tokenization
├── config/ # YAML configurations for finetuning and generation
├── cli/ # CLI commands (chat, ui, api)
├── ui/ # Gradio UI playground
├── self_instruct/ # Dataset generation utilities
└── utils/ # Shared utilities

tests/xturing/ # Test suite mirroring src structure
examples/ # Example scripts organized by model and feature
```

**Key architectural patterns:**
- **Registry Pattern**: Models and engines use a registry-based factory pattern via `BaseModel.create()` and `BaseEngine.create()`
- **Model Variants**: Each model family has multiple variants following the naming template `<base>_[lora]_[int8|kbit]`
- Example: `llama`, `llama_lora`, `llama_int8`, `llama_lora_int8`
- **Configuration**: Training and generation parameters are defined in YAML files per model in `src/xturing/config/`
- **Engines**: Handle the low-level operations (loading weights, tokenization, DeepSpeed integration)
- **Models**: Provide high-level API (`finetune()`, `generate()`, `evaluate()`, `save()`, `load()`)
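The registry-based factory pattern can be sketched in a few lines (a stripped-down illustration, not xTuring's actual implementation):

```python
class BaseModel:
    _registry = {}

    @classmethod
    def add_to_registry(cls, config_name, model_cls):
        # Map a string key like "llama_lora" to its implementing class
        cls._registry[config_name] = model_cls

    @classmethod
    def create(cls, config_name, *args, **kwargs):
        # Look up the registered class and instantiate it
        return cls._registry[config_name](*args, **kwargs)

class LlamaLora(BaseModel):
    config_name = "llama_lora"

BaseModel.add_to_registry(LlamaLora.config_name, LlamaLora)
model = BaseModel.create("llama_lora")
print(type(model).__name__)  # → LlamaLora
```

Registering each variant under its string key is what lets `BaseModel.create("llama_lora")` stay decoupled from the concrete class.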

<br>

## 📝 License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

<br>

## 🌎 Contributing
As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our [contributing guide](CONTRIBUTING.md) to learn how you can get involved.

### Quick Contribution Guidelines

**Important:** All pull requests should target the `dev` branch, not `main`.

The project uses pre-commit hooks to enforce code quality:
- **black** - Code formatting
- **isort** - Import sorting (black profile)
- **autoflake** - Remove unused imports
- **absolufy-imports** - Convert relative to absolute imports
- **gitlint** - Commit message linting

You can manually format code:
```bash
black src/ tests/
isort src/ tests/
```

Pre-commit hooks will automatically run these checks when you commit. Make sure to install them:
```bash
pre-commit install
pre-commit install --hook-type commit-msg
```
10 changes: 10 additions & 0 deletions docs/docs/advanced/generate.md
@@ -41,6 +41,16 @@ engine = Davinci("your-api-key")
engine = ClaudeSonnet("your-api-key")
```

</TabItem>
<TabItem value="qwen" label="Qwen3-Omni (local)">

Download the desired checkpoint from [Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Omni) (or point to a local directory) and load it directly.

```python
from xturing.model_apis.qwen import Qwen3OmniTextGenerationAPI
engine = Qwen3OmniTextGenerationAPI(model_name_or_path="Qwen/Qwen2.5-Omni")
```

</TabItem>
</Tabs>

4 changes: 4 additions & 0 deletions src/xturing/model_apis/__init__.py
@@ -5,6 +5,7 @@
from xturing.model_apis.openai import ChatGPT as OpenAIChatGPT
from xturing.model_apis.openai import Davinci as OpenAIDavinci
from xturing.model_apis.openai import OpenAITextGenerationAPI
from xturing.model_apis.qwen import Qwen3OmniTextGenerationAPI

BaseApi.add_to_registry(OpenAITextGenerationAPI.config_name, OpenAITextGenerationAPI)
BaseApi.add_to_registry(CohereTextGenerationAPI.config_name, CohereTextGenerationAPI)
@@ -13,3 +14,6 @@
BaseApi.add_to_registry(OpenAIChatGPT.config_name, OpenAIChatGPT)
BaseApi.add_to_registry(CohereMedium.config_name, CohereMedium)
BaseApi.add_to_registry(ClaudeSonnet.config_name, ClaudeSonnet)
BaseApi.add_to_registry(
Qwen3OmniTextGenerationAPI.config_name, Qwen3OmniTextGenerationAPI
)
146 changes: 146 additions & 0 deletions src/xturing/model_apis/qwen.py
@@ -0,0 +1,146 @@
from datetime import datetime
from typing import Dict, List, Optional, Sequence

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from xturing.model_apis.base import TextGenerationAPI


class Qwen3OmniTextGenerationAPI(TextGenerationAPI):
"""Text generation API wrapper for running Qwen3-Omni locally via Hugging Face."""

config_name = "qwen3_omni"

def __init__(
self,
model_name_or_path: str = "Qwen/Qwen2.5-Omni",
device: Optional[str] = None,
tokenizer_kwargs: Optional[Dict] = None,
model_kwargs: Optional[Dict] = None,
default_generate_kwargs: Optional[Dict] = None,
):
super().__init__(
engine=model_name_or_path,
api_key=None,
request_batch_size=1,
)
tokenizer_kwargs = tokenizer_kwargs or {}
model_kwargs = model_kwargs or {}
self.default_generate_kwargs = default_generate_kwargs or {}

self.tokenizer = AutoTokenizer.from_pretrained(
model_name_or_path, trust_remote_code=True, **tokenizer_kwargs
)
self.model = AutoModelForCausalLM.from_pretrained(
> **Contributor:** It doesn't look correct. I don't think AutoModelForCausalLM is supported for this model. Just try AutoModelForMultimodalLM
>
> **Member Author:** I've made the suggested changes.
model_name_or_path, trust_remote_code=True, **model_kwargs
)

if device is None:
device = "cuda" if torch.cuda.is_available() else "cpu"
self.device = torch.device(device)
self.model.to(self.device)
if self.tokenizer.pad_token is None:
self.tokenizer.pad_token = self.tokenizer.eos_token
self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

def _trim_stop_sequences(
self, text: str, stop_sequences: Optional[Sequence[str]]
) -> str:
if not stop_sequences:
return text
cut_index = len(text)
for stop in stop_sequences:
if not stop:
continue
idx = text.find(stop)
if idx != -1 and idx < cut_index:
cut_index = idx
return text[:cut_index].rstrip()

def _generate_single(
self,
prompt: str,
max_tokens: int,
temperature: float,
top_p: Optional[float],
stop_sequences: Optional[Sequence[str]],
n: int,
generation_overrides: Dict,
) -> List[Dict[str, str]]:
inputs = self.tokenizer(
prompt,
return_tensors="pt",
)
inputs = {k: v.to(self.device) for k, v in inputs.items()}
do_sample = temperature is not None and temperature > 0
generate_kwargs = {
"max_new_tokens": max_tokens,
"do_sample": do_sample,
"num_return_sequences": n,
"eos_token_id": self.tokenizer.eos_token_id,
"pad_token_id": self.tokenizer.pad_token_id,
}
if temperature is not None:
generate_kwargs["temperature"] = temperature
if top_p is not None:
generate_kwargs["top_p"] = top_p
generate_kwargs.update(self.default_generate_kwargs)
generate_kwargs.update(generation_overrides)
outputs = self.model.generate(**inputs, **generate_kwargs)
if n == 1:
outputs = outputs.unsqueeze(0) if outputs.dim() == 1 else outputs
generated_sequences: List[Dict[str, str]] = []
prompt_length = inputs["input_ids"].shape[-1]
for sequence in outputs:
completion_tokens = sequence[prompt_length:]
text = self.tokenizer.decode(
completion_tokens,
skip_special_tokens=True,
).strip()
text = self._trim_stop_sequences(text, stop_sequences)
generated_sequences.append(
{
"text": text,
"finish_reason": "stop",
}
)
return generated_sequences

def generate_text(
self,
prompts,
max_tokens,
temperature,
top_p=None,
frequency_penalty=None,
presence_penalty=None,
stop_sequences=None,
logprobs=None,
n=1,
best_of=1,
retries=0,
**generation_overrides,
):
if not isinstance(prompts, list):
prompts = [prompts]

results = []
for prompt in prompts:
choices = self._generate_single(
prompt=prompt,
max_tokens=max_tokens,
temperature=temperature,
top_p=top_p,
stop_sequences=stop_sequences,
n=n,
generation_overrides=generation_overrides,
)
data = {
"prompt": prompt,
"response": {"choices": choices},
"created_at": str(datetime.now()),
}
results.append(data)

return results
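The stop-sequence trimming in `_trim_stop_sequences` is plain string logic, so it can be exercised in isolation (a standalone re-implementation of the same method body, for illustration):

```python
def trim_stop_sequences(text, stop_sequences):
    # Truncate at the earliest occurrence of any stop sequence,
    # mirroring Qwen3OmniTextGenerationAPI._trim_stop_sequences
    if not stop_sequences:
        return text
    cut_index = len(text)
    for stop in stop_sequences:
        if not stop:  # ignore empty stop strings
            continue
        idx = text.find(stop)
        if idx != -1 and idx < cut_index:
            cut_index = idx
    return text[:cut_index].rstrip()

print(trim_stop_sequences("Answer: 42\n###\nmore text", ["###"]))  # → Answer: 42
```

Note that empty stop strings are skipped and trailing whitespace before the cut point is stripped, so a completion ending in a newline-delimited stop marker comes back clean.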