A FastAPI-based service that generates character images using Stable Diffusion XL. This service provides a simple API endpoint for generating images based on text prompts, with optional reference images for style guidance and LoRA model support for character customization.
## Prerequisites

- Docker and Docker Compose
- NVIDIA GPU with CUDA support
- At least 16GB of GPU memory recommended
- At least 20GB of disk space for models
## Setup

- Clone the repository:

```bash
git clone [your-repo-url]
cd character-generator
```

- Run the setup script:

```bash
chmod +x setup.sh
./setup.sh
```

This will:
- Install required Python packages
- Download necessary AI models
- Build and start the Docker container
## Environment Configuration

The project uses environment variables for configuration. A `.env.example` file is provided as a template:

- Copy the example file to create your own `.env` file:

```bash
cp .env.example .env
```

- Edit the `.env` file to set your own values:

```bash
nano .env
```

- Replace the placeholder values with your actual configuration:

```bash
# Database Configuration
MYSQL_ROOT_PASSWORD=your_secure_root_password
MYSQL_PASSWORD=your_secure_user_password

# Hugging Face Token
HUGGINGFACE_TOKEN=your_huggingface_token

# MinIO Configuration
MINIO_ROOT_PASSWORD=your_minio_password

# NCA Toolkit Configuration
API_KEY=your_api_key
S3_ACCESS_KEY=your_s3_access_key
S3_SECRET_KEY=your_s3_secret_key
```

The `.env` file is already in the `.gitignore` to prevent accidental commits of sensitive information.
## Protecting Sensitive Information

This project uses environment variables to manage sensitive information like API keys and tokens. To ensure your sensitive data is not accidentally committed to version control:

- **Use the `.env` file for sensitive information:**
  - The project includes a `.env` file for storing sensitive information
  - This file is already in the `.gitignore` to prevent accidental commits
  - Add your sensitive tokens and keys to this file
- **Required sensitive variables:**
  - `HUGGINGFACE_TOKEN`: Your Hugging Face API token for model downloads
  - `MYSQL_ROOT_PASSWORD`: Database root password
  - `MYSQL_PASSWORD`: Database user password
- **GitHub secret scanning:**
  - GitHub's secret scanning will detect and block pushes containing sensitive information
  - If you need to push code with example tokens (for documentation), use placeholder values like `YOUR_TOKEN_HERE`
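For reference, here is a minimal sketch of reading these variables in your own Python scripts. It assumes the optional `python-dotenv` package is installed; the variable names match the `.env` template above:

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load variables from .env into the process environment
load_dotenv()

# Fail fast (KeyError) if a required secret is missing
hf_token = os.environ["HUGGINGFACE_TOKEN"]
db_password = os.environ["MYSQL_PASSWORD"]

print("Hugging Face token loaded:", hf_token[:4] + "...")
```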
## Service Ports

The following services are exposed on these ports:
| Service | Port | Description |
|---|---|---|
| Character Generator | 2035 | Main API for character generation |
| n8n | 2678 | Workflow automation platform |
| MinIO | 2000, 2001 | Object storage (API, Console) |
| Kokoro TTS | 2880 | Text-to-speech service |
| MariaDB | 2306 | Database service |
| NCA Toolkit | 2080 | No-code toolkit interface |
| Ollama | 2030 | LLM service |
| Weaviate | 2500 | Vector database |
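To confirm the stack is up after `docker compose up`, a quick TCP connectivity check over the ports above can help. This is an illustrative sketch only; the host and port list are taken from the table:

```python
import socket

# Ports from the table above, assuming a local deployment
SERVICES = {
    "Character Generator": 2035,
    "n8n": 2678,
    "MinIO API": 2000,
    "MinIO Console": 2001,
    "Kokoro TTS": 2880,
    "MariaDB": 2306,
    "NCA Toolkit": 2080,
    "Ollama": 2030,
    "Weaviate": 2500,
}

for name, port in SERVICES.items():
    # Attempt a TCP connection with a short timeout
    try:
        with socket.create_connection(("localhost", port), timeout=2):
            print(f"{name} (:{port}) is reachable")
    except OSError:
        print(f"{name} (:{port}) is NOT reachable")
```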
## Health Checks

The API provides two health check endpoints:

- Basic health check:

```bash
curl http://localhost:8000/
```

Expected response:

```json
{
  "status": "ok",
  "message": "Character Generator API is running. Model status: initialized"
}
```

- Detailed health check:

```bash
curl http://localhost:8000/health
```

Expected response:

```json
{
  "status": "healthy"
}
```

## Character Creation Workflow

The API follows a specific workflow for creating and training characters:
- **Initial Character Creation**
  - Create from description:

    ```bash
    curl -X POST "http://localhost:8000/characters/create_initial" \
      -H "Content-Type: application/json" \
      -d '{"prompt": "a noble elven warrior with golden armor"}'
    ```

  - Create from an existing image:

    ```bash
    curl -X POST "http://localhost:8000/characters/create_initial" \
      -F "prompt=a noble elven warrior with golden armor" \
      -F "existing_image=@your_image.png"
    ```

  - Regenerate if not satisfied:

    ```bash
    curl -X POST "http://localhost:8000/characters/create_initial" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "a noble elven warrior with golden armor",
        "regenerate": true,
        "character_id": "YOUR_CHARACTER_ID"
      }'
    ```

- **Approve Base Image**

  ```bash
  curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/approve"
  ```

- **Generate Training Data**
  - Generate default variations:

    ```bash
    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_training" \
      -H "Content-Type: application/json" \
      -d '{"num_variations": 10}'
    ```

  - Generate with custom prompts:

    ```bash
    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_training" \
      -H "Content-Type: application/json" \
      -d '{
        "num_variations": 5,
        "custom_prompts": [
          "character in battle pose",
          "character casting a spell",
          "character riding a horse",
          "character in formal attire",
          "character in stealth mode"
        ]
      }'
    ```

- **Manage Training Images**
  - View a training image:

    ```bash
    curl "http://localhost:8000/characters/YOUR_CHARACTER_ID/training/0" --output training_0.png
    ```

  - Regenerate a specific training image:

    ```bash
    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/training/0/regenerate" \
      -H "Content-Type: application/json" \
      -d '{"custom_prompt": "character in a different battle pose"}'
    ```

  - Remove an unwanted training image:

    ```bash
    curl -X DELETE "http://localhost:8000/characters/YOUR_CHARACTER_ID/training/0"
    ```

- **Train LoRA Model**

  Once you have a satisfactory set of training images:

  ```bash
  # Start training
  curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/train"
  ```

  Check training status:

  ```bash
  curl "http://localhost:8000/characters/YOUR_CHARACTER_ID/training_status"
  ```

  Example response:

  ```json
  {
    "status": "success",
    "training_status": {
      "state": "training",
      "progress": 45.5,
      "training_start": "2024-03-14T10:30:00",
      "last_update": "2024-03-14T10:35:00"
    }
  }
  ```

- **Generate New Scenes**

  Once training is complete (state is `ready`), generate new scenes with your character:

  ```bash
  curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_scene" \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "the character exploring an ancient temple",
      "num_inference_steps": 30,
      "guidance_scale": 7.5
    }' \
    --output scene.png
  ```
## Character States

Characters progress through several states during creation and training:

- `initial` - Just created, no approved base image
- `base_approved` - Has approved base image, ready for training data generation
- `generating_training` - Currently generating training data variations
- `training` - LoRA model training in progress
- `ready` - LoRA trained and ready for scene generation
- `error` - Something went wrong (check error message in status)
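As an illustration, a small guard that checks the character's state before requesting a scene. This is a sketch against the status endpoint shown above; the helper name is ours:

```python
import requests

BASE_URL = "http://localhost:8000"

def ensure_ready(character_id: str) -> None:
    """Raise if the character's LoRA is not ready for scene generation."""
    resp = requests.get(f"{BASE_URL}/characters/{character_id}/training_status")
    resp.raise_for_status()
    state = resp.json()["training_status"]["state"]
    if state == "error":
        raise RuntimeError("Training failed; check the error message in the status")
    if state != "ready":
        raise RuntimeError(f"Character not ready yet (state: {state})")

# ensure_ready("YOUR_CHARACTER_ID")
```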
## LoRA Training Configuration

The LoRA training process can be configured through environment variables:

```bash
# LoRA Training Configuration
LORA_RANK=16          # Rank of LoRA matrices
LORA_ALPHA=32         # LoRA scaling factor
NUM_TRAIN_EPOCHS=100  # Number of training epochs
```

The training process requires:

- At least 5 training images
- A CUDA-capable GPU with sufficient memory

Training time varies with the number of images and epochs.
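A minimal preflight sketch that validates these requirements before calling the `/train` endpoint. The directory path and `.png`-only glob are illustrative assumptions; the five-image minimum comes from the list above:

```python
from pathlib import Path

import torch  # assumes PyTorch is installed, as in requirements.txt

MIN_TRAINING_IMAGES = 5  # minimum stated above

def preflight(training_dir: str) -> None:
    """Check training prerequisites before starting LoRA training."""
    images = list(Path(training_dir).glob("*.png"))
    if len(images) < MIN_TRAINING_IMAGES:
        raise RuntimeError(
            f"Need at least {MIN_TRAINING_IMAGES} training images, found {len(images)}"
        )
    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA-capable GPU detected")
    free, total = torch.cuda.mem_get_info()  # bytes
    print(f"GPU memory free: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")

# preflight("storage/base_characters/YOUR_CHARACTER_ID/training")  # hypothetical path
```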
## Prompting Tips for Scene Generation

When generating new scenes with a trained character:

- Always include distinctive features from the original character description
- Be specific about the scene and the character's pose or action
- Use the same style keywords as in the original description for consistency
Example prompts:

```bash
# Action scene
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_scene" \
  -d '{"prompt": "the noble elven warrior with golden armor in an epic battle stance, wielding a glowing sword, dramatic lighting"}'

# Portrait scene
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_scene" \
  -d '{"prompt": "close up portrait of the noble elven warrior with golden armor, serene expression, detailed face features"}'

# Environmental scene
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_scene" \
  -d '{"prompt": "the noble elven warrior with golden armor standing in a mystical elven forest, ethereal atmosphere"}'
```

## Character Management

The API supports character management using LoRA (Low-Rank Adaptation) models for consistent character generation.
Create a new character profile with an associated LoRA model:

```bash
curl -X POST "http://localhost:8000/characters/create" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "elf_warrior",
    "description": "A noble elven warrior with golden armor",
    "training_images": ["base_image1.png", "base_image2.png"],
    "lora_config": {
      "r": 16,
      "alpha": 32,
      "target_modules": ["q_proj", "v_proj"]
    }
  }'
```

Generate an image using a specific character's LoRA model:
```bash
curl -X POST "http://localhost:8000/characters/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "character_name": "elf_warrior",
    "prompt": "the character in a battle pose",
    "num_inference_steps": 30,
    "guidance_scale": 7.5
  }' \
  --output character.png
```

## Basic Generation

Generate an image using just a prompt:
```bash
curl -X POST "http://localhost:8000/generate?prompt=a%20beautiful%20fantasy%20character" \
  --output character.png
```

Generate with all parameters and a reference image:
```bash
curl -X POST "http://localhost:8000/generate" \
  -F "prompt=a beautiful fantasy character with long flowing hair" \
  -F "reference_image=@reference.png" \
  -F "num_inference_steps=30" \
  -F "guidance_scale=7.5" \
  -F "strength=0.8" \
  --output character.png
```

The API doesn't store generated images; they are saved locally where you make the API call. To use a generated image as a reference:
- First generate and save an image:

  ```bash
  # Generate first image
  curl -X POST "http://localhost:8000/generate?prompt=elf warrior" \
    --output first_character.png
  ```

- Then use that saved image as a reference for a new generation:

  ```bash
  # Use first_character.png as reference for new generation
  curl -X POST "http://localhost:8000/generate" \
    -F "prompt=elf warrior with different pose" \
    -F "reference_image=@first_character.png" \
    -F "strength=0.8" \
    --output second_character.png
  ```

The `strength` parameter controls how much influence the reference image has on the final result:

- Higher values (closer to 1.0) preserve more of the reference image's style and composition
- Lower values (closer to 0.0) allow more deviation from the reference
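To get a feel for the parameter, a small sweep can help. This is a sketch using Python's `requests` against the `/generate` endpoint shown above; the prompt, values, and output filenames are illustrative:

```python
import requests

# Compare several strength values against the same reference image
for strength in (0.3, 0.5, 0.8):
    with open("first_character.png", "rb") as ref:
        resp = requests.post(
            "http://localhost:8000/generate",
            params={"prompt": "elf warrior with different pose", "strength": strength},
            files={"reference_image": ref},
        )
    resp.raise_for_status()
    with open(f"second_character_strength_{strength}.png", "wb") as out:
        out.write(resp.content)
```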
## Python Client Example

```python
import requests
from PIL import Image
import io

def generate_character(
    prompt: str,
    reference_image_path: str = None,
    num_inference_steps: int = 30,
    guidance_scale: float = 7.5,
    strength: float = 0.8
):
    url = "http://localhost:8000/generate"

    # Prepare parameters
    params = {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "strength": strength
    }

    # Add reference image if provided
    files = {}
    if reference_image_path:
        files = {"reference_image": open(reference_image_path, "rb")}

    # Make request
    response = requests.post(url, params=params, files=files)

    if response.status_code == 200:
        # Save the generated image
        image = Image.open(io.BytesIO(response.content))
        image.save("generated_character.png")
        print("Image generated successfully!")
        return image
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

# Example usage
generate_character(
    prompt="a beautiful elf warrior with golden armor",
    reference_image_path="reference.png"  # Optional
)
```

Here's how to generate multiple images using previous generations as references:
```python
def generate_character_sequence(
    base_prompt: str,
    variation_prompts: list[str],
    strength: float = 0.8
):
    # Generate initial character
    first_image = generate_character(prompt=base_prompt)
    if not first_image:
        return

    # Save first image
    first_image.save("character_1.png")

    # Generate variations using the first image as reference
    for i, prompt in enumerate(variation_prompts, 2):
        variation = generate_character(
            prompt=prompt,
            reference_image_path="character_1.png",
            strength=strength
        )
        if variation:
            variation.save(f"character_{i}.png")

# Example: Generate variations of a character
generate_character_sequence(
    base_prompt="a warrior elf with golden armor",
    variation_prompts=[
        "same warrior elf but in battle pose",
        "same warrior elf but with raised sword"
    ],
    strength=0.8
)
```

## Complete Workflow Example

Here's a complete example of the character creation workflow using Python:
```python
import requests
import time

class CharacterGenerator:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url

    def create_character(self, prompt: str, existing_image_path: str = None) -> dict:
        """Create a new character"""
        url = f"{self.base_url}/characters/create_initial"
        if existing_image_path:
            files = {
                "existing_image": open(existing_image_path, "rb")
            }
            data = {"prompt": prompt}
            response = requests.post(url, files=files, data=data)
        else:
            response = requests.post(url, json={"prompt": prompt})
        response.raise_for_status()
        return response.json()

    def regenerate_base(self, character_id: str, prompt: str) -> dict:
        """Regenerate the base image"""
        url = f"{self.base_url}/characters/create_initial"
        data = {
            "prompt": prompt,
            "regenerate": True,
            "character_id": character_id
        }
        response = requests.post(url, json=data)
        response.raise_for_status()
        return response.json()

    def approve_base(self, character_id: str) -> dict:
        """Approve the current base image"""
        url = f"{self.base_url}/characters/{character_id}/approve"
        response = requests.post(url)
        response.raise_for_status()
        return response.json()

    def generate_training_data(
        self,
        character_id: str,
        num_variations: int = 10,
        custom_prompts: list = None
    ) -> dict:
        """Generate training data variations"""
        url = f"{self.base_url}/characters/{character_id}/generate_training"
        data = {
            "num_variations": num_variations,
            "custom_prompts": custom_prompts
        }
        response = requests.post(url, json=data)
        response.raise_for_status()
        return response.json()

    def start_training(self, character_id: str) -> dict:
        """Start LoRA training"""
        url = f"{self.base_url}/characters/{character_id}/train"
        response = requests.post(url)
        response.raise_for_status()
        return response.json()

    def get_training_status(self, character_id: str) -> dict:
        """Get current training status"""
        url = f"{self.base_url}/characters/{character_id}/training_status"
        response = requests.get(url)
        response.raise_for_status()
        return response.json()

    def wait_for_training(self, character_id: str, check_interval: int = 30) -> dict:
        """Wait for training to complete"""
        while True:
            status = self.get_training_status(character_id)
            state = status["training_status"]["state"]
            if state == "ready":
                return status
            elif state == "error":
                raise Exception(f"Training failed: {status['training_status'].get('error')}")
            print(f"Training progress: {status['training_status'].get('progress', 0):.1f}%")
            time.sleep(check_interval)

    def generate_scene(
        self,
        character_id: str,
        prompt: str,
        output_path: str,
        num_inference_steps: int = 30,
        guidance_scale: float = 7.5
    ) -> str:
        """Generate a new scene with the character"""
        url = f"{self.base_url}/characters/{character_id}/generate_scene"
        data = {
            "prompt": prompt,
            "num_inference_steps": num_inference_steps,
            "guidance_scale": guidance_scale
        }
        response = requests.post(url, json=data)
        response.raise_for_status()

        # Save the image
        with open(output_path, "wb") as f:
            f.write(response.content)
        return output_path

# Example usage
def create_character_workflow():
    generator = CharacterGenerator()

    # 1. Create initial character
    character = generator.create_character(
        prompt="a noble elven warrior with golden armor"
    )
    character_id = character["character"]["id"]

    # 2. Regenerate until satisfied
    while input("Satisfied with the base image? (y/n): ").lower() != 'y':
        character = generator.regenerate_base(
            character_id,
            prompt="a noble elven warrior with golden armor"
        )

    # 3. Approve base image
    generator.approve_base(character_id)

    # 4. Generate training data
    training_result = generator.generate_training_data(
        character_id,
        num_variations=10,
        custom_prompts=[
            "character in battle pose",
            "character casting a spell",
            "character riding a horse",
            "character in formal attire",
            "character in stealth mode"
        ]
    )

    # 5. Start training
    generator.start_training(character_id)

    # 6. Wait for training to complete
    generator.wait_for_training(character_id)

    # 7. Generate scenes
    scenes = [
        "the character exploring an ancient temple",
        "the character in an epic battle",
        "the character in a peaceful elven village"
    ]
    for i, scene in enumerate(scenes):
        generator.generate_scene(
            character_id,
            prompt=scene,
            output_path=f"scene_{i+1}.png"
        )

if __name__ == "__main__":
    create_character_workflow()
```
### Advanced Training Features
#### Training Checkpoints
The training process automatically saves checkpoints that can be used to resume training if interrupted:

```bash
# Resume from latest checkpoint
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/train/resume"

# Resume from specific checkpoint
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/train/resume" \
  -H "Content-Type: application/json" \
  -d '{
    "checkpoint_path": "/path/to/checkpoint.pt"
  }'
```

#### Training Metrics

The training status endpoint provides detailed metrics:
```json
{
  "status": "success",
  "training_status": {
    "state": "training",
    "progress": 45.5,
    "training_loss": 0.234,
    "epoch_loss": 0.245,
    "current_epoch": 5,
    "training_start": "2024-03-14T10:30:00",
    "last_update": "2024-03-14T10:35:00"
  }
}
```

#### Memory Management

The training process includes automatic memory management:
- Mixed precision training (FP16)
- Gradient clipping
- Automatic OOM recovery
- GPU memory cleanup
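For readers unfamiliar with these techniques, here is a generic PyTorch sketch of FP16 mixed precision, gradient clipping, and OOM recovery in a training step. It is illustrative only, not the service's actual training loop; the model, optimizer, and data loader are placeholders, and `torch.cuda.OutOfMemoryError` assumes PyTorch >= 1.13:

```python
import torch

def train_one_epoch(model, optimizer, loader, max_grad_norm=1.0, device="cuda"):
    """Generic FP16 training step with gradient clipping and OOM recovery."""
    scaler = torch.cuda.amp.GradScaler()  # scales the loss for FP16 stability
    for batch, target in loader:
        optimizer.zero_grad(set_to_none=True)
        try:
            with torch.cuda.amp.autocast():  # forward pass in mixed precision
                loss = torch.nn.functional.mse_loss(
                    model(batch.to(device)), target.to(device)
                )
            scaler.scale(loss).backward()
            scaler.unscale_(optimizer)  # unscale gradients before clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
            scaler.step(optimizer)
            scaler.update()
        except torch.cuda.OutOfMemoryError:
            # OOM recovery: drop the batch and free cached GPU memory
            optimizer.zero_grad(set_to_none=True)
            torch.cuda.empty_cache()
```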
## Generation Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | Text description of the desired image |
| reference_image | file | No | None | Reference image for style guidance |
| num_inference_steps | int | No | 30 | Number of denoising steps |
| guidance_scale | float | No | 7.5 | How closely to follow the prompt |
| strength | float | No | 0.8 | How much to preserve from reference image |
## Dependencies

Key Python packages required (see requirements.txt for the complete list):

- torch >= 2.0.0
- diffusers >= 0.21.0
- transformers >= 4.31.0
- accelerate >= 0.21.0
- peft >= 0.5.0
- fastapi >= 0.100.0
- uvicorn >= 0.23.0
- Pillow >= 10.0.0
- python-multipart >= 0.0.6
## Project Structure

```
character-generator/
├── src/
│   ├── api.py                # FastAPI endpoints
│   ├── model_handler.py      # Stable Diffusion handler
│   ├── config.py             # Configuration classes
│   ├── character_manager.py  # Character and LoRA management
│   └── main.py               # Application entry point
├── storage/
│   ├── base_characters/      # Character base images
│   ├── lora_models/          # Trained LoRA weights
│   └── outputs/              # Generated images
├── models/                   # AI model storage
├── scripts/                  # Utility scripts
├── docker-compose.yml        # Docker configuration
├── Dockerfile                # Container definition
└── requirements.txt          # Python dependencies
```
## Local Development

- Create a Python virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the service:

  ```bash
  python -m src.main
  ```

## Notes

- The service requires a CUDA-capable NVIDIA GPU
- First request might be slower due to model loading
- Make sure to have enough disk space for the AI models (~20GB)
- The API will return a PNG image file directly in the response
- All errors will return appropriate HTTP status codes with error messages
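A short sketch of handling both cases from Python: a successful response carries the PNG bytes directly, and FastAPI error responses usually carry a JSON body with a `detail` field (the endpoint and prompt below are just examples):

```python
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    params={"prompt": "elf warrior"},
)
if resp.ok:
    with open("character.png", "wb") as f:
        f.write(resp.content)  # the API returns the PNG bytes directly
else:
    # FastAPI errors typically return a JSON body with a "detail" field
    try:
        print(resp.status_code, resp.json().get("detail"))
    except ValueError:
        print(resp.status_code, resp.text)
```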