Character Generator API

A FastAPI-based service that generates character images using Stable Diffusion XL. The service exposes a simple HTTP API for generating images from text prompts, with optional reference images for style guidance and LoRA model support for character customization.

πŸš€ Quick Start

Prerequisites

  • Docker and Docker Compose
  • NVIDIA GPU with CUDA support
  • At least 16GB of GPU memory recommended
  • At least 20GB of disk space for models

Installation

  1. Clone the repository:
git clone [your-repo-url]
cd character-generator
  2. Run the setup script:
chmod +x setup.sh
./setup.sh

This will:

  • Install required Python packages
  • Download necessary AI models
  • Build and start the Docker container

Environment Variables

The project uses environment variables for configuration. A .env.example file is provided as a template:

  1. Copy the example file to create your own .env file:
cp .env.example .env
  2. Edit the .env file to set your own values:
nano .env
  3. Replace the placeholder values with your actual configuration:
# Database Configuration
MYSQL_ROOT_PASSWORD=your_secure_root_password
MYSQL_PASSWORD=your_secure_user_password

# Hugging Face Token
HUGGINGFACE_TOKEN=your_huggingface_token

# MinIO Configuration
MINIO_ROOT_PASSWORD=your_minio_password

# NCA Toolkit Configuration
API_KEY=your_api_key
S3_ACCESS_KEY=your_s3_access_key
S3_SECRET_KEY=your_s3_secret_key

The .env file is already in the .gitignore to prevent accidental commits of sensitive information.
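For reference, here is a minimal sketch of how these variables might be read from Python code (the require_env helper is hypothetical and this assumes the python-dotenv package is installed; the service's actual configuration code may differ):

import os

from dotenv import load_dotenv

# Load variables from the .env file into the process environment
load_dotenv()

def require_env(name: str) -> str:
    """Return an environment variable or fail fast with a clear error."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

huggingface_token = require_env("HUGGINGFACE_TOKEN")
mysql_password = require_env("MYSQL_PASSWORD")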

πŸ” Security and Environment Variables

This project uses environment variables to manage sensitive information like API keys and tokens. To ensure your sensitive data is not accidentally committed to version control:

  1. Use the .env file for sensitive information:

    • The project includes a .env file for storing sensitive information
    • This file is already in the .gitignore to prevent accidental commits
    • Add your sensitive tokens and keys to this file
  2. Required sensitive variables:

    • HUGGINGFACE_TOKEN: Your Hugging Face API token for model downloads
    • MYSQL_ROOT_PASSWORD: Database root password
    • MYSQL_PASSWORD: Database user password
  3. GitHub Secret Scanning:

    • GitHub's secret scanning will detect and block pushes containing sensitive information
    • If you need to push code with example tokens (for documentation), use placeholder values like YOUR_TOKEN_HERE

πŸ”Œ Service Ports

The following services are exposed on these ports. The API examples in this README assume the Character Generator is reachable at http://localhost:8000; if you access it through the Docker Compose port mapping below, substitute 2035 for 8000.

Service               Port(s)      Description
Character Generator   2035         Main API for character generation
n8n                   2678         Workflow automation platform
MinIO                 2000, 2001   Object storage (API, Console)
Kokoro TTS            2880         Text-to-speech service
MariaDB               2306         Database service
NCA Toolkit           2080         No-code toolkit interface
Ollama                2030         LLM service
Weaviate              2500         Vector database

πŸ“š API Usage

Health Check

The API provides two health check endpoints:

  1. Basic health check:
curl http://localhost:8000/

Expected response:

{
    "status": "ok",
    "message": "Character Generator API is running. Model status: initialized"
}
  2. Detailed health check:
curl http://localhost:8000/health

Expected response:

{
    "status": "healthy"
}
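Because the model can take a while to load (see Notes below), scripts may want to wait for the service to come up before sending generation requests. A minimal sketch using the health endpoint above (wait_until_healthy is a hypothetical helper, not part of the API):

import time

import requests

def wait_until_healthy(base_url: str = "http://localhost:8000", timeout: float = 300.0) -> bool:
    """Poll /health until the service reports healthy or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            response = requests.get(f"{base_url}/health", timeout=5)
            if response.ok and response.json().get("status") == "healthy":
                return True
        except requests.RequestException:
            pass  # service not reachable yet; keep polling
        time.sleep(2)
    return False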

Character Creation Workflow

The API follows a specific workflow for creating and training characters:

  1. Initial Character Creation

    • Create from description:
    curl -X POST "http://localhost:8000/characters/create_initial" \
         -H "Content-Type: application/json" \
         -d '{"prompt": "a noble elven warrior with golden armor"}'
    • Create from existing image:
    curl -X POST "http://localhost:8000/characters/create_initial" \
         -F "prompt=a noble elven warrior with golden armor" \
         -F "[email protected]"
    • Regenerate if not satisfied:
    curl -X POST "http://localhost:8000/characters/create_initial" \
         -H "Content-Type: application/json" \
         -d '{
             "prompt": "a noble elven warrior with golden armor",
             "regenerate": true,
             "character_id": "YOUR_CHARACTER_ID"
         }'
  2. Approve Base Image

    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/approve"
  3. Generate Training Data

    • Generate default variations:
    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_training" \
         -H "Content-Type: application/json" \
         -d '{"num_variations": 10}'
    • Generate with custom prompts:
    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_training" \
         -H "Content-Type: application/json" \
         -d '{
             "num_variations": 5,
             "custom_prompts": [
                 "character in battle pose",
                 "character casting a spell",
                 "character riding a horse",
                 "character in formal attire",
                 "character in stealth mode"
             ]
         }'
  4. Manage Training Images

    • View a training image:
    curl "http://localhost:8000/characters/YOUR_CHARACTER_ID/training/0" --output training_0.png
    • Regenerate a specific training image:
    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/training/0/regenerate" \
         -H "Content-Type: application/json" \
         -d '{"custom_prompt": "character in a different battle pose"}'
    • Remove unwanted training image:
    curl -X DELETE "http://localhost:8000/characters/YOUR_CHARACTER_ID/training/0"
  5. Train LoRA Model

     Once you have a satisfactory set of training images:

    # Start training
    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/train"

    Check training status:

    curl "http://localhost:8000/characters/YOUR_CHARACTER_ID/training_status"

    Example response:

    {
        "status": "success",
        "training_status": {
            "state": "training",
            "progress": 45.5,
            "training_start": "2024-03-14T10:30:00",
            "last_update": "2024-03-14T10:35:00"
        }
    }
  6. Generate New Scenes

     Once training is complete (state is "ready"), generate new scenes with your character:

    curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_scene" \
         -H "Content-Type: application/json" \
         -d '{
             "prompt": "the character exploring an ancient temple",
             "num_inference_steps": 30,
             "guidance_scale": 7.5
         }' \
         --output scene.png

Character States

Characters progress through several states during creation and training:

  1. initial - Just created, no approved base image
  2. base_approved - Has approved base image, ready for training data generation
  3. generating_training - Currently generating training data variations
  4. training - LoRA model training in progress
  5. ready - LoRA trained and ready for scene generation
  6. error - Something went wrong (check error message in status)
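As a hedged sketch, a client resuming a workflow might branch on these states like this (next_action is a hypothetical helper; the state strings are the ones documented above):

def next_action(state: str) -> str:
    """Map a documented character state to the next workflow step (illustrative only)."""
    actions = {
        "initial": "approve or regenerate the base image",
        "base_approved": "generate training data variations",
        "generating_training": "wait for training data generation to finish",
        "training": "poll the training_status endpoint until ready",
        "ready": "generate new scenes",
        "error": "inspect the error message in the status response",
    }
    return actions.get(state, f"unknown state: {state}")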

Training Configuration

The LoRA training process can be configured through environment variables:

# LoRA Training Configuration
LORA_RANK=16           # Rank of LoRA matrices
LORA_ALPHA=32          # LoRA scaling factor
NUM_TRAIN_EPOCHS=100   # Number of training epochs

The training process requires:

  • At least 5 training images
  • CUDA-capable GPU with sufficient memory
  • Training time varies based on number of images and epochs
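For context, LORA_RANK and LORA_ALPHA correspond to the r and lora_alpha fields of a LoRA adapter configuration. A minimal sketch of how they might map onto a peft LoraConfig (the target_modules shown are an assumption, not necessarily what the service's training code uses):

import os

from peft import LoraConfig  # peft >= 0.5.0, as listed under Dependencies

lora_config = LoraConfig(
    r=int(os.getenv("LORA_RANK", "16")),            # rank of the low-rank update matrices
    lora_alpha=int(os.getenv("LORA_ALPHA", "32")),  # scaling factor applied to the update
    target_modules=["to_q", "to_v"],                # assumed attention projections in the UNet
)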

Scene Generation Tips

When generating new scenes with a trained character:

  1. Always include distinctive features from the original character description
  2. Be specific about the scene and character's pose/action
  3. Use the same style keywords as in the original description for consistency

Example prompts:

# Action scene
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_scene" \
     -H "Content-Type: application/json" \
     -d '{"prompt": "the noble elven warrior with golden armor in an epic battle stance, wielding a glowing sword, dramatic lighting"}' \
     --output action_scene.png

# Portrait scene
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_scene" \
     -H "Content-Type: application/json" \
     -d '{"prompt": "close up portrait of the noble elven warrior with golden armor, serene expression, detailed facial features"}' \
     --output portrait_scene.png

# Environmental scene
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/generate_scene" \
     -H "Content-Type: application/json" \
     -d '{"prompt": "the noble elven warrior with golden armor standing in a mystical elven forest, ethereal atmosphere"}' \
     --output environment_scene.png

Character Management

The API supports character management using LoRA (Low-Rank Adaptation) models for consistent character generation:

Create Character

Create a new character profile with associated LoRA model:

curl -X POST "http://localhost:8000/characters/create" \
     -H "Content-Type: application/json" \
     -d '{
         "name": "elf_warrior",
         "description": "A noble elven warrior with golden armor",
         "training_images": ["base_image1.png", "base_image2.png"],
         "lora_config": {
             "r": 16,
             "alpha": 32,
             "target_modules": ["q_proj", "v_proj"]
         }
     }'

Generate with Character

Generate an image using a specific character's LoRA model:

curl -X POST "http://localhost:8000/characters/generate" \
     -H "Content-Type: application/json" \
     -d '{
         "character_name": "elf_warrior",
         "prompt": "the character in a battle pose",
         "num_inference_steps": 30,
         "guidance_scale": 7.5
     }' \
     --output character.png
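The same request from Python, mirroring the curl call above:

import requests

response = requests.post(
    "http://localhost:8000/characters/generate",
    json={
        "character_name": "elf_warrior",
        "prompt": "the character in a battle pose",
        "num_inference_steps": 30,
        "guidance_scale": 7.5,
    },
)
response.raise_for_status()

# The endpoint returns the image bytes directly
with open("character.png", "wb") as f:
    f.write(response.content)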

Basic Image Generation

Basic Generation

Generate an image using just a prompt:

curl -X POST "http://localhost:8000/generate?prompt=a%20beautiful%20fantasy%20character" \
     --output character.png

Advanced Generation

Generate with all parameters and a reference image:

curl -X POST "http://localhost:8000/generate" \
     -F "prompt=a beautiful fantasy character with long flowing hair" \
     -F "[email protected]" \
     -F "num_inference_steps=30" \
     -F "guidance_scale=7.5" \
     -F "strength=0.8" \
     --output character.png

Using Generated Images as References

The API doesn't store generated images; they are saved locally wherever you make the API call. To use a generated image as a reference:

  1. First generate and save an image:
# Generate first image
curl -X POST "http://localhost:8000/generate?prompt=elf warrior" \
     --output first_character.png
  2. Then use that saved image as a reference for a new generation:
# Use first_character.png as reference for new generation
curl -X POST "http://localhost:8000/generate" \
     -F "prompt=elf warrior with different pose" \
     -F "reference_image=@first_character.png" \
     -F "strength=0.8" \
     --output second_character.png

The strength parameter controls how much influence the reference image has on the final result:

  • Higher values (closer to 1.0) preserve more of the reference image's style and composition
  • Lower values (closer to 0.0) allow more deviation from the reference
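When in doubt, it can help to sweep strength across a few values and compare the results side by side. A short sketch (it assumes first_character.png from the previous step exists):

import requests

# Generate the same variation at several strengths to compare reference influence
for strength in (0.3, 0.5, 0.8):
    with open("first_character.png", "rb") as ref:
        response = requests.post(
            "http://localhost:8000/generate",
            params={"prompt": "elf warrior with different pose", "strength": strength},
            files={"reference_image": ref},
        )
    response.raise_for_status()
    with open(f"second_character_strength_{strength}.png", "wb") as f:
        f.write(response.content)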

Python Example

import io
from typing import Optional

import requests
from PIL import Image

def generate_character(
    prompt: str,
    reference_image_path: Optional[str] = None,
    num_inference_steps: int = 30,
    guidance_scale: float = 7.5,
    strength: float = 0.8
):
    url = "http://localhost:8000/generate"
    
    # Prepare parameters
    params = {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "strength": strength
    }
    
    # Add reference image if provided, closing the file handle when done
    files = {}
    ref_file = None
    if reference_image_path:
        ref_file = open(reference_image_path, "rb")
        files = {"reference_image": ref_file}

    try:
        # Make request
        response = requests.post(url, params=params, files=files)
    finally:
        if ref_file:
            ref_file.close()
    
    if response.status_code == 200:
        # Save the generated image
        image = Image.open(io.BytesIO(response.content))
        image.save("generated_character.png")
        print("Image generated successfully!")
        return image
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

# Example usage
generate_character(
    prompt="a beautiful elf warrior with golden armor",
    reference_image_path="reference.png"  # Optional
)

Using Generated Images as References in Python

Here's how to generate multiple images using previous generations as references:

def generate_character_sequence(
    base_prompt: str,
    variation_prompts: list[str],
    strength: float = 0.8
):
    # Generate initial character
    first_image = generate_character(prompt=base_prompt)
    if not first_image:
        return
    
    # Save first image
    first_image.save("character_1.png")
    
    # Generate variations using the first image as reference
    for i, prompt in enumerate(variation_prompts, 2):
        variation = generate_character(
            prompt=prompt,
            reference_image_path="character_1.png",
            strength=strength
        )
        if variation:
            variation.save(f"character_{i}.png")

# Example: Generate variations of a character
generate_character_sequence(
    base_prompt="a warrior elf with golden armor",
    variation_prompts=[
        "same warrior elf but in battle pose",
        "same warrior elf but with raised sword"
    ],
    strength=0.8
)

πŸ“š Complete Python Example

Here's a complete example of the character creation workflow using Python:

import time
from typing import Optional

import requests

class CharacterGenerator:
    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url

    def create_character(self, prompt: str, existing_image_path: Optional[str] = None) -> dict:
        """Create a new character"""
        url = f"{self.base_url}/characters/create_initial"
        
        if existing_image_path:
            # Use a context manager so the uploaded file handle is closed afterwards
            with open(existing_image_path, "rb") as image_file:
                files = {"existing_image": image_file}
                data = {"prompt": prompt}
                response = requests.post(url, files=files, data=data)
        else:
            response = requests.post(url, json={"prompt": prompt})
            
        response.raise_for_status()
        return response.json()

    def regenerate_base(self, character_id: str, prompt: str) -> dict:
        """Regenerate the base image"""
        url = f"{self.base_url}/characters/create_initial"
        data = {
            "prompt": prompt,
            "regenerate": True,
            "character_id": character_id
        }
        response = requests.post(url, json=data)
        response.raise_for_status()
        return response.json()

    def approve_base(self, character_id: str) -> dict:
        """Approve the current base image"""
        url = f"{self.base_url}/characters/{character_id}/approve"
        response = requests.post(url)
        response.raise_for_status()
        return response.json()

    def generate_training_data(
        self,
        character_id: str,
        num_variations: int = 10,
        custom_prompts: Optional[list] = None
    ) -> dict:
        """Generate training data variations"""
        url = f"{self.base_url}/characters/{character_id}/generate_training"
        data = {
            "num_variations": num_variations,
            "custom_prompts": custom_prompts
        }
        response = requests.post(url, json=data)
        response.raise_for_status()
        return response.json()

    def start_training(self, character_id: str) -> dict:
        """Start LoRA training"""
        url = f"{self.base_url}/characters/{character_id}/train"
        response = requests.post(url)
        response.raise_for_status()
        return response.json()

    def get_training_status(self, character_id: str) -> dict:
        """Get current training status"""
        url = f"{self.base_url}/characters/{character_id}/training_status"
        response = requests.get(url)
        response.raise_for_status()
        return response.json()

    def wait_for_training(self, character_id: str, check_interval: int = 30) -> dict:
        """Wait for training to complete"""
        while True:
            status = self.get_training_status(character_id)
            state = status["training_status"]["state"]
            
            if state == "ready":
                return status
            elif state == "error":
                raise RuntimeError(f"Training failed: {status['training_status'].get('error')}")
                
            print(f"Training progress: {status['training_status'].get('progress', 0):.1f}%")
            time.sleep(check_interval)

    def generate_scene(
        self,
        character_id: str,
        prompt: str,
        output_path: str,
        num_inference_steps: int = 30,
        guidance_scale: float = 7.5
    ) -> str:
        """Generate a new scene with the character"""
        url = f"{self.base_url}/characters/{character_id}/generate_scene"
        data = {
            "prompt": prompt,
            "num_inference_steps": num_inference_steps,
            "guidance_scale": guidance_scale
        }
        response = requests.post(url, json=data)
        response.raise_for_status()
        
        # Save the image
        with open(output_path, "wb") as f:
            f.write(response.content)
            
        return output_path

# Example usage
def create_character_workflow():
    generator = CharacterGenerator()
    
    # 1. Create initial character
    character = generator.create_character(
        prompt="a noble elven warrior with golden armor"
    )
    character_id = character["character"]["id"]
    
    # 2. Regenerate until satisfied
    while input("Satisfied with the base image? (y/n): ").lower() != 'y':
        character = generator.regenerate_base(
            character_id,
            prompt="a noble elven warrior with golden armor"
        )
    
    # 3. Approve base image
    generator.approve_base(character_id)
    
    # 4. Generate training data
    training_result = generator.generate_training_data(
        character_id,
        num_variations=10,
        custom_prompts=[
            "character in battle pose",
            "character casting a spell",
            "character riding a horse",
            "character in formal attire",
            "character in stealth mode"
        ]
    )
    
    # 5. Start training
    generator.start_training(character_id)
    
    # 6. Wait for training to complete
    generator.wait_for_training(character_id)
    
    # 7. Generate scenes
    scenes = [
        "the character exploring an ancient temple",
        "the character in an epic battle",
        "the character in a peaceful elven village"
    ]
    
    for i, scene in enumerate(scenes):
        generator.generate_scene(
            character_id,
            prompt=scene,
            output_path=f"scene_{i+1}.png"
        )

if __name__ == "__main__":
    create_character_workflow()

Advanced Training Features

Training Checkpoints

The training process automatically saves checkpoints that can be used to resume training if interrupted:

# Resume from latest checkpoint
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/train/resume"

# Resume from specific checkpoint
curl -X POST "http://localhost:8000/characters/YOUR_CHARACTER_ID/train/resume" \
     -H "Content-Type: application/json" \
     -d '{
         "checkpoint_path": "/path/to/checkpoint.pt"
     }'

Training Metrics

The training status endpoint provides detailed metrics:

{
    "status": "success",
    "training_status": {
        "state": "training",
        "progress": 45.5,
        "training_loss": 0.234,
        "epoch_loss": 0.245,
        "current_epoch": 5,
        "training_start": "2024-03-14T10:30:00",
        "last_update": "2024-03-14T10:35:00"
    }
}

Memory Management

The training process includes automatic memory management:

  • Mixed precision training (FP16)
  • Gradient clipping
  • Automatic OOM recovery
  • GPU memory cleanup
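For readers curious what these techniques look like in practice, here is a generic PyTorch training-step sketch (illustrative only, not the service's actual training loop; it assumes the model returns a scalar loss and that a torch.cuda.amp.GradScaler is passed in):

import torch

def training_step(model, batch, optimizer, scaler, max_grad_norm: float = 1.0):
    """One training step with FP16 autocast, gradient clipping, and OOM recovery."""
    optimizer.zero_grad(set_to_none=True)
    try:
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = model(**batch)  # assumption: the model returns a scalar loss
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)  # unscale gradients before clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        scaler.step(optimizer)
        scaler.update()
        return loss.item()
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # release cached GPU memory and skip this batch
        return None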

πŸ“‹ API Parameters

Parameter            Type     Required   Default   Description
prompt               string   Yes        -         Text description of the desired image
reference_image      file     No         None      Reference image for style guidance
num_inference_steps  int      No         30        Number of denoising steps
guidance_scale       float    No         7.5       How closely to follow the prompt
strength             float    No         0.8       How much to preserve from the reference image

πŸ›  Development

Dependencies

Key Python packages required:

  • torch >= 2.0.0
  • diffusers >= 0.21.0
  • transformers >= 4.31.0
  • accelerate >= 0.21.0
  • peft >= 0.5.0
  • fastapi >= 0.100.0
  • uvicorn >= 0.23.0

See requirements.txt for the complete list of dependencies.

Project Structure

character-generator/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ api.py               # FastAPI endpoints
β”‚   β”œβ”€β”€ model_handler.py     # Stable Diffusion handler
β”‚   β”œβ”€β”€ config.py            # Configuration classes
β”‚   β”œβ”€β”€ character_manager.py # Character and LoRA management
β”‚   └── main.py              # Application entry point
β”œβ”€β”€ storage/
β”‚   β”œβ”€β”€ base_characters/     # Character base images
β”‚   β”œβ”€β”€ lora_models/         # Trained LoRA weights
β”‚   └── outputs/             # Generated images
β”œβ”€β”€ models/                  # AI model storage
β”œβ”€β”€ scripts/                 # Utility scripts
β”œβ”€β”€ docker-compose.yml       # Docker configuration
β”œβ”€β”€ Dockerfile               # Container definition
└── requirements.txt         # Python dependencies

Running in Development Mode

  1. Create a Python virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
pip install -r requirements.txt
  3. Run the service:
python -m src.main

⚠️ Notes

  • The service requires a CUDA-capable NVIDIA GPU
  • First request might be slower due to model loading
  • Make sure to have enough disk space for the AI models (~20GB)
  • The API will return a PNG image file directly in the response
  • All errors will return appropriate HTTP status codes with error messages
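Since errors come back as HTTP status codes with a message body, client code can surface them like this (a sketch; FastAPI error responses typically carry a detail field):

import requests

try:
    response = requests.post(
        "http://localhost:8000/generate",
        params={"prompt": "a beautiful fantasy character"},
    )
    response.raise_for_status()
except requests.HTTPError as err:
    try:
        detail = err.response.json().get("detail", err.response.text)
    except ValueError:  # body was not JSON
        detail = err.response.text
    print(f"Request failed ({err.response.status_code}): {detail}")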

πŸ“ Dependencies

Key dependencies (see requirements.txt for full list):

  • torch >= 2.0.0
  • diffusers >= 0.21.0
  • transformers >= 4.31.0
  • fastapi >= 0.100.0
  • Pillow >= 10.0.0
  • python-multipart >= 0.0.6
