# Alt Text Generator

Lightweight alt text generation service using Ollama - no Python ML dependencies required.
A Flask-based REST API that generates accessible alt text descriptions for images using vision-language models. The service uses Ollama for model inference, eliminating the need for heavy Python ML dependencies like PyTorch or Transformers.
## Features

- 🚀 Lightweight: Minimal dependencies (just Flask and Requests)
- 🔒 Secure: Built-in SSRF protection and input validation
- 🖼️ Flexible: Supports JPEG, PNG, GIF, and WebP images
- ⚡ Fast: Efficient streaming image downloads with size limits
- 🎯 Customizable: Optional custom prompts for specialized descriptions
- 🏥 Health Checks: Built-in health monitoring endpoints (`/health` and `/up`)
## Quick Start

```bash
# Clone the repository
git clone https://github.com/fairdataihub/alt-text-generator.git
cd alt-text-generator
# Install Ollama and pull the vision model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-vl:4b
# Install Python dependencies and start the server
pip install -r requirements.txt
python server.py
```

## Comparison with the Transformers Version

| Aspect | Transformers Version | Ollama Version |
|---|---|---|
| Python deps | PyTorch, Transformers, etc. (~5-10 GB) | Flask, Requests (~1 MB) |
| Model management | Manual HF cache | ollama pull/list/rm |
| Setup complexity | Virtual env, CUDA, etc. | Single binary + one command |
| Portability | Python 3.12+, CUDA | Any system Ollama supports |
## Model

| Property | Value |
|---|---|
| Model | qwen3-vl:4b |
| Parameters | 4B |
| Size | 3.3 GB |
| Runtime | Ollama |
## Installation

```bash
# Install Ollama and pull the vision model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-vl:4b

# Start the Ollama server
ollama serve

# Install Python dependencies
pip install -r requirements.txt
# That's it! No PyTorch, no Transformers, no CUDA toolkit
```

Or install directly:

```bash
pip install flask requests
python server.py
```

The server runs on `http://localhost:5000` by default.
## API Endpoints

### GET /

Returns API information including version, model name, and available endpoints.
Response:
```json
{
"name": "Alt Text Generator API",
"version": "1.0.0",
"model": "qwen3-vl:4b",
"endpoints": {
"up": "/up",
"health": "/health",
"generate": "/generate"
}
}
```

Example:
```bash
curl http://localhost:5000/
```

### GET /up

Uptime check endpoint that verifies the model is available and loaded. Returns 200 if the service is ready, 503 if not. Useful for load balancer health checks.
Response:
Success (200):
```json
{
"status": "up",
"model": "qwen3-vl:4b",
"model_available": true,
"model_loaded": true
}
```

Service Unavailable (503):

```json
{
"status": "down",
"reason": "Model not available" | "Model not loaded" | "Ollama unreachable",
"model": "qwen3-vl:4b",
"model_available": false,
"model_loaded": false
}
```

Example:
```bash
curl http://localhost:5000/up
```

### GET /health

Health check endpoint that verifies Ollama is running and the required model is available. Provides detailed status information including all available models.
Response:
```json
{
"status": "healthy",
"ollama_reachable": true,
"model": "qwen3-vl:4b",
"model_available": true,
"available_models": ["qwen3-vl:4b", ...]
}
```

Example:
```bash
curl http://localhost:5000/health
```

### GET /generate

Generates alt text for an image. Returns a plain-text response.
Query Parameters:
- `imageUrl` (required): URL of the image to caption. Both the `imageUrl` and `image_url` parameter names are supported.
- `prompt` (optional): Custom prompt for the model. Default: "Describe this image in one concise sentence for alt text."
Response:
- Success (200): Plain text alt text description
- Error (400): Validation error message (invalid URL, unsupported format, etc.)
- Error (500): Internal server error message
- Error (503): Ollama service unavailable
Examples:
Basic usage:
curl "http://localhost:5000/generate?imageUrl=https://fairdataihub.org/images/blog/ismb-2025/dorian-team.jpeg"With custom prompt:
curl "http://localhost:5000/generate?imageUrl=https://example.com/image.jpg&prompt=Describe%20this%20image%20for%20a%20visually%20impaired%20user."Using snake_case parameter:
curl "http://localhost:5000/generate?image_url=https://example.com/image.jpg"| Variable | Default | Description |
## Configuration

| Variable | Default | Description |
|---|---|---|
| `PORT` | `5000` | Server port |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama API URL |
| `OLLAMA_MODEL` | `qwen3-vl:4b` | Vision model to use for generation |
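These are typically read once at startup; a minimal sketch of the pattern, using the defaults from the table (not necessarily the exact code in `server.py`):

```python
import os

# Defaults mirror the table above
PORT = int(os.environ.get("PORT", "5000"))
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "qwen3-vl:4b")
```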
## Limits

The service enforces the following limits for security and performance:
- Maximum image size: 10 MiB
- Supported formats: JPEG, PNG, GIF, WebP
- Maximum redirects: 5 (prevents redirect loops)
- Request timeout: 30 seconds for image download, 120 seconds for generation
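A sketch of how the size, format, and redirect limits can be enforced during a streamed download, assuming the `requests` library (illustrative code, not the service's exact implementation):

```python
import requests

MAX_BYTES = 10 * 1024 * 1024  # 10 MiB cap from the list above
ALLOWED_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

session = requests.Session()
session.max_redirects = 5  # mirror the documented redirect limit

def fetch_image(url: str) -> bytes:
    """Stream-download an image, aborting as soon as a limit is exceeded."""
    with session.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        content_type = resp.headers.get("Content-Type", "").split(";")[0].strip()
        if content_type not in ALLOWED_TYPES:
            raise ValueError(f"Unsupported content type: {content_type}")
        chunks, total = [], 0
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            total += len(chunk)
            if total > MAX_BYTES:
                raise ValueError("Image exceeds the 10 MiB limit")
            chunks.append(chunk)
        return b"".join(chunks)
```

Streaming matters here: the cap is checked chunk by chunk, so an oversized file is rejected without ever being fully buffered in memory.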
## Security

The service includes several security measures:
- SSRF Protection: Blocks access to private/internal IP ranges (localhost, 127.0.0.1, 10.x.x.x, 192.168.x.x, etc.)
- URL Validation: Only allows HTTP/HTTPS URLs
- Content-Type Validation: Only accepts known image MIME types
- Size Limits: Prevents memory exhaustion with configurable size limits
- DNS Resolution Checks: Validates resolved IP addresses to prevent DNS rebinding attacks
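A minimal sketch of the kind of URL check described above, using only the standard library (illustrative; the function name and exact rules are not taken from `server.py`):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def assert_safe_url(url: str) -> None:
    """Reject non-HTTP(S) URLs and hosts that resolve to private/internal IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Only HTTP/HTTPS URLs are allowed")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    # Check every address the hostname resolves to, so a public name
    # pointing at an internal IP (DNS rebinding) is also caught.
    for family, _, _, _, sockaddr in socket.getaddrinfo(parsed.hostname, None):
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            raise ValueError(f"Access to private IP {ip} is not allowed")
```

Note that a robust defense also pins the subsequent request to the validated IP address, since DNS answers can change between the check and the download.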
## Benchmarks

| Metric | Transformers (R-4B) | Ollama (qwen3-vl:4b) |
|---|---|---|
| MMStar Score | 72.6 | TBD (likely lower) |
| Parameters | 4.82B | 4B |
| Python deps | ~5-10 GB | ~1 MB |
| Setup time | 10-15 min | 2 min |
| Model download | ~10 GB (HF) | 3.3 GB (Ollama) |
## How It Works

1. Image Fetching: The service validates the provided URL, checks for SSRF vulnerabilities, and downloads the image with streaming to enforce size limits.
2. Image Processing: The downloaded image is converted to base64 for transmission to Ollama.
3. Caption Generation: The base64 image and prompt are sent to Ollama's API, which uses the vision-language model to generate descriptive text.
4. Response: The generated alt text is returned to the client as plain text.
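Steps 2 and 3 correspond to a single call to Ollama's `/api/generate` endpoint. A condensed sketch of that exchange, using Ollama's documented REST API (the constants and `caption` helper are illustrative, not the service's actual code):

```python
import base64
import requests

OLLAMA_HOST = "http://localhost:11434"  # matches the OLLAMA_HOST default
MODEL = "qwen3-vl:4b"

def caption(image_bytes: bytes, prompt: str) -> str:
    """Send a base64-encoded image to Ollama and return the generated text."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a token stream
    }
    resp = requests.post(f"{OLLAMA_HOST}/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"].strip()
```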
## Troubleshooting

Make sure Ollama is running:
```bash
ollama serve
```

Check if Ollama is accessible:
```bash
curl http://localhost:11434/api/tags
```

Pull the model:
```bash
ollama pull qwen3-vl:4b
ollama list  # confirm the model appears in the list
```

If you see an error about private IP addresses: the service blocks requests to private/internal addresses for security reasons. Use publicly accessible image URLs only.

If an image download fails:
- Ensure the image is under 10 MiB
- Verify the image format is JPEG, PNG, GIF, or WebP
- Check that the server returns the correct Content-Type header
If generation fails:

- Verify Ollama is running and accessible
- Check that the model specified in `OLLAMA_MODEL` is available
- Review the `available_models` list in the health check response
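The Ollama checks above can also be scripted. A small diagnostic that mirrors what `/health` reports (assumes the default Ollama address; not part of the project):

```python
import requests

OLLAMA_HOST = "http://localhost:11434"
MODEL = "qwen3-vl:4b"

# Reachability check: /api/tags lists the locally available models
try:
    resp = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=5)
    resp.raise_for_status()
except requests.RequestException as exc:
    raise SystemExit(f"Ollama unreachable at {OLLAMA_HOST}: {exc}")

models = [m["name"] for m in resp.json().get("models", [])]
print("Available models:", models)
if MODEL not in models:
    print(f"Missing {MODEL} - run: ollama pull {MODEL}")
```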
## Docker Deployment

The project includes a `docker-compose.yml` file for easy deployment with GPU support.
Prerequisites:
- Docker and Docker Compose installed
- Ollama running in an external network named `ollama-network`
- GPU access (if using GPU-accelerated inference)
Deploy:
```bash
# Set environment variables (optional)
export APP_PORT=23711
export RESTART_POLICY=unless-stopped
# Start the service
docker compose up -d --build
```

Health Check:
The Docker Compose configuration includes a health check that uses the `/up` endpoint:
```bash
# Check container health status
docker compose ps
# Manually test the /up endpoint
curl http://localhost:23711/up
```

Configuration:
- The service connects to Ollama via the `ollama-network` Docker network
- GPU device 0 is used by default (configured in `docker-compose.yml`)
- Port mapping: host port (default: 23711) → container port 5000
- Health check runs every 20 seconds with a 120-second startup grace period
Build and run the container manually:
```bash
# Build the image
docker build -t alt-text-generator .
# Run the container
docker run -p 5000:5000 \
-e OLLAMA_HOST=http://host.docker.internal:11434 \
  alt-text-generator
```

## Development

The project includes linting configuration:
- `.flake8` - Flake8 configuration for PEP 8 style checking
- `.pylint.ini` - Pylint configuration for code analysis
## Project Structure

```
alt-text-generator/
├── server.py # Main Flask application
├── requirements.txt # Python dependencies
├── Dockerfile # Docker image configuration
├── docker-compose.yml # Docker Compose configuration
├── README.md # This file
└── LICENSE # MIT License
```
## License

This project is licensed under the MIT License - see the LICENSE file for more information.