A machine learning solution for the Kaggle "Text-to-SVG Generation" competition that converts text descriptions into high-quality SVG images.
Text-to-SVG Genie is an AI system that generates Scalable Vector Graphics (SVG) code from text descriptions. Given a text prompt describing an image, the model generates SVG code that renders the described scene as accurately as possible.
This project was developed for a Kaggle competition whose goal is to build specialized solutions that outperform general-purpose LLMs at generating image-rendering code, while making the generation process more transparent.
- Transform text descriptions into SVG code
- Ensure compliance with competition constraints
- Generate aesthetically pleasing vector images
- Optimize for both visual fidelity and description faithfulness
Clone the repository and install the required dependencies:
git clone https://github.com/Harsh-BH/text-to-svg-genie.git
cd text-to-svg-genie
pip install -r requirements.txt
The project relies on the following key libraries:
- kagglehub - For Kaggle package integration
- numpy/pandas - For data handling
- scipy/scikit-learn - For computational tasks
- PyTorch - For deep learning models
- Matplotlib - For visualization
Our model adheres to the following competition requirements:
- Generated SVGs are less than 10,000 bytes
- Only allowlisted SVG elements and attributes are used
- No rasterized image data or external sources
- No CSS style elements
- SVG generation completes within 5 minutes per prompt
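The requirements above can be checked before an SVG is submitted. The following is a minimal sketch, not the project's actual validator: the function name `check_svg` and the short `ALLOWED_ELEMENTS` set are assumptions made for illustration, and the real competition allowlist (including its attribute rules) is not reproduced here.

```python
import xml.etree.ElementTree as ET

MAX_SVG_BYTES = 10_000  # size limit stated in the requirements above

# Hypothetical allowlist for illustration only; the competition's real list differs.
ALLOWED_ELEMENTS = {
    "svg", "g", "defs", "path", "rect", "circle", "ellipse", "line",
    "polyline", "polygon", "linearGradient", "radialGradient", "stop",
}


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.w3.org/2000/svg}rect'."""
    return tag.rsplit("}", 1)[-1]


def check_svg(svg_code: str) -> list:
    """Return a list of constraint violations; an empty list means the SVG passes."""
    problems = []
    if len(svg_code.encode("utf-8")) >= MAX_SVG_BYTES:
        problems.append("SVG is not under 10,000 bytes")
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    for element in root.iter():
        name = local_name(element.tag)
        # <style> and <image> are absent from the allowlist, so they are caught here.
        if name not in ALLOWED_ELEMENTS:
            problems.append(f"element <{name}> is not allowlisted")
        # Any href that is not a same-document fragment counts as an external source.
        for attr, value in element.attrib.items():
            if local_name(attr) == "href" and not value.startswith("#"):
                problems.append(f"<{name}> references an external source")
    return problems
```

Calling `check_svg(svg_code)` on a generated string returns an empty list when none of these checks fail. The per-prompt time limit is not covered by this sketch, since it concerns the generation step rather than the output.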
The model is optimized for the SVG Image Fidelity Score, which combines:
- VQA task results using the PaliGemma model
- OCR text detection (with penalties for excess text)
- CLIP-based Aesthetic Score
- Final score: harmonic mean of VQA and Aesthetic scores
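To illustrate why a harmonic mean rewards balanced outputs, here is a small sketch. It assumes an unweighted harmonic mean of two scores in the range 0 to 1; the competition's actual formula may weight the two terms differently, and the OCR penalty is not modeled. The function name `harmonic_mean` is chosen for this example.

```python
def harmonic_mean(vqa_score: float, aesthetic_score: float) -> float:
    """Unweighted harmonic mean of two scores; 0.0 if either score is not positive."""
    if vqa_score <= 0 or aesthetic_score <= 0:
        return 0.0
    return 2 * vqa_score * aesthetic_score / (vqa_score + aesthetic_score)


# Same arithmetic mean (0.6) in both cases, but the unbalanced pair scores lower.
print(harmonic_mean(0.9, 0.3))
print(harmonic_mean(0.6, 0.6))
```

The first call gives roughly 0.45 and the second 0.6, so a strong VQA result cannot fully compensate for a weak aesthetic score, or the reverse.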
# Import the model
from text_to_svg_genie.model import Model
# Initialize the model
model = Model()
# Generate SVG from a text description
text_prompt = "A red apple sitting on a wooden table"
svg_code = model.predict(text_prompt)
# Save the SVG to a file
with open("apple.svg", "w") as f:
    f.write(svg_code)
Run the demo script to test the model with sample prompts:
python demo.py
Our approach combines:
- Text Understanding: Extracting key visual elements from descriptions
- Scene Composition: Determining optimal layout and element relationships
- SVG Generation: Creating vector elements that best represent the described scene
- Constraint Optimization: Ensuring outputs meet competition requirements
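As a rough illustration of the SVG Generation step, the sketch below serializes an already-composed scene into SVG markup. It is not the code in `svg_generator.py`: the helper names `shape_to_svg` and `compose_svg`, the 256x256 canvas, and the hand-written layout for the apple prompt are all assumptions made for this example.

```python
from xml.sax.saxutils import quoteattr


def shape_to_svg(tag: str, attrs: dict) -> str:
    """Serialize one shape as a self-closing SVG element with escaped attributes."""
    rendered = " ".join(f"{key}={quoteattr(str(value))}" for key, value in attrs.items())
    return f"<{tag} {rendered}/>"


def compose_svg(shapes: list, width: int = 256, height: int = 256) -> str:
    """Wrap the shapes, listed back to front, in a root <svg> element."""
    body = "".join(shape_to_svg(tag, attrs) for tag, attrs in shapes)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}">{body}</svg>'
    )


# Hypothetical layout for "A red apple sitting on a wooden table".
scene = [
    ("rect", {"x": 0, "y": 170, "width": 256, "height": 86, "fill": "#8b5a2b"}),
    ("circle", {"cx": 128, "cy": 130, "r": 45, "fill": "#c0392b"}),
]
svg_code = compose_svg(scene)
```

Keeping the scene as plain data until the last step makes it easier to adjust layout or drop shapes when an output is too large, before any markup is produced.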
text-to-svg-genie/
├── README.md               # Project documentation
├── requirements.txt        # Dependencies
├── .gitignore              # Git ignore file
├── text_to_svg_genie/      # Main package directory
│   ├── __init__.py         # Package initialization
│   ├── model.py            # Model implementation
│   ├── svg_generator.py    # SVG generation utilities
│   └── utils.py            # Helper functions
├── examples/               # Example SVGs and outputs
├── demo.py                 # Demonstration script
└── tests/                  # Test suite
Our model aims to:
- Generate accurate representations of described scenes
- Create aesthetically pleasing vector graphics
- Optimize for the competition evaluation metrics
- Complete generation within required time constraints