Icons AI System

A comprehensive system for processing, categorizing, translating, and creating embeddings for icon datasets with multi-language support.

System Overview

This system takes a collection of icons with English tags and processes them through a pipeline that:

Strips and cleans the icon data
Enhances English tags with synonyms
Categorizes icons using AI to identify daily-life relevant icons
Translates tags to multiple languages using AI (only for relevant icons)
Creates language-specific icon files (only relevant icons)
Generates embeddings for search and similarity (only relevant icons)

Architecture

The system is built around a project-based configuration approach where all settings are defined in YAML files. Each script can be run independently or as part of a complete workflow.

Core Components

Configuration System: Centralized settings in projects/v1.yaml
Library Modules: Reusable components in library/
Processing Scripts: Specialized tools for each workflow step
Data Pipeline: Sequential processing from raw icons to final outputs

Workflow

1. Prepare Models

python prepare_models.py

Downloads and prepares FastText language models for the configured languages. See projects/v1.yaml for model settings.

2. Strip English Icons

python strip_english_icons.py

Creates a clean, stripped version of the English icons containing only essential fields (name and tags). This becomes the base file for all subsequent processing.
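To make the stripping step concrete, here is a minimal sketch. It assumes each record in the original dataset is an object with name and tags keys plus extra fields to discard. The function name strip_icons, the tag normalisation (trim, lowercase, de-duplicate), and the sample record are illustrative assumptions, not taken from strip_english_icons.py.

```python
from typing import Any


def strip_icons(icons: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return new icon records reduced to the name and a cleaned tag list."""
    stripped = []
    for icon in icons:
        seen: set[str] = set()
        tags: list[str] = []
        for tag in icon.get("tags", []):
            # Assumed normalisation: trim whitespace, lowercase,
            # drop empty strings and duplicates, keep original order.
            clean = tag.strip().lower()
            if clean and clean not in seen:
                seen.add(clean)
                tags.append(clean)
        stripped.append({"name": icon["name"], "tags": tags})
    return stripped


# Hypothetical input record; the "unicode" field stands in for any extra data.
raw = [{"name": "alarm-clock", "tags": [" Time", "wake", "time"], "unicode": "f34e"}]
stripped = strip_icons(raw)
print(stripped)
```

Returning new records instead of mutating the input keeps the original dataset untouched, which matters if later steps re-read it.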

3. Generate Synonyms (Optional)

python generate_synonims.py

Enhances the English stripped icons by adding synonyms to each tag using NLTK WordNet. This increases the vocabulary for better translation coverage.
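The synonym expansion can be sketched as follows. The lookup is passed in as a function so the example runs without NLTK data; a small hand-written table stands in for WordNet here. The name expand_tags and the table contents are hypothetical, and the real generate_synonims.py may merge synonyms differently.

```python
from typing import Callable, Iterable


def expand_tags(tags: Iterable[str], lookup: Callable[[str], Iterable[str]]) -> list[str]:
    """Append the synonyms returned by `lookup` to the tags, without duplicates."""
    result: list[str] = []
    seen: set[str] = set()
    for tag in tags:
        # The original tag comes first, followed by its synonyms.
        for word in (tag, *lookup(tag)):
            # WordNet lemma names use underscores for multi-word entries.
            normalized = word.replace("_", " ").lower()
            if normalized not in seen:
                seen.add(normalized)
                result.append(normalized)
    return result


# Stand-in lookup table (an assumption for this sketch).
# With NLTK installed, a lookup could instead collect lemma_names()
# from each synset returned by nltk.corpus.wordnet.synsets(tag).
FAKE_SYNONYMS = {"car": ["auto", "motor_car"], "house": ["home"]}

expanded = expand_tags(["car", "house", "auto"], lambda t: FAKE_SYNONYMS.get(t, []))
print(expanded)
```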

4. Categorize Icons

python categorize_icons.py

Uses AI to assign icons to daily-life categories. Only icons that match daily-life activities are marked as "enabled". This step creates:

data/icons.enabled.en.json - Only relevant icons
data/categories.json - Category assignments for each icon

Categories: See the 15 predefined categories in categorize_icons.py (Family, Work, Health, etc.)
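The filtering logic of this step might look like the sketch below. The AI call is replaced by a keyword-matching stand-in, and the (category, score) return shape, the threshold comparison, and the function names are assumptions made for illustration; categorize_icons.py may structure its results differently.

```python
from typing import Callable, Optional

Icon = dict
Classifier = Callable[[Icon], tuple[Optional[str], float]]


def categorize(icons: list[Icon], classify: Classifier, threshold: float):
    """Split icons into the enabled subset and a name -> category mapping."""
    enabled: list[Icon] = []
    categories: dict[str, str] = {}
    for icon in icons:
        category, score = classify(icon)
        # An icon is enabled only when it receives a category
        # with a score at or above the threshold.
        if category is not None and score >= threshold:
            enabled.append(icon)
            categories[icon["name"]] = category
    return enabled, categories


def fake_classifier(icon: Icon) -> tuple[Optional[str], float]:
    """Stand-in for the AI request: a keyword match instead of a model call."""
    if "food" in icon["tags"]:
        return "Food and Cooking", 0.9
    return None, 0.0


icons = [
    {"name": "pizza", "tags": ["food", "slice"]},
    {"name": "git-branch", "tags": ["vcs"]},
]
enabled, categories = categorize(icons, fake_classifier, threshold=0.5)
print([icon["name"] for icon in enabled], categories)
```

In this sketch the two return values correspond to the two output files: the enabled list to data/icons.enabled.en.json and the mapping to data/categories.json.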

5. Translate Keywords

python translate_keywords.py --languages ru,fr

Translates the unique tags of enabled icons into the target languages using OpenAI. Creates one translation dictionary per language (data/keywords.{language}.json).
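As a rough sketch of how a translation dictionary could be built, the example below collects the unique tags, splits them into batches, and calls a translation function once per batch. The translate_batch callable is a placeholder for the OpenAI request, and the batch size and function names are assumptions; the actual prompt and batching in translate_keywords.py are not shown in this README.

```python
from typing import Callable, Iterator


def unique_tags(icons: list[dict]) -> list[str]:
    """Collect every distinct tag across the icons, sorted for stable output."""
    return sorted({tag for icon in icons for tag in icon["tags"]})


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_dictionary(
    icons: list[dict],
    translate_batch: Callable[[list[str]], list[str]],
    batch_size: int = 50,
) -> dict[str, str]:
    """Map each unique English tag to its translation, one call per batch."""
    dictionary: dict[str, str] = {}
    for batch in batches(unique_tags(icons), batch_size):
        translated = translate_batch(batch)
        dictionary.update(zip(batch, translated))
    return dictionary


calls = []


def fake_translate(batch: list[str]) -> list[str]:
    """Stand-in for the API call: records the batch and uppercases each tag."""
    calls.append(batch)
    return [tag.upper() for tag in batch]


icons = [{"name": "a", "tags": ["cat", "dog"]}, {"name": "b", "tags": ["dog", "sun"]}]
keywords = build_dictionary(icons, fake_translate, batch_size=2)
print(keywords, len(calls))
```

Translating unique tags rather than every tag occurrence means a tag shared by many icons is sent to the API only once.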

6. Translate Icons

python translate_icons.py --languages ru,fr

Uses the translation dictionaries to create language-specific icon files. Processes enabled icons only, producing smaller, focused files with translated tags.
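Applying a dictionary to the icons is a plain lookup, sketched below. How translate_icons.py treats tags that are missing from the dictionary is not stated here; this sketch assumes they are skipped and that duplicate translations are collapsed.

```python
def translate_icons(icons: list[dict], dictionary: dict[str, str]) -> list[dict]:
    """Return copies of the icons with each tag replaced by its translation."""
    translated = []
    for icon in icons:
        tags: list[str] = []
        for tag in icon["tags"]:
            target = dictionary.get(tag)
            # Assumption: untranslated tags are dropped, and two English
            # synonyms that map to the same word are kept only once.
            if target and target not in tags:
                tags.append(target)
        translated.append({"name": icon["name"], "tags": tags})
    return translated


# Small hand-written French dictionary used only for this example.
keywords_fr = {"house": "maison", "home": "maison", "dog": "chien"}
icons_en = [{"name": "house", "tags": ["house", "home", "roof"]}]
icons_fr = translate_icons(icons_en, keywords_fr)
print(icons_fr)
```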

7. Create Embeddings

python create_icons_embeddings.py --languages en,ru,fr

Generates embeddings for the enabled icons in each language, producing searchable vector representations of the most relevant icons.
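One common way to turn tags into a searchable icon vector is to average the word vectors of the tags and rank icons by cosine similarity. The sketch below uses that scheme with toy two-dimensional vectors; whether create_icons_embeddings.py uses averaging is an assumption, and the vectors and names are made up for the example.

```python
import math

# Toy 2-dimensional word vectors standing in for a FastText model.
# With the fasttext package, model.get_word_vector(word) would supply these.
WORD_VECTORS = {"dog": [1.0, 0.0], "puppy": [0.8, 0.2], "car": [0.0, 1.0]}


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Average equal-length vectors component by component."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed_icon(icon: dict) -> list[float]:
    """Assumed scheme: the icon embedding is the mean of its known tag vectors."""
    return mean_vector([WORD_VECTORS[t] for t in icon["tags"] if t in WORD_VECTORS])


icons = [{"name": "dog", "tags": ["dog", "puppy"]}, {"name": "car", "tags": ["car"]}]
index = {icon["name"]: embed_icon(icon) for icon in icons}

# Search: pick the icon whose embedding is closest to the query vector.
query = WORD_VECTORS["puppy"]
best = max(index, key=lambda name: cosine(query, index[name]))
print(best)
```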

Configuration

All system behavior is controlled through the project configuration file. See projects/v1.yaml for:

Models: FastText model settings and dimensions
Icons: File paths and naming patterns
Translation: OpenAI settings and parameters
Languages: Supported languages and their specific settings

Environment Variables

Set these in your .env file:

DEFAULT_PROJECT: Path to default project YAML file
OPENAI_API_KEY: Your OpenAI API key for translations
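A script could validate these variables at startup along the lines of the sketch below. It reads from a mapping so it can be exercised without a real environment. The fallback to projects/v1.yaml when DEFAULT_PROJECT is unset and the name load_settings are assumptions; how the project actually loads the .env file is not described here.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the two documented variables and fail early if the key is missing."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return {
        # Assumed fallback: use the v1 project when no default is configured.
        "project": env.get("DEFAULT_PROJECT", "projects/v1.yaml"),
        "openai_api_key": api_key,
    }


# Example with a fake environment, so no real key is needed.
settings = load_settings({"OPENAI_API_KEY": "sk-example"})
print(settings["project"])
```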

File Structure

data/
├── icons.json                    # Original icon dataset
├── icons.stripped.en.json        # Stripped English icons (all icons)
├── icons.enabled.en.json         # Only relevant icons (categorized)
├── icons.stripped.{lang}.json    # Translated icon files (enabled only)
├── keywords.{lang}.json          # Translation dictionaries
└── categories.json               # Icon category assignments

embeddings/
├── icon_embeddings.{lang}.json   # Icon vector embeddings (enabled only)
└── vocab_embeddings.{lang}.json  # Vocabulary embeddings

.models/
└── v1/                           # FastText language models

Usage Patterns

Complete Workflow

Run all steps in sequence for a full processing pipeline:

python prepare_models.py
python strip_english_icons.py
python generate_synonims.py
python categorize_icons.py                    # NEW: Categorize icons
python translate_keywords.py --languages ru,fr
python translate_icons.py --languages ru,fr
python create_icons_embeddings.py --languages en,ru,fr

Or use the automated script:

./run-all.sh
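The contents of run-all.sh are not reproduced in this README. As an illustration only, the same sequence could be driven from Python as sketched below; the runner argument is a hypothetical hook that lets the example do a dry run instead of launching the scripts.

```python
import subprocess
import sys

LANGS = "ru,fr"

# The seven steps listed above, in order.
STEPS = [
    ["prepare_models.py"],
    ["strip_english_icons.py"],
    ["generate_synonims.py"],
    ["categorize_icons.py"],
    ["translate_keywords.py", "--languages", LANGS],
    ["translate_icons.py", "--languages", LANGS],
    ["create_icons_embeddings.py", "--languages", f"en,{LANGS}"],
]


def run_pipeline(steps=STEPS, runner=subprocess.run) -> None:
    """Run each step with the current interpreter, stopping at the first failure."""
    for step in steps:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so later steps never run after a failed one.
        runner([sys.executable, *step], check=True)


# Dry run: record the commands instead of executing them.
recorded = []
run_pipeline(runner=lambda cmd, check: recorded.append(cmd[1:]))
print(len(recorded))
```

Calling run_pipeline() with no arguments would execute the real scripts from the repository root.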

Single Language Processing

Process only specific languages by using the --languages parameter:

python translate_keywords.py --languages ru
python translate_icons.py --languages ru

Custom Configuration

Use different project settings:

python translate_keywords.py --project projects/custom.yaml --languages ru
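The --languages and --project options shown above could be parsed as in the sketch below. It uses argparse from the standard library so that it runs without extra packages; the project itself lists Typer as its CLI dependency, so this is a stand-in and not the scripts' actual option handling. Lowercasing the codes and rejecting an empty list are assumptions.

```python
import argparse


def parse_languages(value: str) -> list[str]:
    """Turn a comma-separated string such as "ru,fr" into a list of codes."""
    codes = [code.strip().lower() for code in value.split(",") if code.strip()]
    if not codes:
        raise argparse.ArgumentTypeError("expected at least one language code")
    return codes


parser = argparse.ArgumentParser(description="Sketch of the shared CLI options")
parser.add_argument("--languages", type=parse_languages, required=True)
parser.add_argument("--project", default=None, help="path to a project YAML file")

# Parse an explicit argument list instead of sys.argv for the example.
args = parser.parse_args(["--project", "projects/custom.yaml", "--languages", "ru, fr"])
print(args.languages, args.project)
```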

Categorization System

The system uses AI-powered categorization to identify icons relevant to daily life activities. Icons are classified into 15 predefined categories:

Categories (see categorize_icons.py for full list):

Family and Relationships
Friends and Social
Work and Career
Health and Fitness
Finance and Money
Education and Learning
Shopping and Errands
Travel and Transportation
Home and Chores
Food and Cooking
Hobbies and Entertainment
Pets and Animals
Events and Calendar
Personal Care
Technology and Gadgets

Configuration: Categorization threshold and model can be adjusted in projects/v1.yaml under models.defaults.categorization.
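Reading a nested setting such as models.defaults.categorization can be sketched with a small dotted-path helper. The dictionary below mimics what a YAML loader would return; the threshold key and its value are hypothetical placeholders, since the contents of projects/v1.yaml are not shown here.

```python
from typing import Any


def get_setting(config: dict, dotted_path: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path, returning `default` if absent."""
    node: Any = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical configuration; only the models.defaults.categorization path
# comes from the README, the key below it is a placeholder.
config = {"models": {"defaults": {"categorization": {"threshold": 0.5}}}}

section = get_setting(config, "models.defaults.categorization", default={})
print(section)
```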

Output:

data/icons.enabled.en.json - Only relevant icons
data/categories.json - Category assignments for each icon

Key Features

Smart Categorization: AI-powered filtering to focus on relevant icons
Project-Based Configuration: All settings centralized in YAML files
Language Support: Process multiple languages with language-specific settings
AI-Powered Translation: Uses OpenAI for high-quality tag translation
Embedding Generation: Creates searchable vector representations
Modular Design: Each script can be run independently
Error Handling: Graceful handling of missing files and API failures
Progress Tracking: Visual progress bars for long-running operations

Dependencies

Install required packages:

pip install -r requirements.txt

Key dependencies include:

FastText for language models and embeddings
OpenAI for AI-powered translations
NLTK for English synonym generation
Typer for command-line interfaces

Workflow Benefits

The categorization-first approach provides several advantages:

🎯 Quality Focus: Only relevant icons get expensive translation and embedding processing
💰 Cost Efficiency: Reduces translation API calls by ~57% (typically about 1,400 enabled icons out of 3,300)
⚡ Performance: Faster embedding generation with fewer icons
🌍 Clean Datasets: Language files contain only relevant, translated icons
📊 Smart Filtering: AI determines relevance based on daily-life categories
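The cost figure above follows from the typical icon counts, as this short calculation shows. It measures the reduction in icons processed; the exact saving in API calls depends on how many unique tags the enabled icons carry, so treat it as an approximation.

```python
total_icons = 3300    # typical size of the full dataset, from the list above
enabled_icons = 1400  # typical number of icons kept after categorization

reduction = (total_icons - enabled_icons) / total_icons
print(f"{reduction:.1%}")  # prints 57.6%
```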

Output Files

The system generates several types of output files:

Stripped Icons: Clean, minimal icon data (all icons)
Enabled Icons: Only relevant icons after categorization
Translation Files: Keyword mappings for each language (enabled icons only)
Translated Icons: Language-specific icon collections (enabled icons only)
Embeddings: Vector representations for search and similarity (enabled icons only)
Categories: Icon-to-category assignments for analysis

Each file type follows consistent naming patterns with language placeholders, making it easy to work with multiple languages programmatically.
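Working with these naming patterns programmatically can be as simple as the sketch below. The patterns are copied from the File Structure section; the dictionary keys and the paths_for helper are illustrative names, not part of the project.

```python
from pathlib import Path

# File name patterns taken from the File Structure section.
PATTERNS = {
    "icons": "data/icons.stripped.{lang}.json",
    "keywords": "data/keywords.{lang}.json",
    "icon_embeddings": "embeddings/icon_embeddings.{lang}.json",
    "vocab_embeddings": "embeddings/vocab_embeddings.{lang}.json",
}


def paths_for(lang: str) -> dict[str, Path]:
    """Fill the language placeholder in every pattern."""
    return {kind: Path(pattern.format(lang=lang)) for kind, pattern in PATTERNS.items()}


for lang in ("ru", "fr"):
    print(lang, paths_for(lang)["keywords"].as_posix())
```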

Getting Started

Configure your project settings in projects/v1.yaml
Set up your .env file with required API keys
Run the workflow steps in sequence
Use the generated files for your application needs

The system is designed to be flexible and extensible, allowing you to customize the processing pipeline based on your specific requirements.
