Keras-TensorFlow Product Classifier automatically assigns products to categories based on their names, using Sentence Transformers embeddings and a Keras neural network.
- Automatically determines product category from name
- Works with different languages
- Uses modern transformers for understanding meaning
- Typically achieves 85-95% accuracy on properly formatted data
- Processes multiple product categories automatically
- Sentence Transformers - for creating name embeddings
- Neural Network - for category classification
- TensorFlow/Keras - modern ML framework
- Scikit-learn - for metrics and data preprocessing
- Python 3.8+
- 4GB+ RAM
- TensorFlow 2.20.0+
# Clone repository
git clone <repository-url>
cd ai-product-classifier
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
This project does not include training data due to commercial confidentiality. Users must provide their own product data.
Your data must be in CSV format with the following columns:
name,fullGroupName
"iPhone 15 Pro Max","Accessories/Phone Cases/Apple iPhone 15 Pro Max"
"Samsung Galaxy S24","Accessories/Phone Cases/Samsung Galaxy S24"
"Apple AirPods Pro","Headphones/Apple/AirPods Pro"
- name (required): Product name/title
  - Can be in different languages
  - Should be descriptive and clear
  - Example: "iPhone 15 Pro Max silicone case"
- fullGroupName (required): Full category path
  - Hierarchical structure with "/" separators
  - Should be consistent across similar products
  - Example: "Accessories/Phone Cases/Apple iPhone 15 Pro Max"
- Minimum 3 products per category (system requirement)
- Consistent naming for similar categories
- No empty values in required columns
- UTF-8 encoding for text support
- Training: 10,000+ products for good results
- Categories: 100+ different categories
- Balance: At least 5-10 products per category
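The format and quality requirements above can be checked before training. The following is a minimal sketch using only the standard library; the function names `validate_rows` and `validate_csv` are hypothetical and are not part of this project's scripts.

```python
import csv
from collections import Counter

REQUIRED_COLUMNS = ("name", "fullGroupName")
MIN_PRODUCTS_PER_CATEGORY = 3  # the system requirement listed above


def validate_rows(rows):
    """Return (valid_rows, problems) for an iterable of dict-like rows."""
    problems = []
    complete = []
    for row_no, row in enumerate(rows, start=1):  # data rows, header excluded
        empty = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if empty:
            problems.append(f"row {row_no}: empty or missing {', '.join(empty)}")
        else:
            complete.append(row)

    # Count products per full category path and flag categories that are too small.
    counts = Counter(row["fullGroupName"] for row in complete)
    for category, n in sorted(counts.items()):
        if n < MIN_PRODUCTS_PER_CATEGORY:
            problems.append(f"category {category!r} has only {n} product(s)")

    valid = [r for r in complete
             if counts[r["fullGroupName"]] >= MIN_PRODUCTS_PER_CATEGORY]
    return valid, problems


def validate_csv(path):
    """Read a UTF-8 CSV file and validate its rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return validate_rows(csv.DictReader(f))
```

Calling `validate_csv("your_products.csv")` returns the rows that satisfy the requirements together with a list of human-readable problems, which makes it easier to clean the data before running the training script.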
Place your CSV file in the project root directory:
ai-product-classifier/
├── your_products.csv ← Place your data here
├── main.py
├── testing_model.py
└── requirements.txt
# Ensure your CSV file is in the project directory
python main.py
python testing_model.py
With properly formatted data, the system typically achieves:
- Accuracy: 85-95% (depends on data quality)
- Training Time: 15-30 seconds (depends on data size)
- Categories: Automatically detects from your data
- Test Samples: 20% of your data (automatically split)
The model automatically:
- Loads and validates your product data
- Creates text embeddings using multilingual transformers
- Trains neural network with early stopping
- Evaluates performance and saves results
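The embedding step in the list above could look roughly like this. It is a sketch that assumes the `sentence-transformers` package is installed and uses the embedding model named in this README; the helper `embed_names` and its `batch_size` default are illustrative and may differ from what `main.py` actually does.

```python
MODEL_NAME = "paraphrase-multilingual-MiniLM-L12-v2"  # embedding model named in this README


def embed_names(names, batch_size=64):
    """Encode product names into dense vectors (384 dimensions for this model)."""
    # Imported lazily: loading the library, and downloading the model on first
    # use, is slow, so it only happens when embeddings are actually requested.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    # encode() returns a NumPy array of shape (len(names), 384) by default.
    return model.encode(list(names), batch_size=batch_size, show_progress_bar=False)
```

Because the model is multilingual, names in different languages are mapped into the same vector space, which is what allows a single classifier to handle mixed-language catalogs.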
keras-tensorflow-product-classifier/
├── main.py # Main training script
├── testing_model.py # Model testing script
├── requirements.txt # Dependencies
├── plots/ # Visualization graphs
├── .gitignore # Git ignore rules
└── README.md # This file
- Embedding Model: paraphrase-multilingual-MiniLM-L12-v2
- Neural Network: 384 → 192 → output neurons
- Regularization: L2 + Dropout
- Optimizer: AdamW with learning rate scheduling
- Train/Test Split: 80%/20%
- Stratification: Preserves category proportions
- Minimum Category Size: 3+ products per category
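One way to express the configuration above in Keras is sketched below. It reads "384" as the embedding input size, and the L2 weight, dropout rate, learning rate, and callback patience values are assumptions chosen for illustration, not the values used by this project. `ReduceLROnPlateau` stands in for the unspecified learning rate scheduling.

```python
EMBEDDING_DIM = 384  # output size of paraphrase-multilingual-MiniLM-L12-v2
HIDDEN_UNITS = 192


def build_classifier(num_classes, l2_weight=1e-4, dropout_rate=0.3, learning_rate=1e-3):
    """Build a 384 -> 192 -> num_classes classifier with L2 and Dropout."""
    import tensorflow as tf  # imported lazily; TensorFlow is slow to load

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(EMBEDDING_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_UNITS,
            activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(l2_weight),
        ),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.AdamW(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",  # expects integer category labels
        metrics=["accuracy"],
    )
    return model


def training_callbacks():
    """Early stopping plus a plateau-based learning rate reduction."""
    import tensorflow as tf

    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2
        ),
    ]
```

A stratified 80%/20% split compatible with this model can be obtained from scikit-learn with `train_test_split(X, y, test_size=0.2, stratify=y)`. Stratification is also why each category needs a minimum number of products: every category must be represented on both sides of the split.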
The system provides detailed classification reports including:
- Per-category precision, recall, and F1-score
- Confusion matrix analysis
- Training history visualization
- Error analysis for difficult categories
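For readers unfamiliar with the per-category metrics in the report, the sketch below shows how precision, recall, and F1-score are derived from true and predicted labels. It is a plain-Python illustration with a hypothetical function name; in practice scikit-learn's `classification_report` produces the same figures.

```python
from collections import Counter


def per_category_report(y_true, y_pred):
    """Return {category: {"precision", "recall", "f1", "support"}}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this category, but it was wrong
            fn[truth] += 1  # missed the true category

    report = {}
    for cat in sorted(set(y_true) | set(y_pred)):
        predicted = tp[cat] + fp[cat]
        actual = tp[cat] + fn[cat]
        precision = tp[cat] / predicted if predicted else 0.0
        recall = tp[cat] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[cat] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report
```

Categories with low recall are the ones whose products are frequently assigned elsewhere, so they are natural starting points for the error analysis.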
Model accuracy and loss during training epochs
Time distribution across training stages
Best performing product categories
Classification performance heatmap
- Web interface for easy testing
- Real-time classification API
- Support for new product categories
- Performance optimization
- Multi-language support expansion
License Type: Open Source / Free Software
This project is created for educational and demonstration purposes and is available under open source terms.
Note: This system demonstrates modern ML techniques for product classification. Users must provide their own training data in the specified format.