Models used: ViT and CLIP
ViT embeddings: original image, no transformation
ViT embeddings: black-and-white image
CLIP embeddings: image cropped in by 10 percent from each side
CLIP embeddings: original image with increased hue, brightness, saturation, and contrast
CLIP embeddings: original image with reduced hue, brightness, saturation, and contrast
The extra embedding variants were added to increase the model's metric score slightly
For actual business use, a combination of only 1 or 2 embeddings should suffice
As Meesho will provide this service for free, the ViT model should be preferred (15-20 minutes on a Kaggle GPU for 1 lakh, i.e. 100,000, images)
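A minimal sketch of how the image variants listed above could be produced before embedding, assuming Pillow is installed. The function names, the 1.2 and 0.8 enhancement factors, and the hue shift of 10 are illustrative assumptions; the exact values used in the original code are not stated in these notes. Each variant would then be passed to the ViT or CLIP image encoder.

```python
from PIL import Image, ImageEnhance, ImageOps


def crop_sides(img, frac=0.10):
    """Crop `frac` of the width/height away from each side."""
    w, h = img.size
    dx, dy = int(w * frac), int(h * frac)
    return img.crop((dx, dy, w - dx, h - dy))


def shift_hue(img, delta):
    """Shift the hue channel by `delta` (0-255 scale, wraps around)."""
    h, s, v = img.convert("HSV").split()
    h = h.point(lambda p: (p + delta) % 256)
    return Image.merge("HSV", (h, s, v)).convert("RGB")


def jitter(img, factor, hue_delta):
    """Scale brightness, saturation, and contrast by `factor` after a hue shift."""
    out = shift_hue(img, hue_delta)
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Color, ImageEnhance.Contrast):
        out = enhancer(out).enhance(factor)
    return out


def make_variants(img):
    """Return the five image variants, keyed by a hypothetical name."""
    img = img.convert("RGB")
    return {
        "vit_original": img,
        # Converted back to RGB because image encoders expect 3 channels.
        "vit_grayscale": ImageOps.grayscale(img).convert("RGB"),
        "clip_cropped": crop_sides(img, 0.10),
        "clip_increased": jitter(img, 1.2, 10),   # assumed factor and shift
        "clip_reduced": jitter(img, 0.8, -10),    # assumed factor and shift
    }
```

Keeping the variants in one dictionary makes it easy to drop all but one or two of them for a cheaper production setup.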
For the Sarees category specifically
Null values in the train data were filled with the most frequent value for the same saree
The product ID code was used to group identical sarees
EasyOCR was used to read the text (including the product ID) from the image data
Although this image text was only used to handle null values, data leakage will still be present in the image embeddings that are used for the final predictions
For actual business use, this product ID text needs to be removed from the images
The metric score for the Sarees category is therefore not relevant in an actual business sense
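The null-filling step for sarees can be sketched as below. This is a plain-Python illustration, not the original code: the row layout, the `product_code` key, and the function name are assumptions, and nulls are represented as `None`.

```python
from collections import Counter, defaultdict


def fill_nulls_by_group(rows, group_key, attr_keys):
    """Fill None attributes with the most frequent non-null value in the same group."""
    # counts[group][attribute] -> Counter of observed non-null values
    counts = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        for attr in attr_keys:
            value = row.get(attr)
            if value is not None:
                counts[row[group_key]][attr][value] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for attr in attr_keys:
            if new_row.get(attr) is None:
                counter = counts[row[group_key]][attr]
                if counter:  # stays None when the whole group is null
                    new_row[attr] = counter.most_common(1)[0][0]
        filled.append(new_row)
    return filled


# Hypothetical example rows: three listings of the same saree.
rows = [
    {"product_code": "A1", "color": "red", "border": None},
    {"product_code": "A1", "color": "red", "border": "zari"},
    {"product_code": "A1", "color": None, "border": "zari"},
]
filled_rows = fill_nulls_by_group(rows, "product_code", ["color", "border"])
```

Groups where an attribute is null in every row are left unfilled, so the model still has to predict those from the embeddings.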
This code uses all the previously generated embeddings merged together, along with the preprocessed data
The attribute values are labeled as numbers
A better method would be to use a label encoder; I wanted to experiment with regression models too, so I used an attribute dictionary
Trained separate LGBM models for each category and each attribute (this reduced training time by 1/4, since each model has fewer possible values to choose from)
Ran an LGBM model to find the most important features and dropped the others
Then used a stratified 5-fold strategy to get model scores and combined the fold predictions
Converted the labels back to categorical values
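The attribute dictionary and the conversion back to categorical values can be sketched as follows. The function names and the alphabetical ordering of codes are assumptions; the notes do not say how the original dictionary was ordered. The rounding and clipping in `decode_regression` is one possible way to map regression outputs back to labels, not necessarily what the original code does.

```python
def build_attribute_dict(values):
    """Map each distinct non-null value to an integer code (sorted order)."""
    distinct = sorted({v for v in values if v is not None})
    return {value: code for code, value in enumerate(distinct)}


def encode(values, attr_dict):
    """Replace each value with its code; nulls stay None."""
    return [attr_dict[v] if v is not None else None for v in values]


def decode(codes, attr_dict):
    """Map integer codes back to the original categorical values."""
    inverse = {code: value for value, code in attr_dict.items()}
    return [inverse[c] for c in codes]


def decode_regression(predictions, attr_dict):
    """Round real-valued predictions to the nearest valid code, then decode."""
    max_code = len(attr_dict) - 1
    codes = [min(max(int(round(p)), 0), max_code) for p in predictions]
    return decode(codes, attr_dict)


# Hypothetical attribute column for one category.
sleeve_dict = build_attribute_dict(["short", "long", None, "short", "sleeveless"])
sleeve_codes = encode(["short", None, "long"], sleeve_dict)
sleeve_labels = decode([c for c in sleeve_codes if c is not None], sleeve_dict)
```

One dictionary per category and attribute keeps the codes dense, which matters when each model only sees its own category's values.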
Very brief summary of the approach
Embeddings -> Boosting model (LGBM) -> result
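Within the boosting step, the stratified 5-fold strategy and the combining of fold predictions can be sketched in plain Python as below. This only shows the fold mechanics: the model itself is left out, the function names are hypothetical, and averaging class probabilities is an assumed way of combining predictions (the notes do not say which method was used). In practice a library splitter such as scikit-learn's `StratifiedKFold` would normally do the fold assignment.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=42):
    """Assign each sample to a fold so every fold keeps similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class out to the folds in turn.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits
    return fold_of


def average_probabilities(fold_probas):
    """Average per-class probabilities over fold models and take the argmax.

    fold_probas[m][i][c] is model m's probability of class c for test sample i.
    """
    n_models = len(fold_probas)
    n_samples = len(fold_probas[0])
    n_classes = len(fold_probas[0][0])
    predictions = []
    for i in range(n_samples):
        mean = [sum(p[i][c] for p in fold_probas) / n_models for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=mean.__getitem__))
    return predictions
```

For each fold `k`, the samples with `fold_of[i] == k` form the validation set and the rest form the training set; the validation scores give the model score, and the averaged test probabilities give the final encoded labels.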
Sorry for the bad documentation and the code
I will try to edit both the code and the documentation to make them more readable, but I am a bit busy right now
Feel free to ask me if you need anything