This project implements a production-style AI pipeline to detect and group similar retail products from shelf images.
It is built using a microservice architecture with clear separation of responsibilities:
- Detector Service → detects product bounding boxes
- Grouping Service → groups visually and spatially similar products
- Gateway Service → orchestrates the services and provides the UI
Client (UI)
↓
Gateway Service (Flask)
↓
Detector Service (YOLO)
↓
Grouping Service (ResNet + Clustering)
↓
Final Output (Image + JSON)
Gateway Service:
- Handles image upload
- Orchestrates detector → grouping
- Returns final JSON + image
- Serves output images
Detector Service:
- Uses YOLOv8
- Implements:
  - Sliding window detection
  - Low confidence threshold
  - Area filtering
  - Non-Max Suppression (NMS)
  - Aspect ratio filtering
  - Duplicate removal
- Outputs:
  - Bounding boxes
  - Cropped images
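
The sliding-window and NMS steps listed for the detector can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names (`tile_origins`, `make_tiles`, `iou`, `nms`) are hypothetical, and the defaults simply reuse the `SLICE_SIZE`, `OVERLAP`, and `IOU_THRESHOLD` values from the configuration. It assumes detections from each tile have already been shifted back into full-image coordinates before NMS runs.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def tile_origins(length: int, slice_size: int, overlap: float) -> List[int]:
    """Start offsets of overlapping windows that cover [0, length)."""
    if length <= slice_size:
        return [0]
    stride = max(1, int(slice_size * (1.0 - overlap)))
    origins = list(range(0, length - slice_size, stride))
    origins.append(length - slice_size)  # last window sits flush with the edge
    return origins


def make_tiles(width: int, height: int,
               slice_size: int = 512, overlap: float = 0.4) -> List[Tuple[int, int, int, int]]:
    """Tile rectangles (x1, y1, x2, y2) in row-major order."""
    return [
        (x, y, min(x + slice_size, width), min(y + slice_size, height))
        for y in tile_origins(height, slice_size, overlap)
        for x in tile_origins(width, slice_size, overlap)
    ]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.4) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Overlapping tiles mean the same product is often detected twice, once per tile, which is why a deduplication pass such as NMS follows the sliding window.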
Grouping Service:
- Uses ResNet18 embeddings
- Combines:
  - Visual features
  - Spatial features
- Applies:
  - Feature normalization
  - Agglomerative clustering
- Outputs:
  - Group IDs
  - Annotated image
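
One way to combine visual and spatial features is to L2-normalize the embedding and append a weighted, image-normalized box center. The sketch below assumes that scheme. The source does not state which spatial features the Grouping Service uses, so `spatial_features` and `fuse` are hypothetical helpers, and only `spatial_weight` mirrors the `SPATIAL_WEIGHT` setting.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def spatial_features(bbox: Box, image_w: int, image_h: int) -> List[float]:
    """Assumed spatial descriptor: box center scaled to the [0, 1] range."""
    x1, y1, x2, y2 = bbox
    return [(x1 + x2) / 2 / image_w, (y1 + y2) / 2 / image_h]


def fuse(embedding: Sequence[float], bbox: Box, image_w: int, image_h: int,
         spatial_weight: float = 0.1) -> List[float]:
    """Concatenate the normalized embedding with weighted spatial features."""
    visual = l2_normalize(embedding)
    spatial = [spatial_weight * s for s in spatial_features(bbox, image_w, image_h)]
    return visual + spatial


# A toy 2-D "embedding" stands in for a ResNet18 feature vector.
vector = fuse([3.0, 4.0], (0, 0, 100, 200), image_w=1000, image_h=1000)
```

With `spatial_weight=0`, the spatial part contributes nothing to distances and grouping is purely visual. Larger values make nearby boxes more likely to share a group.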
- Microservices → modular & scalable
- Sliding window detection → improves recall
- Spatial + visual fusion → better grouping
- Config-driven system → no hardcoding
- Logging → easier debugging
python -m venv venv
venv\Scripts\activate

python3 -m venv venv
source venv/bin/activate

pip install -r requirements/requirements-dev.txt

python run_all.py

# Terminal 1
cd detector_service
python app.py
# Terminal 2
cd grouping_service
python app.py
# Terminal 3
cd gateway
python app.py

http://127.0.0.1:5000

cd docker
docker-compose up --build

http://localhost:5000
| Service | Port |
|---|---|
| Gateway | 5000 |
| Detector | 8001 |
| Grouping | 8002 |
- Upload image → Gateway
- Gateway → Detector
- Gateway → Grouping
- Final response returned
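
The request flow above can be sketched as a single orchestration function. This is an illustration under assumptions: the endpoint paths (`/detect`, `/group`) and the `boxes` and `image_path` payload fields are hypothetical, and only the ports and the response keys (`request_id`, `output_image`, `results`) come from this README. The HTTP call is passed in as a function so the sketch runs without any services started.

```python
import uuid
from typing import Callable, Dict

# Ports match the service table; the URL paths are assumptions.
DETECTOR_URL = "http://127.0.0.1:8001/detect"
GROUPING_URL = "http://127.0.0.1:8002/group"

# Any callable that POSTs a JSON payload to a URL and returns the decoded reply.
PostFn = Callable[[str, Dict], Dict]


def run_pipeline(image_path: str, post: PostFn) -> Dict:
    """Call the detector, pass its boxes to grouping, and build the response."""
    request_id = str(uuid.uuid4())
    detections = post(DETECTOR_URL, {
        "request_id": request_id,
        "image_path": image_path,
    })
    grouped = post(GROUPING_URL, {
        "request_id": request_id,
        "image_path": image_path,
        "boxes": detections["boxes"],  # hypothetical field name
    })
    return {
        "request_id": request_id,
        "output_image": grouped["output_image"],
        "results": grouped["results"],
    }


def fake_post(url: str, payload: Dict) -> Dict:
    """Stand-in for real HTTP calls, used here only to exercise the flow."""
    if url == DETECTOR_URL:
        return {"boxes": [[100, 200, 300, 400]]}
    return {
        "output_image": "/outputs/result_test.jpg",
        "results": [{"bbox": b, "group_id": 0} for b in payload["boxes"]],
    }


response = run_pipeline("shelf.jpg", fake_post)
```

In a real gateway, `post` would wrap an HTTP client call. Injecting it keeps the orchestration logic testable on its own.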
{
"request_id": "...",
"output_image": "/outputs/result_xxx.jpg",
"results": [
{
"bbox": [x1, y1, x2, y2],
"group_id": 0
}
]
}

project/
│
├── notebooks/
│   └── model.ipynb
├── gateway/
├── detector_service/
├── grouping_service/
├── logs/
├── docs/
├── models/
├── outputs/
├── docker/
│   └── docker-compose.yml
├── run_all.py
├── requirements.txt
└── README.md
The notebook (notebooks/model.ipynb) was used during the experimentation phase.
- Prototyping detection pipeline
- Testing slicing strategy
- Developing filtering logic
- Validating grouping approach
- Analyzing clustering behavior
| Notebook | Production |
|---|---|
| Inline code | Microservices |
| Hardcoded values | Config-driven |
| Sequential flow | API-based pipeline |
- Sliding window improves recall
- Area filtering removes noise
- Spatial + visual features improve grouping
- Normalization stabilizes clustering
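
To illustrate how area filtering removes noise, the sketch below drops boxes whose area, as a fraction of the image, falls outside a configured range. The defaults reuse the `MIN_AREA_RATIO` and `MAX_AREA_RATIO` values from the configuration. The aspect-ratio limit (`max_aspect_ratio`) and the function name are assumptions, since the README lists aspect ratio filtering without giving a threshold.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def filter_boxes(boxes: Sequence[Box], image_w: int, image_h: int,
                 min_area_ratio: float = 0.0005,
                 max_area_ratio: float = 0.03,
                 max_aspect_ratio: float = 4.0) -> List[Box]:
    """Keep boxes with a plausible relative area and shape."""
    image_area = float(image_w * image_h)
    kept: List[Box] = []
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue  # degenerate box
        area_ratio = (w * h) / image_area
        if not (min_area_ratio <= area_ratio <= max_area_ratio):
            continue  # too small (noise) or too large (whole shelf section)
        if max(w / h, h / w) > max_aspect_ratio:
            continue  # hypothetical limit: very elongated boxes are unlikely products
        kept.append((x1, y1, x2, y2))
    return kept


candidates = [
    (0, 0, 10, 10),     # tiny speck
    (0, 0, 100, 100),   # plausible product
    (0, 0, 500, 500),   # far too large
    (0, 0, 200, 20),    # thin strip
]
products = filter_boxes(candidates, image_w=1000, image_h=1000)
```

Expressing the limits as ratios rather than pixel counts keeps the same settings usable across image resolutions.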
All parameters are configurable via config.py.
SLICE_SIZE = 512
OVERLAP = 0.4
IOU_THRESHOLD = 0.4
MIN_AREA_RATIO = 0.0005
MAX_AREA_RATIO = 0.03

DISTANCE_THRESHOLD = 0.6
SPATIAL_WEIGHT = 0.1
IMAGE_SIZE = 224
CLUSTERING_METRIC = "euclidean"
CLUSTERING_LINKAGE = "average"

- DISTANCE_THRESHOLD
  - lower → more groups
  - higher → fewer groups
- SPATIAL_WEIGHT
  - 0 → visual only
  - 0.1 → balanced
  - higher → spatial bias

- Increase DISTANCE_THRESHOLD → merge clusters
- Reduce SPATIAL_WEIGHT → visual grouping
- Increase OVERLAP → better detection
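
The effect of `DISTANCE_THRESHOLD` can be seen with a small pure-Python average-linkage clustering. This is a stand-in written for illustration. The README does not say which clustering library the Grouping Service uses, and `average_linkage` is a hypothetical name. Merging stops once the closest pair of clusters is at or beyond the threshold.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points: Sequence[Sequence[float]],
                    distance_threshold: float) -> List[int]:
    """Agglomerative clustering with average linkage; returns a label per point."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] >= distance_threshold:
            break  # remaining clusters are too far apart to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for group_id, members in enumerate(clusters):
        for i in members:
            labels[i] = group_id
    return labels


# Two tight pairs of points, far apart from each other.
features = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)]
labels_default = average_linkage(features, distance_threshold=0.6)
labels_strict = average_linkage(features, distance_threshold=0.05)
labels_loose = average_linkage(features, distance_threshold=1.5)
```

On this toy data the strict threshold leaves every point in its own group, the 0.6 threshold yields the two pairs, and the loose threshold merges everything, matching the "lower → more groups, higher → fewer groups" rule.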
DISTANCE_THRESHOLD=0.7
SPATIAL_WEIGHT=0.2

Below is an example demonstrating detection and grouping results from the pipeline.
{
"request_id": "example-id",
"output_image": "/outputs/result_example.jpg",
"results": [
{
"bbox": [100, 200, 300, 400],
"group_id": 2
},
{
"bbox": [320, 210, 500, 390],
"group_id": 2
}
]
}

- End-to-end ML pipeline
- Microservice architecture
- UI + API integration
- Config-driven design
- Logging support
- Fine-tuned embeddings
- Better clustering (DBSCAN / metric learning)
- Independent service scaling
This project demonstrates the transition from:
Notebook → Production-ready ML system
Combining:
- Machine Learning
- Backend Engineering
- System Design
Aman Gupta

