Does the project structure match the setup outlined in the README, with organized folders for data, models, utilities, and documentation?
Does the README provide a comprehensive overview, including installation, setup instructions, and an explanation of the project’s objectives and usage?
Is a requirements.txt file included with all dependencies and specific library versions required to run the project?
Does the Jupyter notebook (VisionTrack_Analysis.ipynb) include EDA showcasing data distribution, object detection samples, and preprocessing methods?
Is the dataset loaded and preprocessed to remove anomalies, handle missing values, and prepare video/image frames for object detection and tracking?
Does data preprocessing include resizing and normalization, ensuring compatibility with YOLO model input formats?
- Validation of YOLO-compatible annotations (.txt files with class, x_center, y_center, width, height, normalized to [0, 1]).
- Confirmation that frames are resized and normalized correctly before inference (see the sketch after this list).
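A minimal sketch of the two checks above, assuming a 640×640 model input and standard YOLO label files; the function names and target size are placeholders, and Ultralytics also performs its own letterboxing internally at inference time:

```python
from pathlib import Path

import cv2
import numpy as np


def validate_yolo_label(path: Path) -> bool:
    """Each line must be 'class x_center y_center width height' with coords in [0, 1]."""
    for line in path.read_text().splitlines():
        parts = line.split()
        if len(parts) != 5 or not parts[0].isdigit():
            return False
        if not all(0.0 <= float(v) <= 1.0 for v in parts[1:]):
            return False
    return True


def preprocess_frame(frame: np.ndarray, size: int = 640) -> np.ndarray:
    """Resize to the model input size, convert BGR to RGB, scale pixels to [0, 1]."""
    resized = cv2.resize(frame, (size, size))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    return rgb.astype(np.float32) / 255.0
```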
Is the YOLO model implemented for person detection with configuration options for detection thresholds and class-specific tuning?
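One way this configuration could look, assuming the Ultralytics YOLOv8 API (the weights file and image path are placeholders):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # assumed checkpoint; any YOLOv8 weights work the same way

# conf sets the detection threshold; classes=[0] restricts inference to the
# COCO "person" class, i.e. the class-specific tuning asked about above.
results = model("frame.jpg", conf=0.5, iou=0.45, classes=[0])
```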
Is transfer learning applied to adapt the pre-trained YOLO model to the project's dataset for improved accuracy?
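A sketch of what such fine-tuning could look like with the Ultralytics API; "dataset.yaml" is an assumed name for the project's dataset config, and the hyperparameters are illustrative:

```python
from ultralytics import YOLO

# Start from COCO-pretrained weights and fine-tune on the custom dataset.
# freeze=10 keeps the first ten backbone layers fixed, a common
# transfer-learning setup when the dataset is small.
model = YOLO("yolov8n.pt")
model.train(data="dataset.yaml", epochs=50, imgsz=640, freeze=10)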
Is the supervision library correctly integrated to handle object tracking, maintain unique IDs, and count objects within the video stream?
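A minimal single-frame sketch of that integration, assuming a recent supervision release (file names are placeholders):

```python
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
tracker = sv.ByteTrack()  # assigns and maintains unique per-person IDs

result = model("frame.jpg", classes=[0])[0]
detections = sv.Detections.from_ultralytics(result)
detections = tracker.update_with_detections(detections)
print(detections.tracker_id, len(detections))  # stable IDs plus a running count
```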
Is the system capable of managing and processing multiple video streams at the same time, displaying real-time tracking and counting results for each?
Does the implementation ensure unique IDs for tracked individuals and handle state management across video streams to prevent ID mismatches?
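A hedged sketch of how per-stream state isolation could be checked, with one model and tracker per thread so IDs from one stream never collide with another's; the source file names are placeholders:

```python
import threading

import cv2
import supervision as sv
from ultralytics import YOLO


def process_stream(source: str, name: str) -> None:
    # A separate model and tracker per thread keeps tracking state isolated.
    model = YOLO("yolov8n.pt")
    tracker = sv.ByteTrack()
    cap = cv2.VideoCapture(source)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = model(frame, classes=[0], verbose=False)[0]
        detections = tracker.update_with_detections(sv.Detections.from_ultralytics(result))
        print(f"{name}: {len(detections)} people in frame")
    cap.release()


sources = ["entrance.mp4", "lobby.mp4"]  # placeholder stream sources
threads = [threading.Thread(target=process_stream, args=(s, f"stream-{i}"))
           for i, s in enumerate(sources)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```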
Does the project include logic for tracking and counting entries and exits within specified regions of interest (ROIs)?
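For line-crossing ROIs specifically, supervision ships a counting primitive; a sketch assuming a recent release (coordinates are placeholders, and the detections passed in must already carry tracker IDs):

```python
import supervision as sv

# A horizontal counting line across a 1280x720 frame; the real ROI would
# come from the app's configuration.
line = sv.LineZone(start=sv.Point(0, 360), end=sv.Point(1280, 360))

# Inside the per-frame loop, after tracking:
#     line.trigger(detections)
# line.in_count and line.out_count then hold cumulative entries and exits.
```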
Is the Streamlit app implemented to display video feeds with overlaid detection, tracking, and counting information?
Does the app include interactive elements for switching between streams, adjusting detection thresholds, and customizing tracking parameters?
Is video rendering optimized using OpenCV and Streamlit, providing real-time overlays and maintaining performance?
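A compact sketch covering the three items above (overlaid feed, interactive controls, and an OpenCV + Streamlit render loop), assuming the Ultralytics and supervision APIs; the stream sources are placeholders:

```python
import cv2
import streamlit as st
import supervision as sv
from ultralytics import YOLO

st.title("VisionTrack")
source = st.sidebar.selectbox("Stream", ["entrance.mp4", "lobby.mp4"])  # placeholders
conf = st.sidebar.slider("Detection threshold", 0.1, 0.9, 0.5, 0.05)

model = YOLO("yolov8n.pt")
tracker = sv.ByteTrack()
boxes = sv.BoxAnnotator()
frame_slot = st.empty()  # one placeholder redrawn per frame keeps the page stable

cap = cv2.VideoCapture(source)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, conf=conf, classes=[0], verbose=False)[0]
    detections = tracker.update_with_detections(sv.Detections.from_ultralytics(result))
    frame = boxes.annotate(scene=frame, detections=detections)
    frame_slot.image(frame, channels="BGR")
cap.release()
```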
Does the project use CUDA and GPU acceleration for processing, ensuring efficient handling of real-time video inputs?
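Verifying GPU usage is straightforward with PyTorch and the Ultralytics inference arguments; a minimal sketch (image path is a placeholder):

```python
import torch
from ultralytics import YOLO

device = "cuda" if torch.cuda.is_available() else "cpu"
model = YOLO("yolov8n.pt")
# half=True enables FP16 inference on GPU, which is typically much faster.
results = model("frame.jpg", device=device, half=(device == "cuda"))
```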
Are techniques like model pruning and quantization implemented to enhance inference speed and minimize latency?
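A toy illustration of both techniques using PyTorch's built-in utilities; the layers here stand in for parts of the detection network, and for Ultralytics models an FP16 TensorRT export via model.export(format="engine", half=True) is the more common latency optimization:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, kernel_size=3)
prune.l1_unstructured(conv, name="weight", amount=0.3)  # zero the 30% smallest weights
prune.remove(conv, "weight")                            # bake the mask into the tensor

head = nn.Sequential(nn.Flatten(), nn.Linear(128, 10))
quantized = torch.ao.quantization.quantize_dynamic(head, {nn.Linear}, dtype=torch.qint8)
```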
Is the Streamlit app tested with various video resolutions to ensure efficient processing without significant performance drops?
Does the Streamlit app provide clear visual overlays on video streams, showing tracking IDs and counting results in real time?
Does the app include performance metrics, such as FPS and processing time, for user insight into real-time processing efficiency?
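One way those live metrics could be wired into the app, assuming the Streamlit API; the widget layout is an assumption:

```python
import time

import streamlit as st

fps_slot = st.sidebar.empty()
latency_slot = st.sidebar.empty()
prev = time.perf_counter()

# Inside the per-frame loop, after each frame is processed and displayed:
now = time.perf_counter()
latency_ms = (now - prev) * 1000
fps = 1000.0 / max(latency_ms, 1e-3)
prev = now
fps_slot.metric("FPS", f"{fps:.1f}")
latency_slot.metric("Latency", f"{latency_ms:.0f} ms")
```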
Are evaluation metrics presented, showcasing precision, recall, and F1-score to assess the effectiveness of detection and tracking?
- Require a metrics file: reports/performance_metrics.json
- Validate that the JSON includes:
  { "detection_precision": ..., "detection_recall": ..., "f1_score": ..., "average_fps_per_stream": ..., "average_latency_ms": ... }
- Enforce minimum thresholds (see the validation sketch after this list):
  - Precision ≥ 0.85
  - Recall ≥ 0.80
  - F1 ≥ 0.85
  - FPS ≥ 15 (at 720p)
- Check that the metrics are visible in the Streamlit dashboard (FPS and latency shown live).
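A minimal script a reviewer could run to enforce the thresholds above against the required file:

```python
import json
from pathlib import Path

THRESHOLDS = {
    "detection_precision": 0.85,
    "detection_recall": 0.80,
    "f1_score": 0.85,
    "average_fps_per_stream": 15.0,  # measured at 720p
}

metrics = json.loads(Path("reports/performance_metrics.json").read_text())
for key, minimum in THRESHOLDS.items():
    assert key in metrics, f"missing metric: {key}"
    assert metrics[key] >= minimum, f"{key} = {metrics[key]} is below the {minimum} floor"
assert "average_latency_ms" in metrics  # must be reported; no floor is specified
print("performance_metrics.json passes all checks")
```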