Deep Learning for Image Processing and Captioning

Project Overview

This project explores the application of computational intelligence techniques in computer vision through three distinct phases. Each phase introduces new challenges and tasks related to image classification, medical image analysis, and image captioning.


Phase 1: Image Classification with Deep Neural Networks

In this phase, a deep neural network (DNN) is implemented to classify images from the CIFAR-10 dataset. The dataset consists of 60,000 32x32 color images across 10 categories, which presents challenges due to low resolution and visual complexity.
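A minimal sketch of loading CIFAR-10 with torchvision, using commonly cited per-channel normalization statistics and basic augmentation; the exact preprocessing used in this project may differ.

```python
# Sketch: CIFAR-10 loading with standard augmentation (values are assumptions,
# not necessarily the project's exact settings).
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_tfms = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # Commonly used CIFAR-10 channel means/stds.
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tfms)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
```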

Key Details:

  • Architecture: Custom, modular implementation of ResNet from scratch (see the sketch below).
  • Optimization: Tuned the number of stages and the number of blocks per stage to improve accuracy.
  • Results: Achieved 94.29% accuracy on the test set.
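A minimal sketch of how such a modular ResNet for 32x32 inputs might be composed, with the block and stage counts exposed as constructor arguments so they can be tuned. The widths, depths, and block design here are illustrative assumptions, not the exact configuration used in this project.

```python
# Sketch: modular ResNet with configurable stages/blocks (illustrative only).
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Project the identity path when the spatial size or channel count changes.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))

class ResNet(nn.Module):
    def __init__(self, blocks_per_stage=(2, 2, 2), widths=(64, 128, 256), num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, widths[0], 3, 1, 1, bias=False),
            nn.BatchNorm2d(widths[0]),
            nn.ReLU(inplace=True),
        )
        blocks, in_ch = [], widths[0]
        for i, (n, w) in enumerate(zip(blocks_per_stage, widths)):
            stage_stride = 1 if i == 0 else 2  # downsample between stages
            for j in range(n):
                blocks.append(BasicBlock(in_ch, w, stage_stride if j == 0 else 1))
                in_ch = w
        self.stages = nn.Sequential(*blocks)
        self.head = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        x = self.stages(self.stem(x))
        x = torch.flatten(nn.functional.adaptive_avg_pool2d(x, 1), 1)
        return self.head(x)
```

Exposing `blocks_per_stage` and `widths` as arguments is what makes the depth/width tuning described above a one-line change per experiment.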

Phase 2: Medical Image Analysis with CNNs

This phase involves using convolutional neural networks (CNNs) to analyze breast histopathology images for cancer detection. The goal is to explore classification effectiveness and address data challenges.

Key Details:

  • Data Analysis: Comprehensive exploratory data analysis (EDA) to understand the data distribution and identify challenges.
  • Architectures: Experimented with several ResNet variants and applied strong data augmentation to combat overfitting and address class imbalance (see the sketch below).
  • Results: Achieved an F1 score of 90.02% on the test set.
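A rough sketch of the kind of augmentation and class-rebalancing setup described above, using torchvision transforms plus a weighted sampler to oversample the minority class. The transform choices, patch size, and sampler weighting are assumptions, not the project's exact settings.

```python
# Sketch: strong augmentation + minority-class oversampling (assumed settings).
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(50, scale=(0.8, 1.0)),  # histopathology patches assumed ~50x50 px
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
])

def make_balanced_loader(dataset, labels, batch_size=64):
    """Oversample the minority class with a WeightedRandomSampler."""
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels)
    sample_weights = 1.0 / class_counts[labels].float()
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```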

Phase 3: Image Captioning with Deep Learning Models

In the final phase, the task is to generate descriptive captions for images using a combination of CNNs for feature extraction and RNNs for text generation.

Key Details:

  • Architectures Explored: RNN, LSTM, attention-based LSTM, self-attention, and multi-head self-attention (a sketch of one attention-LSTM decoding step follows this list).
  • Feature Extraction: Utilized CNN models such as MobileNet, ResNet50, and EfficientNetB0.
  • Embeddings: Implemented BERT and GloVe embeddings.
  • Reward Strategies: Used BLEU scores for reward-based training.
  • Datasets: Trained models on Flickr8k, Flickr30k, and 40k images from the MSCOCO dataset.
  • Implementation: All modules, including attention mechanisms and LSTM models, were implemented from scratch.
  • Results Tracking: The trial-and-error process is documented in the commit history, with sample captioned images included in the repository.
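A simplified sketch of one decoding step of an attention-based LSTM captioner over spatial CNN features, using additive (Bahdanau-style) attention. The dimensions, attention form, and module names are illustrative assumptions rather than the exact modules implemented in this project.

```python
# Sketch: one attention-LSTM decoding step over CNN feature maps (illustrative).
import torch
import torch.nn as nn

class AttentionDecoderStep(nn.Module):
    def __init__(self, feat_dim=1280, embed_dim=300, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Additive attention over the spatial grid of CNN features.
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, feats, state):
        # prev_word: (B,) token ids; feats: (B, num_regions, feat_dim); state: (h, c)
        h, c = state
        scores = self.att_score(torch.tanh(self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)        # (B, num_regions, 1) attention weights
        context = (alpha * feats).sum(dim=1)        # (B, feat_dim) attended image context
        x = torch.cat([self.embed(prev_word), context], dim=1)
        h, c = self.lstm(x, (h, c))
        return self.out(h), (h, c), alpha           # logits, new state, weights for visualization
```

Running this step in a loop over time steps (feeding back the argmax or sampled token) gives greedy or sampled decoding; the sampled-caption BLEU score can then serve as the reward signal for the reward-based training mentioned above.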
