Machine Learning Notebooks

I interned at DataFlair for nearly 60 days, during which I completed a total of 10 projects in Computer Vision, NLP, and Machine Learning.

This repository contains all of the notebooks, along with links to the blog posts explaining each project. All of the projects were built using TensorFlow and the Keras API. Some posts were not published because my internship ended before publication; for those projects, the links lead to a Google document describing the work.

The other two notebooks are Image Captioning and Handwritten Text Generation.

  • Dataset can be downloaded from here
  • Utilized Transfer Learning to build the classifier (see the sketch after this list)
  • Achieved 96.5% accuracy on the test set
  • Link to the article
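
A minimal sketch of what the transfer-learning classifier could look like in Keras. The base model (MobileNetV2), input size, and class count are assumptions, not necessarily the notebook's actual choices:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 2              # assumption: set to the dataset's class count
IMG_SHAPE = (224, 224, 3)    # assumption: standard ImageNet-style input

# Pretrained convolutional base, frozen so only the new head is trained.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE, include_top=False, weights="imagenet")
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Freezing the pretrained base and training only the new head is the usual first stage; unfreezing the top layers of the base for fine-tuning afterwards often improves accuracy further.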
  • Dataset can be downloaded from here
  • The data suffers from severe class imbalance: only 0.17% of the transactions are fraudulent.
  • If this is not addressed, a model can trivially achieve high accuracy (by labelling every transaction as genuine) yet perform poorly in the real world.
  • Used SMOTE to oversample the minority class (see the sketch after this list).
  • Applied the Random Forest and Decision Tree algorithms and visualized the results.
  • The Random Forest with oversampling turned out to perform better than the other two models.
  • Achieved more than 99% on the accuracy, precision, recall, and F1-score metrics
  • Link to the article
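
A minimal sketch of the SMOTE-plus-Random-Forest pipeline described above, using scikit-learn and imbalanced-learn. The file name `creditcard.csv` and the `Class` label column are assumptions about the dataset layout:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("creditcard.csv")             # assumed file name
X, y = df.drop(columns="Class"), df["Class"]   # assumed label column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample only the training split so the test set keeps the
# original imbalanced distribution and the metrics stay realistic.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))
```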
  • Dataset can be downloaded from here
  • Built a basic recommendation system using the IMDb weighted-average score; this approach is called Demographic Filtering.
  • Demographic Filtering only surfaces the top results of all time; the recommendations are not personalized.
  • Built a Content-Based Recommendation System that recommends movies whose plots are closest to a given movie's plot.
  • Used TfidfVectorizer to build a matrix in which each row represents a movie and each column represents a word from the overview vocabulary (all the words that appear in at least one overview).
  • Used cosine similarity scores to calculate the similarity between plots (documents), as in the sketch after this list.
  • This system does well, but it can be improved further by using the movies' metadata.
  • Used the cast, crew and keywords features to improve the model's recommendations.
  • Link to the article
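
A minimal sketch of the content-based recommender described above. The file name and the `overview`/`title` column names are assumptions about the dataset:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

movies = pd.read_csv("movies_metadata.csv")     # assumed file name
overviews = movies["overview"].fillna("")       # assumed column name

# Rows are movies, columns are vocabulary terms weighted by TF-IDF.
tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(overviews)

# TF-IDF rows are L2-normalised, so the linear kernel equals cosine similarity.
cos_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

def recommend(title, n=10):
    """Return the n movies whose overviews are most similar to `title`'s."""
    idx = movies.index[movies["title"] == title][0]
    scores = sorted(enumerate(cos_sim[idx]), key=lambda s: s[1], reverse=True)
    return movies["title"].iloc[[i for i, _ in scores[1:n + 1]]]
```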
  • Dataset can be downloaded from here
  • The text is preprocessed by removing all characters except alphabetic ones and then applying lemmatization.
  • Applied the Multinomial Naive Bayes algorithm to train the model on the corpus (see the sketch after this list).
  • Achieved more than 95% accuracy on both the training and test sets.
  • Link to the article
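
A minimal sketch of the preprocessing and Multinomial Naive Bayes training described above; `texts` and `labels` are toy stand-ins for the actual corpus:

```python
import re
import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

nltk.download("wordnet", quiet=True)
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    # Keep alphabetic characters only, lowercase, then lemmatize each token.
    text = re.sub(r"[^a-zA-Z]", " ", text).lower()
    return " ".join(lemmatizer.lemmatize(w) for w in text.split())

texts = ["Congratulations, you won a free prize!!",
         "Are we still on for lunch today?"]        # toy stand-in documents
labels = [1, 0]                                     # toy binary class labels

corpus = [preprocess(t) for t in texts]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
model = MultinomialNB().fit(X, labels)
```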
  • This was a really cool project to work on during my internship.
  • The text is preprocessed, and the word-embedding matrix is computed using the pre-trained GloVe 6B 50-dimensional vectors.
  • Used LSTMs with dropout to improve accuracy; the model architecture consists of 2 LSTM layers followed by a Dense layer (see the sketch after this list).
  • Achieved only 62% accuracy on the test set, so there is room for improvement.
  • Link to the article
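
A minimal sketch of building the embedding matrix from `glove.6B.50d.txt` and stacking two LSTM layers followed by a Dense layer, as described above. The layer sizes, the binary output, and the toy `word_index` are assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

EMBED_DIM = 50                       # glove.6B.50d vectors are 50-dimensional
word_index = {"good": 1, "bad": 2}   # toy stand-in; use the Tokenizer's word_index

def load_glove_matrix(path, word_index, dim):
    """Build an embedding matrix from a GloVe file (one word + vector per line)."""
    matrix = np.zeros((len(word_index) + 1, dim))
    with open(path, encoding="utf8") as f:
        for line in f:
            word, *vec = line.split()
            if word in word_index:
                matrix[word_index[word]] = np.asarray(vec, dtype="float32")
    return matrix

embedding_matrix = load_glove_matrix("glove.6B.50d.txt", word_index, EMBED_DIM)

model = Sequential([
    # Frozen embedding layer initialised with the pre-trained GloVe vectors.
    Embedding(len(word_index) + 1, EMBED_DIM,
              embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
              trainable=False),
    LSTM(128, return_sequences=True),    # assumed layer sizes
    Dropout(0.2),
    LSTM(64),
    Dropout(0.2),
    Dense(1, activation="sigmoid"),      # assumed binary target
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```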
  • I had always thought this project would be easy, since it is considered the "Hello World" of NLP projects.
  • But I learned a lot while doing it, and it introduced me to many new techniques.
  • Dataset can be downloaded from here
  • The text is preprocessed by fitting Tokenizer() on the corpus, which assigns each unique word a unique number; each word is then replaced with its assigned number.
  • The sequences are padded with zeroes so that every text has uniform dimensions.
  • Trained the model using LSTMs with dropout to improve accuracy (see the sketch after this list).
  • Obtained an accuracy of 94% on the validation set.
  • Link to the article
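
A minimal sketch of the Tokenizer, zero-padding, and LSTM steps above, assuming a binary sentiment target; `texts`, `labels`, the sequence length, and the layer sizes are stand-in assumptions:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

MAX_LEN = 100                                      # assumed sequence length
texts = ["I loved this movie", "A terrible film"]  # toy stand-in corpus
labels = [1, 0]                                    # toy binary labels

# Assign each unique word a number, then replace words with their numbers.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# Zero-pad every sequence to the same length.
padded = pad_sequences(sequences, maxlen=MAX_LEN, padding="post")

model = Sequential([
    Embedding(len(tokenizer.word_index) + 1, 64),
    LSTM(64),
    Dropout(0.3),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(padded, labels, epochs=5, validation_split=0.2)
```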
  • Prepared my own dataset for this task.
  • Applied the Histogram of Oriented Gradients (HOG) to the images; its output is then fed into a Support Vector Machine to detect pedestrians (see the sketch after this list).
  • The script contains the code to detect pedestrians in both images and real-time video.
  • Link to the article
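
A minimal sketch of HOG-based pedestrian detection with OpenCV's bundled people detector (a linear SVM pre-trained on HOG features); the notebook may instead train its own SVM, and the image path here is an assumption:

```python
import cv2

# OpenCV ships a linear SVM pre-trained on HOG features of pedestrians.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("street.jpg")                  # assumed input image
boxes, weights = hog.detectMultiScale(image, winStride=(4, 4), scale=1.05)

# Draw a box around each detected pedestrian and save the result.
for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", image)
```

For real-time video, the same `detectMultiScale` call runs on each frame read from `cv2.VideoCapture`.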
  • Prepared my own dataset for this task.
  • Utilized OpenCV and the face_recognition library to detect and recognize faces in a picture (see the sketch after this list).
  • Link to the article
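
A minimal sketch of face detection and recognition with the face_recognition library; the image file names and the single known person are assumptions:

```python
import cv2
import face_recognition

# Encode one reference image of a known person (file names are assumed).
known_image = face_recognition.load_image_file("person.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Locate and encode every face in the query picture, then compare.
unknown = face_recognition.load_image_file("group_photo.jpg")
locations = face_recognition.face_locations(unknown)
encodings = face_recognition.face_encodings(unknown, locations)

for (top, right, bottom, left), enc in zip(locations, encodings):
    match = face_recognition.compare_faces([known_encoding], enc)[0]
    label = "person" if match else "unknown"
    cv2.rectangle(unknown, (left, top), (right, bottom), (0, 255, 0), 2)
    cv2.putText(unknown, label, (left, top - 8),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

# face_recognition loads images as RGB; convert to BGR before saving with OpenCV.
cv2.imwrite("recognized.jpg", cv2.cvtColor(unknown, cv2.COLOR_RGB2BGR))
```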
