A simple NLP project for resume processing using Python 3.0.
The raw resume text contains badly formatted content, including:
- URLs
- hashtags
- mentions
- special characters
- punctuation
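
The cleaning step for the items listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_resume` and the exact regular expressions are assumptions, and "special characters" is interpreted here as any non-ASCII character.

```python
import re
import string


def clean_resume(text: str) -> str:
    """Hypothetical cleaner: strips the badly formatted pieces listed above."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"#\S+", " ", text)                   # hashtags
    text = re.sub(r"@\S+", " ", text)                   # mentions
    text = re.sub(r"[^\x00-\x7f]", " ", text)           # non-ASCII special characters (assumption)
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)  # punctuation
    return re.sub(r"\s+", " ", text).strip()            # collapse leftover whitespace


sample = "Visit https://example.com #python @recruiter: skills • Java, SQL!"
print(clean_resume(sample))
```

The order matters: URLs, hashtags, and mentions are removed before punctuation, because stripping `#`, `@`, and `/` first would leave their text behind as ordinary words.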
The left column is the original dataset, which contains a lot of badly formatted information.
The right column is the resume dataset after cleaning.
It is easy to see that "Details" appeared 484 times and "Experience" 446 times, followed by "company", "less", "year", "Machine Learning", and so on. These are the words that appear most often in the resumes.
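
Word counts like these can be produced with a simple frequency count over the cleaned text. The sketch below uses `collections.Counter` from the standard library. The helper name `top_words` and the toy input are assumptions for illustration, and the project itself may have used a different tool. Note that it counts single whitespace-separated tokens, so a two-word phrase such as "Machine Learning" would need a separate bigram count.

```python
from collections import Counter


def top_words(cleaned_texts, n=10):
    """Return the n most frequent tokens across all cleaned resumes."""
    counts = Counter()
    for text in cleaned_texts:
        counts.update(text.split())  # split on whitespace; text is assumed already cleaned
    return counts.most_common(n)


# Toy input, not the project's dataset
docs = ["Details Experience Details", "Experience company Details"]
print(top_words(docs, n=2))
```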
Train a machine learning model for resume processing. Here is the classification report for this dataset.
Here I used the OneVsRest classifier and the KNN classifier.
First, split the data into training and test sets.
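
A minimal sketch of the split and training steps with scikit-learn might look like this. The toy resumes, the category labels, the TF-IDF features, `test_size=0.25`, and `n_neighbors=3` are all assumptions chosen for illustration. The source only states that a OneVsRest classifier and a KNN classifier were used and that a classification report was produced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy placeholder data, not the project's dataset
resumes = [
    "python machine learning pandas model training",
    "deep learning python tensorflow model",
    "machine learning statistics python data",
    "data analysis python scikit learn model",
    "java spring backend microservices api",
    "java hibernate backend rest api",
    "backend java sql spring services",
    "java api microservices docker backend",
]
labels = ["Data Science"] * 4 + ["Java Developer"] * 4

# Split the raw text first so the vectorizer only learns from training data
train_texts, test_texts, y_train, y_test = train_test_split(
    resumes, labels, test_size=0.25, random_state=42, stratify=labels
)

# Turn text into numeric features (TF-IDF is an assumption here)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Wrap KNN in a one-vs-rest scheme and train it
clf = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=3))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
```

Fitting the vectorizer on the training split only keeps vocabulary and document-frequency statistics from the test set out of the model, so the report reflects performance on unseen resumes.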