
NLP-based text classification project using Gradient Boosting, TF-IDF, and CountVectorizer. Optimized with GridSearchCV and evaluated with cross-validation.


kemaltf/nlp_projects


NLP Text Classification with Gradient Boosting

This project classifies text data using Natural Language Processing (NLP) techniques and machine learning. It compares TF-IDF and CountVectorizer for feature extraction and uses Gradient Boosting as the primary classifier. Hyperparameters are tuned with GridSearchCV using k-fold cross-validation.

🧠 Objectives

  • Preprocess and vectorize text data
  • Compare feature extraction methods (TF-IDF vs CountVectorizer)
  • Train and optimize a Gradient Boosting Classifier
  • Evaluate model performance using cross-validation

πŸ” Features

  • Text preprocessing: tokenization, stopword removal, lowercasing
  • Feature extraction using:
    • TF-IDF Vectorizer
    • CountVectorizer
  • Model training using:
    • GradientBoostingClassifier
  • Hyperparameter tuning using:
    • GridSearchCV
  • Model evaluation using:
    • k-Fold Cross-Validation
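The repository's code is not reproduced here, so the following is only a minimal sketch of how these pieces could fit together, assuming scikit-learn 1.0 or later. The toy texts, the hypothetical `preprocess` helper (a regex tokenizer with scikit-learn's built-in English stopword list, standing in for nltk), and the grid values are illustrative, not the project's settings. The vectorizer itself is treated as a grid parameter, so TF-IDF and raw counts are compared on identical cross-validation splits.

```python
import re

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import (
    ENGLISH_STOP_WORDS,
    CountVectorizer,
    TfidfVectorizer,
)
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline


def preprocess(text: str) -> str:
    """Hypothetical helper: lowercase, tokenize on letter runs, drop English stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in ENGLISH_STOP_WORDS)


# Hypothetical toy data (1 = positive, 0 = negative); replace with the real dataset.
texts = [
    "Great movie, loved the acting",
    "Wonderful plot and great cast",
    "An excellent, moving film",
    "Brilliant direction, superb story",
    "Terrible movie, boring plot",
    "Awful acting and a weak script",
    "A dull, disappointing film",
    "Poor direction, bad story",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X = [preprocess(t) for t in texts]

pipe = Pipeline([
    ("vect", TfidfVectorizer()),  # placeholder; the grid below swaps this step
    ("clf", GradientBoostingClassifier(random_state=42)),
])

# Illustrative grid: which vectorizer, n-gram range, and number of boosting stages.
param_grid = {
    "vect": [CountVectorizer(), TfidfVectorizer()],
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__n_estimators": [50, 100],
    "clf__learning_rate": [0.1],
    "clf__max_depth": [3],
}

# k-fold cross-validation (stratified, so each fold keeps the class balance).
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1_macro", n_jobs=1)
search.fit(X, labels)

print(search.best_params_)
print(f"best mean CV F1 (macro): {search.best_score_:.3f}")
```

Putting the vectorizer inside the `Pipeline` means it is refit on each training fold only, so no vocabulary or IDF statistics leak from the validation fold into training.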

📦 Libraries Used

  • Python 3.x
  • scikit-learn
  • pandas
  • numpy
  • matplotlib / seaborn (for visualization)
  • nltk (optional, for preprocessing)

🧪 Model Evaluation

Evaluation metrics used:

  • Accuracy
  • Precision
  • Recall
  • F1-Score
  • Confusion Matrix
  • Cross-validation scores
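One way these metrics could be computed is sketched below, assuming a TF-IDF plus GradientBoostingClassifier pipeline and a hypothetical 12-text toy dataset. `cross_val_score` gives one accuracy value per fold, and `cross_val_predict` gives out-of-fold predictions that feed precision, recall, F1, and the confusion matrix. Pooling out-of-fold predictions is one reasonable choice, not necessarily how the project reports its numbers.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical, already-preprocessed toy data (1 = positive, 0 = negative).
texts = [
    "great movie loved acting", "wonderful plot great cast",
    "excellent moving film", "brilliant direction superb story",
    "great fun lovely music", "superb acting great film",
    "terrible movie boring plot", "awful acting weak script",
    "dull disappointing film", "poor direction bad story",
    "boring awful music", "weak plot terrible film",
]
labels = [1] * 6 + [0] * 6

model = make_pipeline(TfidfVectorizer(), GradientBoostingClassifier(random_state=0))
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Cross-validation scores: one accuracy value per held-out fold.
fold_scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")

# Out-of-fold predictions: each text is predicted by a model that never saw it.
y_pred = cross_val_predict(model, texts, labels, cv=cv)

metrics = {
    "accuracy": accuracy_score(labels, y_pred),
    "precision": precision_score(labels, y_pred, zero_division=0),
    "recall": recall_score(labels, y_pred, zero_division=0),
    "f1": f1_score(labels, y_pred, zero_division=0),
}
cm = confusion_matrix(labels, y_pred)  # rows = true class, columns = predicted class

print("CV accuracy per fold:", fold_scores.round(3), "mean:", round(fold_scores.mean(), 3))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

The confusion matrix can then be plotted with seaborn's `heatmap` or scikit-learn's `ConfusionMatrixDisplay` if a visual summary is wanted.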

🚀 How to Run

  1. Clone the repository:
    git clone https://github.com/kemaltf/nlp_projects.git
