Concise, production-minded pipeline for classifying disaster-related tweets.
Kaggle notebook: TweetNLP Pipeline — EDA, BERTweet & LightGBM ensemble
Detect whether a tweet describes a real-world disaster event (binary classification). The model was developed for the Kaggle competition "nlp-getting-started" and aims for robust generalization from cross-validated training to the competition test split.
Files
- `train.csv` — training set
- `test.csv` — test set
- `sample_submission.csv` — sample file in the submission format
Columns
- `id` — unique identifier for each tweet
- `text` — tweet content
- `location` — reported location (may be blank)
- `keyword` — extracted keyword (may be blank)
- `target` — (train only) `1` if the tweet describes a disaster, else `0`
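For orientation, a minimal load-and-inspect snippet. The `../input` path assumes a Kaggle notebook environment; adjust it if running locally.

```python
import pandas as pd

# Paths assume the standard Kaggle layout for this competition; adjust as needed.
train = pd.read_csv("../input/nlp-getting-started/train.csv")
test = pd.read_csv("../input/nlp-getting-started/test.csv")

print(train.shape, test.shape)
print(train["target"].value_counts(normalize=True))  # rough class balance check
```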
Minimal tweet-aware cleaning → BERTweet fine-tuning with Stratified K-Fold (OOF probs) → TF–IDF + engineered features → LightGBM (stacking with OOF) → Weighted ensemble (default 0.7 BERTweet / 0.3 LGBM).
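The Stratified K-Fold OOF step can be sketched as below. This is a minimal, hedged version: a TF–IDF + logistic-regression pipeline stands in for the per-fold BERTweet fine-tuning (the fold logic is identical), and the function name and defaults (`oof_probabilities`, 5 folds, seed 42) are illustrative rather than taken from the notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline

def oof_probabilities(texts, labels, n_splits=5, seed=42):
    """Out-of-fold class-1 probabilities: every row is predicted by a model
    trained on the other folds, so the stacking feature is leakage-free."""
    texts, labels = np.asarray(texts), np.asarray(labels)
    oof = np.zeros(len(texts))
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in skf.split(texts, labels):
        # Stand-in fold model; in the notebook each fold fine-tunes BERTweet instead.
        fold_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        fold_model.fit(texts[train_idx], labels[train_idx])
        oof[valid_idx] = fold_model.predict_proba(texts[valid_idx])[:, 1]
    return oof
```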
- Kaggle private LB: 0.84462 — Rank: 16
- Base model: `vinai/bertweet-base` (fine-tuned for sequence classification)
- Tokenizer: BERTweet tokenizer (normalization enabled)
- Stacking: OOF probabilities from BERTweet appended to the TF–IDF + engineered feature matrix
- Meta model: LightGBM (gradient-boosted trees)
- Ensemble: Weighted average of BERTweet and LightGBM probabilities (see the stacking/blending sketch after this list)
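A hedged sketch of the stacking and blending steps. All input names (`X_tfidf`, `feats`, `oof_bertweet`, `y`, and their test-side counterparts) and the LightGBM hyperparameters are assumptions for illustration; only the 0.7/0.3 default weighting comes from the pipeline summary above.

```python
import numpy as np
import lightgbm as lgb
from scipy.sparse import csr_matrix, hstack

def stack_and_blend(X_tfidf, feats, oof_bertweet, y,
                    X_tfidf_test, feats_test, test_bertweet,
                    w_bertweet=0.7):
    """Assumed inputs: sparse TF-IDF matrices, dense engineered-feature arrays,
    BERTweet probabilities (OOF for train, averaged fold predictions for test)."""
    # Append the BERTweet probabilities as one extra column of the meta feature matrix.
    X_meta = hstack([X_tfidf, csr_matrix(feats),
                     csr_matrix(oof_bertweet.reshape(-1, 1))]).tocsr()
    X_meta_test = hstack([X_tfidf_test, csr_matrix(feats_test),
                          csr_matrix(test_bertweet.reshape(-1, 1))]).tocsr()

    # Meta model: gradient-boosted trees over lexical + stacked features.
    meta = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
    meta.fit(X_meta, y)
    lgbm_prob = meta.predict_proba(X_meta_test)[:, 1]

    # Weighted ensemble: default 0.7 BERTweet / 0.3 LightGBM.
    blended = w_bertweet * test_bertweet + (1.0 - w_bertweet) * lgbm_prob
    return (blended >= 0.5).astype(int), blended
```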
The goal is a reproducible, competitive pipeline that balances contextual modeling with interpretable features:
- Preserve Twitter artifacts (hashtags, mentions, emojis) during cleaning (see the cleaning sketch after this list)
- Use Stratified K-Fold to produce leakage-free stacking features (OOF)
- Combine deep contextual signals (BERTweet) with lexical/statistical features (TF–IDF, counts)
- Optimize ensemble weighting to control the precision/recall trade-off
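As a rough illustration of the first bullet, a hedged cleaning sketch: only HTML unescaping and whitespace collapsing are applied here (the notebook's exact rules may differ), and handle/URL normalization is delegated to the BERTweet tokenizer's `normalization=True` option.

```python
import html
import re
from transformers import AutoTokenizer

def clean_tweet(text: str) -> str:
    """Light-touch cleaning: unescape HTML entities and collapse whitespace,
    deliberately keeping hashtags, @mentions and emojis for BERTweet."""
    text = html.unescape(text)                 # "&amp;" -> "&", etc.
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return text

# The slow BERTweet tokenizer performs Twitter-specific normalization
# (user handles -> @USER, URLs -> HTTPURL) when normalization=True.
tokenizer = AutoTokenizer.from_pretrained(
    "vinai/bertweet-base", normalization=True, use_fast=False
)

enc = tokenizer(clean_tweet("Forest fire near La Ronge Sask. Canada &amp; spreading"),
                truncation=True, max_length=128)
```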
NLP | transformers | BERTweet | LightGBM | stacking | feature-engineering | Kaggle | text-classification | data-science | machine-learning | ensemble-learning | nlp-preprocessing | text-mining | pytorch | huggingface | model-stacking