NOVA IMS MSc Text Mining (2025). Stock market sentiment from tweets (bearish=0, bullish=1, neutral=2). Includes EDA, preprocessing, TF-IDF baselines, and a DistilBERT final model. Final prediction file (pred_19.csv) included.

DiogoGAndrade/TextMining_Stock_Sentiment_Group19

Stock Sentiment from Tweets — TM 2025 (Group 19)

NOVA IMS — MSc in Data Science & Advanced Analytics · Text Mining 2025

Predict market sentiment (bearish=0, bullish=1, neutral=2) from tweets about stocks. We benchmark classic ML baselines (Count/TF-IDF + linear models) and a transformer encoder, selecting DistilBERT as the final model (best Macro-F1).

Task & Data

  • Train: tweets with text and label (0/1/2) · Test: tweets with text only.
  • Deliverables: tm_tests_19.ipynb, tm_final_19.ipynb, pred_19.csv, and a ≤10-page PDF report.
    (See the official handout for details and deadline.)

Approach

  • EDA: label distribution, token length, top words/bigrams after normalization.
  • Preprocessing: regex cleaning, lowercasing, stopword removal, lemmatization/stemming.
  • Features: TF-IDF (unigrams–trigrams), transformer embeddings (DistilBERT).
  • Models: LinearSVC / Logistic Regression baselines; DistilBERT classifier (final).
  • Evaluation: Accuracy, Precision, Recall, Macro-F1 on validation.
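The preprocessing step listed above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the function name `clean_tweet`, the regex patterns, the tiny stopword set, and the choice to keep the `$` of cashtags are all assumptions made for the example, and lemmatization/stemming is left out to keep it dependency-free.

```python
import re
from typing import List

# Illustrative stopword subset (assumption); the project's real list is not given in this README.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on", "and", "for", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user mentions
NON_ALNUM_RE = re.compile(r"[^a-z0-9$\s]")      # punctuation, but keep '$' for cashtags


def clean_tweet(text: str) -> List[str]:
    """Lowercase, strip URLs/mentions/punctuation, and drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = clean_tweet("$AAPL is breaking out to new highs! https://t.co/xyz @trader")
print(tokens)  # ['$aapl', 'breaking', 'out', 'new', 'highs']
```

The cleaned tokens would then be joined back into a string and passed to the TF-IDF vectorizer; the transformer model typically works on lightly cleaned or raw text with its own tokenizer.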

Results (Macro-F1)

  • DistilBERT (Transformer): 0.82 ← final model
  • TF-IDF (1–3-grams) + LinearSVC: 0.73
  • TF-IDF (1–2-grams) + LinearSVC: 0.73
    (See reports/All_Models_MacroF1_Table.docx for the full table.)
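For readers unfamiliar with the metric, Macro-F1 is the unweighted mean of the per-class F1 scores, so the three sentiment classes count equally regardless of how many tweets each has. The sketch below computes it in plain Python on made-up labels; the helper name `macro_f1` and the toy data are illustrative assumptions, and the notebooks may well use a library implementation instead.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 (0=bearish, 1=bullish, 2=neutral)."""
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels, not project data.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # 0.6556
```

Here the per-class F1 values are 0.5, 0.8, and about 0.667, which average to roughly 0.656.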

How to reproduce

Open notebooks/tm_final_19.ipynb and run all cells to produce submission/pred_19.csv.

Repository layout

notebooks/ — notebooks (tm_tests_19.ipynb, tm_final_19.ipynb)
reports/ — Text Mining Report.pdf, All_Models_MacroF1_Table.docx, Project Handout TM 2025 v2.pdf
submission/ — pred_19.csv, optional metrics

Authors — Group 19

André Oliveira · Diogo Andrade · Francisco Pontes
