LLM-Based OOD Detection in Sentiment Classification
git clone https://github.com/BOAZ-mini-2/OOD-Sentiment-LLM.git
cd OOD-Sentiment-LLM
file_names = [
    "src/dataset/All_Beauty.jsonl_25k.jsonl",
    "src/dataset/Baby_Products.jsonl_25k.jsonl",
    "src/dataset/Grocery_and_Gourmet_Food.jsonl_25k.jsonl",
    "src/dataset/Industrial_and_Scientific.jsonl_25k.jsonl",
]
OOD-Sentiment-LLM/
├─ ood_scoring/
│ ├─ __init__.py
│ ├─ scoring.py
│ └─ examples/
│    ├─ demo_fake_data.py # test scoring
│    └─ demo_2.py # test scoring + eval metrics
→ For every score, a higher value means the sample is more likely to be OOD.
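To illustrate this score orientation, here is a minimal sketch of two common OOD scores written so that larger values point toward OOD. It is not the code in `scoring.py`: the helper names (`msp_ood_score`, `fit_gaussian`, `mahalanobis_ood_score`), the `1 - max probability` form of the MSP score, and the small ridge term `eps` are all assumptions made for this example.

```python
import numpy as np


def msp_ood_score(probs):
    """Hypothetical helper: 1 - max softmax probability (higher = more OOD)."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - probs.max(axis=1)


def fit_gaussian(train_feats, eps=1e-6):
    """Hypothetical helper: mean and inverse covariance of in-distribution features."""
    feats = np.asarray(train_feats, dtype=float)
    mu = feats.mean(axis=0)
    cov = np.atleast_2d(np.cov(feats, rowvar=False))
    # Assumed ridge term: keeps the covariance matrix invertible.
    cov = cov + eps * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)


def mahalanobis_ood_score(feats, mu, inv_cov):
    """Hypothetical helper: Mahalanobis distance to the training mean (higher = more OOD)."""
    diff = np.asarray(feats, dtype=float) - mu
    sq_dist = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # Clamp tiny negative values caused by floating-point error.
    return np.sqrt(np.maximum(sq_dist, 0.0))


if __name__ == "__main__":
    # A confident prediction and an uncertain one.
    probs = [[0.9, 0.1], [0.55, 0.45]]
    print("MSP scores:", msp_ood_score(probs))

    rng = np.random.default_rng(0)
    train_feats = rng.normal(size=(200, 3))
    mu, inv_cov = fit_gaussian(train_feats)
    test_feats = np.stack([mu, mu + 10.0])  # one point at the mean, one far away
    print("Mahalanobis scores:", mahalanobis_ood_score(test_feats, mu, inv_cov))
```

In both cases the sample that looks less like the training data (the uncertain prediction, the far-away feature vector) receives the larger score.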
# test code
# python -m ood_scoring.examples.demo_fake_data
# python -m ood_scoring.examples.demo_2
from ood_scoring import (
    score_msp,
    score_energy_from_probs,
    fit_md,
    score_md,
)
msp = score_msp(probs)
energy = score_energy_from_probs(probs)
mu, inv_cov = fit_md(train_feats)
md_scores = score_md(test_feats, mu, inv_cov)
OOD-Sentiment-LLM/
├─ check_perform/
│ └─ DMResult.py
- AUROC (Area Under ROC Curve) → Higher is better.
- AUPR (Area Under Precision-Recall Curve) → Higher is better.
- FPR95 (False Positive Rate @ 95% TPR) → Lower is better.
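The three metrics above can be sketched in plain Python as follows. This is an illustrative sketch, not the code in `DMResult.py`. It assumes OOD is the positive class (label `1`) and in-distribution is the negative class (label `0`), which matches the "higher score means OOD" convention, and it assumes both classes are present. The function names are hypothetical.

```python
import math


def auroc(scores, labels):
    """Chance that a random OOD sample outscores a random ID sample (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aupr(scores, labels):
    """Average precision: mean of the precision at the rank of each OOD sample.

    Tied scores are ranked in input order in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def fpr_at_95_tpr(scores, labels):
    """Fraction of ID samples flagged as OOD once at least 95% of OOD samples are caught."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    k = math.ceil(0.95 * len(pos))  # number of OOD samples that must be detected
    threshold = pos[k - 1]          # lowest score among those k samples
    return sum(n >= threshold for n in neg) / len(neg)


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.35, 0.8]  # made-up OOD scores
    labels = [0, 0, 1, 1]           # 1 = OOD, 0 = in-distribution
    print("AUROC:", auroc(scores, labels))
    print("AUPR :", aupr(scores, labels))
    print("FPR95:", fpr_at_95_tpr(scores, labels))
```

Some OOD papers treat in-distribution as the positive class instead, which changes AUPR and FPR95, so it is worth checking which convention a reported number uses.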