
# OOD-Sentiment-LLM

LLM-Based OOD Detection in Sentiment Classification

```bash
git clone https://github.com/BOAZ-mini-2/OOD-Sentiment-LLM.git
cd OOD-Sentiment-LLM
```

## Our workflow draft

*(workflow diagram image)*

## Load dataset

```python
file_names = [
    "src/dataset/All_Beauty.jsonl_25k.jsonl",
    "src/dataset/Baby_Products.jsonl_25k.jsonl",
    "src/dataset/Grocery_and_Gourmet_Food.jsonl_25k.jsonl",
    "src/dataset/Industrial_and_Scientific.jsonl_25k.jsonl",
]
```
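Each file is in JSON Lines format (one JSON object per line). A minimal loading sketch, assuming each line is a review object (the field names, such as `rating` and `text`, are assumptions about the Amazon review schema, not confirmed by this excerpt):

```python
import json

def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Example (paths from the list above):
# reviews = load_jsonl("src/dataset/All_Beauty.jsonl_25k.jsonl")
# texts = [r["text"] for r in reviews]   # `text` field assumed
```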

## Scoring method (MSP / E / MD)

```text
OOD-Sentiment-LLM/
├─ ood_scoring/
│  ├─ __init__.py
│  ├─ scoring.py
│  └─ examples/
│     ├─ demo_fake_data.py   # test scoring
│     └─ demo_2.py           # test scoring + eval metric
```

$score_{MSP}(x) = 1 - \max_{y \in \mathcal{Y}} p(y \mid x)$

$score_{E}(x) = -T \log\left[\sum_{y \in \mathcal{Y}}\exp\left(\frac{f_y(x)}{T}\right)\right]$, where $f_y(x)$ is the model's logit for class $y$ and $T$ is the temperature (`score_energy_from_probs` derives these from the predicted probabilities).

$score_{MD}(x, D_{tr}) = \sqrt{(x - \mu)^\top\Sigma^{-1}(x - \mu)}$, where $\mu$ and $\Sigma$ are the mean and covariance of the training features $D_{tr}$.

→ Higher scores mean more likely OOD.

```python
# Run the bundled demos:
#   python -m ood_scoring.examples.demo_fake_data   # test scoring
#   python -m ood_scoring.examples.demo_2           # test scoring + eval metric

from ood_scoring import (
    score_msp,
    score_energy_from_probs,
    fit_md,
    score_md,
)

# probs: per-sample class probabilities; train_feats / test_feats: feature matrices
msp = score_msp(probs)                          # MSP scores
energy = score_energy_from_probs(probs)         # energy scores
mu, inv_cov = fit_md(train_feats)               # fit Mahalanobis params on ID features
md_scores = score_md(test_feats, mu, inv_cov)   # Mahalanobis distances
```
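The bodies of these functions are not shown in this excerpt. A minimal NumPy sketch of what the three scorers compute, matching the formulas above (function names and signatures here are illustrative assumptions, not the module's actual implementation; energy is written over logits):

```python
import numpy as np

def score_msp(probs):
    # 1 - max class probability: low confidence -> high OOD score
    return 1.0 - probs.max(axis=1)

def score_energy(logits, T=1.0):
    # -T * log sum exp(logit / T), computed stably; higher -> more OOD
    z = logits / T
    m = z.max(axis=1, keepdims=True)
    return -T * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

def fit_md(train_feats):
    # Mean and (regularized) inverse covariance of in-distribution features
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return mu, inv_cov

def score_md(feats, mu, inv_cov):
    # Mahalanobis distance of each feature vector to the training mean
    d = feats - mu
    return np.sqrt(np.einsum("nd,de,ne->n", d, inv_cov, d))
```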

## Evaluation metrics (AUROC / AUPR / FPR95)

```text
OOD-Sentiment-LLM/
├─ check_perform/
│  └─ DMResult.py
```

- AUROC (Area Under ROC Curve) → Higher is better.
- AUPR (Area Under Precision-Recall Curve) → Higher is better.
- FPR95 (False Positive Rate @ 95% TPR) → Lower is better.
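A minimal sketch of computing these three metrics with scikit-learn, assuming `labels` marks OOD samples with 1 and `scores` are OOD scores where higher means more likely OOD (`DMResult.py` is not shown here, so this is illustrative, not the repo's implementation):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, roc_curve

def ood_metrics(labels, scores):
    """labels: 1 = OOD, 0 = in-distribution; scores: higher = more OOD."""
    auroc = roc_auc_score(labels, scores)
    aupr = average_precision_score(labels, scores)
    # FPR95: false-positive rate at the first threshold where TPR reaches 95%
    fpr, tpr, _ = roc_curve(labels, scores)
    fpr95 = fpr[np.searchsorted(tpr, 0.95)]
    return auroc, aupr, fpr95
```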
