- Clone the repository
- Install the following dependencies:
  - NumPy
  - PyTorch (torch)
  - scikit-learn
  - transformers
  - evaluate
  - nnsight
  - PySpark
  - pandas
  - Matplotlib
  - seaborn
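Assuming the standard PyPI package names (the original list does not state them explicitly), the dependencies can be installed in one step with pip:

```shell
pip install numpy torch scikit-learn transformers evaluate nnsight pyspark pandas matplotlib seaborn
```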
Twitter Emotion Dataset: https://www.kaggle.com/datasets/adhamelkomy/twitter-emotion-dataset/data
The models are available here: https://drive.google.com/drive/folders/1D7eiUCLUJFTEy0FTlyFbQirYR2dbRhTw?usp=drive_link

There are two folders:
- model/ - used for all predictions, run tests, etc.
- model_with_tokenizer/ - used primarily in token_analysis

Ensure the two model folders are inside code/.
- Run head_masking_samples.py to obtain the samples (the results are stored in data/samples if you do not want to install PySpark and run the file).
- Run head_masking.py to process the sentences obtained from step 1 on each variant. The results are stored as .npy files in the execution directory under the names head_masking_class_prob_diff.npy and multi_head_masking_class_prob_diff.npy.
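Once head_masking.py has finished, the saved arrays can be inspected with NumPy. A minimal sketch (the dummy array and its shape below are illustrative, not taken from the repo; the real files are produced by head_masking.py):

```python
import numpy as np

# Illustrative: write a dummy array so the load step below is self-contained.
# In practice head_masking.py creates this file in the execution directory.
dummy = np.zeros((12, 12, 6))  # hypothetical shape, e.g. layers x heads x classes
np.save("head_masking_class_prob_diff.npy", dummy)

# Load the per-head class-probability differences for inspection.
diffs = np.load("head_masking_class_prob_diff.npy")
print(diffs.shape)
```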
- Connect to a Python/Jupyter kernel that has the libraries mentioned in the requirements.
- Run the first five cells of code/token_analysis.ipynb. The remainder of the notebook compares the heatmaps of two sentences; feel free to experiment with custom sentences.
- Run common_words.py to obtain the 10 most common words per class and replace them with synonyms (the results are stored in data/replaced_words_text.csv if you do not want to install PySpark and run the file).
- Run the cells of code/synonym_analysis.ipynb to get the probabilities of the original and synonym-replaced sentences.
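The common-words counting step can be sketched without Spark using only the standard library. The toy sentences and class labels below are illustrative stand-ins for the tweet dataset, and top-2 is used instead of the script's top-10 for brevity:

```python
from collections import Counter

# Toy data: (sentence, emotion class) pairs standing in for the tweet dataset.
samples = [
    ("i feel so happy today", "joy"),
    ("happy happy joy", "joy"),
    ("i am so angry right now", "anger"),
    ("angry and frustrated", "anger"),
]

# Count word frequencies separately for each class.
counts = {}
for text, label in samples:
    counts.setdefault(label, Counter()).update(text.split())

# Take the most common words per class (the real script takes the top 10).
top_per_class = {label: [w for w, _ in c.most_common(2)] for label, c in counts.items()}
print(top_per_class)
```

These top words would then be swapped for synonyms before re-scoring the sentences, as synonym_analysis.ipynb does.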
Note: PySpark is required only for sample collection.