PerAugy: Diversity Augmentation of Dynamic User Preference Data for Boosting Personalized Text Summarizers

This repository provides a step-by-step pipeline for preparing and processing the PENS dataset for sequential recommendation tasks using synthetic user interaction graphs (UIGs).

📦 Dataset

PENS Dataset: Includes train, validation, test, and news sets.

⚙️ Processing Steps

1. Download the PENS Dataset

Start by downloading the PENS dataset, which contains the following files:

train.json
val.json
test.json
news.json

Ensure these files are placed in the appropriate directory for further processing.

2. Arrange Dataset into Seed UIG Format

Run the script:

python Scripts/PENS_augmentation.py

This script:

Sorts user interactions by timestamp.
Appends summary nodes from the test set.
Generates a Seed UIG (user interaction graph) structure.

The following file will be created:

synthetic-original.csv – the initial synthetic UIG dataset.

Additionally, a corresponding summary dataset will be generated with metadata about the summary nodes for each synthetic user.

3. Create Double Shuffled Trajectories

Open and run:

DS/DoubleShuffling.ipynb

This notebook generates Double Shuffled (DS) user trajectories. You can experiment with:

offset
gap
segment length
other relevant hyperparameters

4. Apply History-Influenced Perturbation

Run the following notebook to refine the DS dataset:

perturbation/perturbation_D2.ipynb

This step smooths the shuffled trajectories by applying history-aware perturbations to improve recommendation dynamics.

5. Convert to PENS Sequential Format

Run:

KG2PENS/KG2PENS_Trainer_New_Convertor.ipynb

This notebook:

Replaces <d-s> pairs with corresponding s-nodes only.
Converts the processed dataset into PENS-compatible knowledge graph format.

The final output format will be:

UserID, ClickedNewsID, PositiveNewsID, NegativeNewsID

📁 Output

After completing all steps, you’ll obtain:

A PENS-style dataset ready for sequential recommendation experiments.
Summary statistics and user graph structures aligned with experimental design.

📌 Notes

Make sure all dependencies are installed before running the notebooks.
Intermediate datasets are saved automatically after each step.
You can tweak parameters in the Jupyter notebooks for different experimental settings.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
DS		DS
KG2PENS		KG2PENS
expm		expm
openAI-reddit		openAI-reddit
openai_reddit_cnndm_scripts		openai_reddit_cnndm_scripts
perturbation		perturbation
scripts		scripts
PC		PC
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PerAugy: Diversity Augmentation of Dynamic User Preference Data for Boosting Personalized Text Summarizers

📦 Dataset

⚙️ Processing Steps

1. Download the PENS Dataset

2. Arrange Dataset into Seed UIG Format

3. Create Double Shuffled Trajectories

4. Apply History-Influenced Perturbation

5. Convert to PENS Sequential Format

📁 Output

📌 Notes

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

KDM-LAB/PerAugy-2025-public

Folders and files

Latest commit

History

Repository files navigation

PerAugy: Diversity Augmentation of Dynamic User Preference Data for Boosting Personalized Text Summarizers

📦 Dataset

⚙️ Processing Steps

1. Download the PENS Dataset

2. Arrange Dataset into Seed UIG Format

3. Create Double Shuffled Trajectories

4. Apply History-Influenced Perturbation

5. Convert to PENS Sequential Format

📁 Output

📌 Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages