This guide covers all scripts and pipelines in the project.
First time setup:
# 1. Create and activate environment
conda create -n overthink python=3.10
conda activate overthink
# 2. Install dependencies
cd final_project
pip install -r requirements.txt
# 3. Install spaCy model
python -m spacy download en_core_web_sm
# 4. (Optional) Install MLX for Apple Silicon baseline
pip install mlx mlx-lmTrain a single algorithm:
cd final_project
python pipeline/train.py --algorithm grpo --dataset 690-Project-Dataset-final.csv --num_iters 300 --batch_size 64 --output_dir modelsTest/Evaluate a trained model:
cd final_project
python pipeline/test.py --algorithm grpo --model models/grpo_model.pt --directive accurately --get-regexTrain and compare all algorithms:
cd final_project
python scripts/compare_all.py --algorithms grpo groupedppo vanillarl --dataset 690-Project-Dataset-final.csv --num_iters 300 --batch_size 64 --output_dir results-
Python Environment: Python 3.7+ (Python 3.10 recommended)
- Using conda (recommended):
conda create -n overthink python=3.10 conda activate overthink
- Or using venv:
python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
- Using conda (recommended):
-
Install Core Dependencies:
cd final_project pip install -r requirements.txt -
Install spaCy Language Model (required for PII extraction):
python -m spacy download en_core_web_sm
-
Optional: MLX for Apple Silicon (for baseline minimizer comparison):
pip install mlx mlx-lm
- This enables the MLX baseline minimizer which works on Apple Silicon without a GPU
- If you skip this, the comparison script will run with RL pipeline only
-
Optional: GPU Support (for GPU-based baseline minimizer):
pip install bitsandbytes
- Requires CUDA-capable GPU
- Only needed if you want to use GPU baseline instead of MLX
- Dataset:
690-Project-Dataset-final.csv(finalized dataset) in thefinal_project/directory - Model Path:
models/{algorithm}_model.pt(created after training) - Training Settings:
--num_iters 300 --batch_size 64(recommended)
With Convergence Detection (Recommended):
cd final_project
python pipeline/train.py \
--algorithm grpo \
--dataset 690-Project-Dataset-final.csv \
--convergence_threshold 0.001 \
--patience 20 \
--max_iters 1000 \
--batch_size 64 \
--learning_rate 3e-4 \
--output_dir modelsFixed Iterations (Recommended for tradeoff dataset):
python pipeline/train.py \
--algorithm grpo \
--dataset 690-Project-Dataset-final.csv \
--num_iters 300 \
--batch_size 64 \
--output_dir modelsOutputs:
models/grpo_model.pt: Trained model weightsmodels/grpo_history.json: Training history (iterations, rewards)
python pipeline/train.py \
--algorithm groupedppo \
--dataset 690-Project-Dataset-final.csv \
--num_iters 300 \
--batch_size 64 \
--output_dir modelspython pipeline/train.py \
--algorithm vanillarl \
--dataset 690-Project-Dataset-final.csv \
--num_iters 300 \
--batch_size 64 \
--output_dir models| Parameter | Description | Default |
|---|---|---|
--algorithm |
Algorithm name (grpo/groupedppo/vanillarl) | Required |
--dataset |
Path to dataset CSV | 690-Project-Dataset-final.csv |
--max_iters |
Maximum training iterations | 1000 |
--batch_size |
Batch size for rollouts | 64 |
--learning_rate |
Learning rate | 3e-4 |
--convergence_threshold |
Min improvement to consider progress | 0.001 |
--patience |
Iterations without improvement before stopping | 20 |
--log_every |
Log every N iterations | 10 |
--output_dir |
Output directory for models | models |
Basic Evaluation (with balanced directive):
python pipeline/test.py \
--algorithm grpo \
--model models/grpo_model.pt \
--directive balanced \
--dataset 690-Project-Dataset-final.csv \
--output evaluation_results.json Evaluation with specific directive:
# Strictly directive (high threshold, less utility)
python pipeline/test.py \
--algorithm grpo \
--model models/grpo_model.pt \
--directive strictly \
--dataset 690-Project-Dataset-final.csv
# Accurately directive (low threshold, more utility)
python pipeline/test.py \
--algorithm grpo \
--model models/grpo_model.pt \
--directive accurately \
--dataset 690-Project-Dataset-final.csvOutput: Shows:
- Average reward (if dataset provided)
- Restaurant domain metrics (utility, privacy, breach rate) for the specified directive
- Bank domain metrics (utility, privacy, breach rate) for the specified directive
- Summary table showing utility-privacy tradeoff for ALL directives (strictly, balanced, accurately)
Get regex for a specific directive:
python pipeline/test.py \
--algorithm grpo \
--model models/grpo_model.pt \
--directive accurately \
--get-regexDirectives (with tradeoff dataset):
strictly: High threshold (≥0.7), more privacy, less utility- Bank: Only EMAIL, PHONE → Utility = 0.4
balanced: Default threshold (0.5), balanced tradeoff- Bank: EMAIL, PHONE, DATE/DOB → Utility = 0.6
accurately: Low threshold (≤0.3), more utility, less privacy- Bank: All 5 PII types (EMAIL, PHONE, DATE/DOB, SSN, CREDIT_CARD) → Utility = 1.0
| Parameter | Description | Default |
|---|---|---|
--algorithm |
Algorithm name | Required |
--model |
Path to trained model | Required |
--dataset |
Path to dataset CSV (optional, only for reward evaluation) | 690-Project-Dataset-final.csv |
--directive |
Directive for regex (strictly/balanced/accurately) | balanced |
--get-regex |
Flag to extract regex patterns only | False |
--output |
Output file for results | evaluation_results.json |
Check PII type frequencies in dataset:
python scripts/analyze_dataset_probabilities.py \
--dataset 690-Project-Dataset-final.csvOutput: Table showing:
- Probability of each PII type in restaurant domain
- Probability of each PII type in bank domain
- Differences between domains
Compare utility-privacy tradeoff across directives:
python scripts/analyze_with_directives.py \
--algorithm grpo \
--model models/grpo_model.pt \
--dataset 690-Project-Dataset-final.csvOutput:
- Utility, privacy, and breach rate for each directive
- Comparison table
Get regex pattern for specific directive:
python scripts/get_regex_by_directive.py \
--algorithm grpo \
--model models/grpo_model.pt \
--directive accurately \
--domain bankParameters:
--algorithm: Algorithm name--model: Path to trained model--directive: strictly/balanced/accurately--domain: restaurant/bank
Train and compare all algorithms (recommended):
cd final_project
python scripts/compare_all.py \
--algorithms grpo groupedppo vanillarl \
--dataset 690-Project-Dataset-final.csv \
--num_iters 300 \
--batch_size 64 \
--output_dir resultsCompare specific algorithms:
# Compare only GRPO and GroupedPPO
python scripts/compare_all.py \
--algorithms grpo groupedppo \
--dataset 690-Project-Dataset-final.csv \
--num_iters 300 \
--batch_size 64 \
--output_dir resultsNote: This will:
- Train all specified algorithms on the tradeoff dataset
- Generate utility-privacy tradeoff graphs showing how each directive affects utility/privacy for each algorithm
- Create comparison table with metrics for all directives (strictly, balanced, accurately)
Parameters:
--algorithms: Space-separated list of algorithms (grpo, groupedppo, vanillarl)--dataset: Path to dataset CSV file--num_iters: Training iterations per algorithm (default: 200)--batch_size: Batch size for training (default: 64)--log_every: Log every N iterations (default: 10)--output_dir: Output directory for results (default: comparison_results)
Outputs:
results/training_curves.png: Learning curves for all algorithmsresults/utility_privacy.png: Utility-privacy tradeoff by directive for each algorithm (shows 3 points per algorithm: strictly, balanced, accurately)results/bank_directive_tradeoff.png: Bank domain utility vs privacy across directivesresults/performance.png: Performance comparison bar chartsresults/comparison_table.csv: Numerical comparison table with metrics for all directives:Restaurant Utility (strictly),Restaurant Privacy (strictly)Restaurant Utility (balanced),Restaurant Privacy (balanced)Restaurant Utility (accurately),Restaurant Privacy (accurately)- Same columns for Bank domain
results/results.json: Complete results in JSON format
models/
├── grpo_model.pt # Trained GRPO model
├── grpo_history.json # Training history
├── groupedppo_model.pt # Trained GroupedPPO model
├── groupedppo_history.json # Training history
├── vanillarl_model.pt # Trained VanillaRL model
└── vanillarl_history.json # Training history
evaluation_results.json # Full evaluation metrics
results/
├── training_curves.png # Learning curves for all algorithms
├── utility_privacy.png # Utility-privacy tradeoff
├── performance.png # Performance bar charts
└── comparison_table.csv # Numerical comparison
The integration pipeline takes a third-party prompt and user data, automatically detects the domain, gets allowed PII types from GRPO, and returns minimized data with only allowed PII.
Restaurant Domain Example:
cd final_project
python pipeline/integration_pipeline.py \
--prompt "I need to book a table" \
--data "Hi, my name is John Smith. My email is john@example.com and you can reach me at 555-1234. My SSN is 123-45-6789."Bank Domain Example:
python pipeline/integration_pipeline.py \
--prompt "I need to check my account balance" \
--data "Name: John, Email: john@example.com, Phone: 555-1234, SSN: 123-45-6789, Credit Card: 4111-1111-1111-1111" \
--algorithm groupedppo# Strictly directive (higher privacy)
python pipeline/integration_pipeline.py \
--prompt "Book a table" \
--data "Email: user@example.com, Phone: 555-1234, SSN: 123-45-6789" \
--directive strictly
# Accurately directive (higher utility)
python pipeline/integration_pipeline.py \
--prompt "Check account balance" \
--data "Email: user@example.com, SSN: 123-45-6789" \
--directive accuratelypython pipeline/integration_pipeline.py \
--prompt "Reserve a table" \
--data "Name: Jane, Email: jane@example.com, Phone: 555-5678" \
--json --full-details| Parameter | Description | Default |
|---|---|---|
--prompt |
Third-party prompt/query | Required |
--data |
User data containing PII (raw text) | Required |
--algorithm |
RL algorithm (grpo/groupedppo/vanillarl) | grpo |
--directive |
Privacy directive (strictly/balanced/accurately) | balanced |
--classifier-model |
Path to context classifier model | MLP/context_agent_mlp.pth |
--grpo-model |
Path to GRPO model | models/{algorithm}_model.pt |
--json |
Output as JSON | False |
--full-details |
Return detailed breakdown | False |
The pipeline returns:
- Domain: Detected domain (restaurant/bank) with confidence
- Allowed PII Types: List of PII types allowed for this domain
- Minimized Data (Redacted Text): Original text with disallowed PII replaced with
[REDACTED] - Minimized Data (Structured): Structured format with PII types and values
Example Output:
============================================================
INTEGRATION PIPELINE RESULTS
============================================================
Third-party Prompt: I need to book a table
Detected Domain: RESTAURANT (confidence: 99.17%)
Allowed PII Types: PHONE, EMAIL
Minimized Data (Redacted Text):
Hi, my name is [REDACTED]. My email is john@example.com and you can reach me at 555-1234. My SSN is [REDACTED].
Minimized Data (Structured):
PHONE: 555-1234, EMAIL: john@example.com
============================================================
from pipeline.integration_pipeline import minimize_data
result = minimize_data(
third_party_prompt="I need to book a table",
user_data="Hi, my name is John Smith. My email is john@example.com and you can reach me at 555-1234. My SSN is 123-45-6789.",
algorithm="grpo",
directive="balanced"
)
print(result['minimized_data']) # Redacted text
print(result['minimized_data_structured']) # Structured formatCompare the original AirGap LLM-based minimizer with the RL-based integration pipeline on utility, privacy, and inference speed.
Restaurant Domain:
cd final_project
python pipeline/compare_baseline_vs_rl.py \
--num-samples 5 \
--domain restaurantBank Domain with GroupedPPO:
python pipeline/compare_baseline_vs_rl.py \
--num-samples 10 \
--domain bank \
--rl-algorithm groupedppo \
--output bank_comparison.csvCreate a JSON file test_cases.json:
[
{
"prompt": "I need to book a table",
"user_data": "Hi, my name is John Smith. My email is john@example.com and you can reach me at 555-1234. My SSN is 123-45-6789.",
"allowed_pii": ["PHONE", "EMAIL"],
"domain": "restaurant"
},
{
"prompt": "Check my account balance",
"user_data": "Name: Jane Doe, Email: jane@example.com, Phone: 555-5678, SSN: 987-65-4321",
"allowed_pii": ["PHONE", "EMAIL", "SSN"],
"domain": "bank"
}
]Then run:
python pipeline/compare_baseline_vs_rl.py --test-cases test_cases.json| Parameter | Description | Default |
|---|---|---|
--test-cases |
JSON file with test cases | Auto-generate |
--num-samples |
Number of test samples to generate | 5 |
--domain |
Domain (restaurant/bank) | restaurant |
--baseline-model |
Baseline model name | Qwen/Qwen2.5-7B-Instruct |
--rl-algorithm |
RL algorithm (grpo/groupedppo/vanillarl) | grpo |
--rl-directive |
Privacy directive (strictly/balanced/accurately) | balanced |
--output |
Output CSV file | comparison_results.csv |
The script compares:
- Utility: % of allowed PII correctly shared (higher is better)
- Privacy: % of disallowed PII correctly NOT shared (higher is better)
- Quickness: Inference time in seconds (lower is better)
Console Output: Real-time comparison for each test case and overall summary:
================================================================================
OVERALL SUMMARY
================================================================================
Baseline LLM Minimizer:
Average Utility: 95.0%
Average Privacy: 98.0%
Average Time: 2.123s
Total Time: 10.615s
RL Integration Pipeline (grpo):
Average Utility: 100.0%
Average Privacy: 100.0%
Average Time: 0.015s
Total Time: 0.075s
Comparison:
Utility Difference: +5.0% (better)
Privacy Difference: +2.0% (better)
Average Speedup: 141.5x faster
Total Time Saved: 10.540s
CSV Output: Detailed results with all metrics for each test case:
baseline_utility,baseline_privacy,baseline_timerl_utility,rl_privacy,rl_timeutility_diff,privacy_diff,time_speedup
Baseline LLM Minimizer:
- Apple Silicon / CPU (recommended): install MLX baseline
conda activate overthink pip install mlx mlx-lm
- GPU fallback: CUDA-capable GPU +
bitsandbytes - Large language model (default MLX model:
mlx-community/Qwen2.5-7B-Instruct-4bit) transformerslibrary
RL Integration Pipeline:
- Trained RL model (
models/{algorithm}_model.pt) - Context classifier (
MLP/context_agent_mlp.pth) spacyanden_core_web_smmodel
Note:
- The comparison script automatically tries the MLX baseline first (works on Apple Silicon without a discrete GPU).
- If
mlxis missing you will see “Baseline skipped – not available”; install it with the command above. - Use
--use-gputo force the GPU baseline if you have CUDA andbitsandbytesinstalled.
# Command line - plain output
python scripts/endpoint.py --algorithm grpo --directive balanced --domain bank
# Command line - JSON output
python scripts/endpoint.py --algorithm grpo --directive strictly --domain bank --json
# As a Python function
from scripts.endpoint import get_regex
result = get_regex("grpo", "balanced", "bank")
# Returns: ['PHONE', 'EMAIL', 'DATE/DOB', 'SSN', 'CREDIT_CARD']Features:
- Takes algorithm, directive, and domain as input
- Returns regex as a list of PII types
- Supports JSON output with --json flag
- Can be imported and used as a function
For more details, see:
README.md: Project overviewALGORITHM_EXPLANATION.md: Algorithm detailspipeline/COMPARISON_README.md: Detailed comparison guide