ContextLeak is an open-source implementation of the framework presented in the paper “ContextLeak: Auditing Leakage in Private In-Context Learning Methods.” The project provides researchers with an end-to-end toolkit to measure, visualise and mitigate worst-case privacy leakage in In-Context Learning (ICL) scenarios.
The project implements a framework for:
- Generating and inserting canary statements into training data
- Running LLM inference with various configurations
- Evaluating model responses with privacy metrics
- Supporting multiple datasets and model architectures
Supported datasets:
- SAMSum: Dialogue summarization dataset
- DocVQA: Document Visual Question Answering dataset
- Subjectivity: Text classification for objective vs. subjective sentences
- Sarcasm: Sarcasm detection in news headlines
Supported models:
- LLaMA 70B
- Qwen 72B
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Prepare your data:
  - Place your datasets in the appropriate directories under `data/`:
    - For DocVQA: place data in `data/processed/docvqa_sampled`
    - For SAMSum: place data in `data/processed/samsum_sampled_{num_train}_train`
    - For Subjectivity: place data in `data/classification/subj_sampled_{num_train}_train`
    - For Sarcasm: place data in `data/classification/sarcasm_sampled_{num_train}_train`
- Prepare canary statements:
  - Place canary files in `data/canaries/`
  - Format: pickle files containing canary statements (see the sketch below)
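For illustration, a canary file could be produced as below. The exact object layout `generate.py` expects (here assumed to be a plain list of strings) and the file name `my_canaries.pkl` are assumptions; compare against the provided canary files such as `incorrect_statements_docvqa.pkl`.

```python
import pickle

# Hypothetical canary file: a plain list of canary statements.
# The structure generate.py expects is an assumption; inspect the
# provided .pkl files in data/canaries/ to confirm it matches.
canaries = [
    "The patient's insurance ID is QX-4417-88.",
    "Invoice 99231 was approved by a fictitious manager named Orla Vance.",
]

with open("data/canaries/my_canaries.pkl", "wb") as f:
    pickle.dump(canaries, f)
```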
Run the generation script:

```bash
python generate.py \
    --dataset_name [samsum|docvqa|subj|sarcasm] \
    --exampler <number_of_shots> \
    --ensemble <number_of_ensembles> \
    --llm [llama70b|qwen72b] \
    --canary_file <path_to_canary_file> \
    --output_dir <output_directory> \
    --num_train <number_of_training_samples> \
    --test_num <number_of_test_samples>
```

Add the `--audit` flag to enable auditing mode:
```bash
python generate.py \
    --dataset_name docvqa \
    --exampler 2 \
    --ensemble 10 \
    --llm qwen72b \
    --canary_file ./data/canaries/incorrect_statements_docvqa.pkl \
    --output_dir ./data/audit \
    --audit \
    --num_train 20 \
    --test_num 400
```

Add the `--zero_shot` flag for zero-shot evaluation:
```bash
python generate.py \
    --dataset_name docvqa \
    --exampler 0 \
    --ensemble 10 \
    --llm qwen72b \
    --canary_file ./data/canaries/incorrect_statements_docvqa.pkl \
    --output_dir ./data/audit \
    --audit \
    --zero_shot \
    --num_train 20 \
    --test_num 400
```

The script generates two types of output files:
- Model predictions
  - Location: `<output_dir>/<llm>_<shots>shot_<ensemble>ensemble_<canary>canary.jsonl`
  - Format: JSONL file containing model predictions and prompts
- Canary selection (audit mode only)
  - Location: `<output_dir>/canary_selection_<llm>_<shots>shot_<ensemble>ensemble_1canary.pkl`
  - Format: pickle file containing canary selection information
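As a sketch of downstream inspection, the snippet below loads a prediction file and the matching canary-selection pickle. The file names follow the patterns above for the 2-shot audit example, but the JSONL field name `prediction` and the verbatim-match check are assumptions, not the repository's evaluation code:

```python
import json
import pickle

# File names follow the output patterns above; adjust to match your run.
pred_path = "data/audit/qwen72b_2shot_10ensemble_1canary.jsonl"
with open(pred_path) as f:
    records = [json.loads(line) for line in f]
print(f"loaded {len(records)} predictions; fields: {sorted(records[0])}")

# Hypothetical quick check: how often a canary string surfaces verbatim
# in the model output (the field name 'prediction' is an assumption).
canary = "The patient's insurance ID is QX-4417-88."
hits = sum(canary in r.get("prediction", "") for r in records)
print(f"canary appeared verbatim in {hits}/{len(records)} predictions")

with open("data/audit/canary_selection_qwen72b_2shot_10ensemble_1canary.pkl", "rb") as f:
    canary_selection = pickle.load(f)
print(type(canary_selection))
```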
Project structure:

```
.
├── data/
│   ├── canaries/        # Canary statement files
│   ├── processed/       # Processed datasets
│   ├── audit/           # Audit outputs (auditing and zero-shot mode)
│   └── classification/  # Classification datasets
├── output/              # Output directory
│   ├── results/         # Evaluation results
│   └── private_output/  # Private predictions
├── generate.py          # Main generation script
└── utils/               # Utility functions
```
Key modules:
- `audit/`: scripts used for running audits
- `calc_utility/`: scripts for computing utility metrics (e.g., accuracy, ROUGE) across datasets and experiments
- `dpicl/`: implementation of Differentially Private In-Context Learning (DP-ICL) mechanisms, such as ESA and RNM (see the sketch below)
- `experiments/`: scripts for experiments, such as worst-case and average-case auditing under system-prompt defenses
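For intuition about the RNM (Report Noisy Max) side of DP-ICL, here is a minimal, self-contained sketch of noisy-argmax aggregation over ensemble votes. It is an illustrative assumption about the mechanism family, not the code in `dpicl/` (which also implements ESA, an embedding-space aggregation; see that directory for the calibrated mechanisms):

```python
import numpy as np

def report_noisy_max(vote_counts, epsilon, rng=None):
    """Pick the label with the highest noisy vote count.

    Adds independent Laplace noise with scale 2/epsilon to each count
    (a common calibration when one ensemble member's vote can move
    between two bins) and reports the argmax, in the style of the
    classic Report Noisy Max mechanism.
    """
    rng = rng or np.random.default_rng()
    noise = rng.laplace(scale=2.0 / epsilon, size=len(vote_counts))
    return int(np.argmax(np.asarray(vote_counts, dtype=float) + noise))

# Example: 10 ensemble members vote "subjective" (7) vs. "objective" (3).
label = report_noisy_max([7, 3], epsilon=1.0)
print(["subjective", "objective"][label])
```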