Skip to content

Official implementation of "ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks"

License

Notifications You must be signed in to change notification settings

boschresearch/proxyprompt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks

🔥 Official implementation of "ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks"

arXiv

overview


Table of Contents


Overview

This repository contains the implementation of "ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks," a novel defense mechanism that prevents prompt leakage by replacing the original prompt with a proxy.


Getting Started

Follow the steps below to set up the required Python environment and dependencies:

  1. Create a Conda Environment from environment.yml

    conda env create -f environment.yml
  2. Activate the Environment

    conda activate proxy_prompt
  3. Configure API Keys

    Create a .env file in the project root and add your tokens:

    # Hugging Face Token (required)
    HF_TOKEN=your_huggingface_token_here
    
    # OpenAI Configuration (optional, for certain features)
    OPENAI_ENDPOINT=your_endpoint_here
    OPENAI_API_KEY=your_api_key_here
    DEPLOYMENT_NAME=your_deployment_name

Usage

The project provides example scripts in the scripts/ folder to run the complete ProxyPrompt pipeline:

  • scripts/custom_prompt/ - Protect your own custom prompts with ProxyPrompt defense
  • scripts/paper_prompt/ - Reproduce the results from the paper

Each script contains a complete pipeline from data collection to evaluation.


Citation

If you find our work useful, please star this repo and cite:

@article{zhuang2025proxyprompt,
    title={ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks},
    author={Zhuang, Zhixiong and Nicolae, Maria-Irina and Wang, Hui-Po and Fritz, Mario},
    journal={arXiv preprint arXiv:2505.11459},
    year={2025}
}

Data source

Attack data in attacks/5-gram-175-attacks.json sourced from:

  1. Zhang et al. "Effective prompt extraction from language models." COLM 2024.
  2. Hui et al. "Pleak: Prompt leaking attacks against large language model applications." CCS 2024.
  3. Wang et al. "Raccoon: Prompt extraction benchmark of llm-integrated applications." ACL 2024.
  4. Liang et al. "Why are my prompts leaked? Unraveling prompt extraction threats in customized large language models." arXiv:2408.02416, 2024.

See the paper for details on system prompt sources.

License

This project is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.

For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.

Purpose of the project

This software is a research prototype, solely developed for and published as part of the publication cited above.

Contact

Please feel free to open an issue or contact personally if you have questions, need help, or need explanations. Don't hesitate to write an email to the following email address: [email protected]

About

Official implementation of "ProxyPrompt: Securing System Prompts against Prompt Extraction Attacks"

Topics

Resources

License

Stars

Watchers

Forks