eth-sri/watermark-spoofing-detection

Discovering Spoofing Attempts on Language Model Watermarks

Getting started

Set up the environment (the PyTorch installation may differ depending on your GPU setup):

conda env create --file=env.yaml
conda activate spoofDetect

You also need to install watermark-stealing as a pip package, as shown below. The watermark-stealing code was forked from https://github.com/eth-sri/watermark-stealing/tree/main. Using a watermark stealing model requires downloading additional files; see watermark-stealing/README.md for details.

cd watermark-stealing
pip install -e .

Optionally, you can install flash attention via the flash-attn pip package:

pip install flash-attn --no-build-isolation

Reproducing the experiments

To reproduce the experiments from the paper, first set up a config file for your model. Then run:

bash reprompting_pipeline.sh "path to your config" "Y if Learning, N if Stealing" "number of queries" "dataset (either c4 or dolly)" "Y if generating only spoofed text, N if generating both spoofed and xi-watermarked text"
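The five positional arguments are easy to mix up, so a small wrapper that builds the command from named parameters can help. The following is a minimal sketch, not part of the repository: the function name `build_pipeline_command`, its parameter names, and the config path `configs/generated/example.yaml` are hypothetical, and the argument order simply mirrors the command above.

```python
import subprocess

# Datasets named in the usage line above.
VALID_DATASETS = {"c4", "dolly"}


def build_pipeline_command(cfg_path, learning, num_queries, dataset, spoofed_only):
    """Build the argument list for reprompting_pipeline.sh (hypothetical helper)."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"dataset must be one of {sorted(VALID_DATASETS)}, got {dataset!r}")
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    return [
        "bash",
        "reprompting_pipeline.sh",
        str(cfg_path),                 # path to your config
        "Y" if learning else "N",      # Y if Learning, N if Stealing
        str(num_queries),              # number of queries
        dataset,                       # c4 or dolly
        "Y" if spoofed_only else "N",  # Y: spoofed text only, N: spoofed and xi-watermarked
    ]


if __name__ == "__main__":
    # The config path below is a placeholder, not a file known to exist in the repo.
    cmd = build_pipeline_command(
        "configs/generated/example.yaml",
        learning=False,
        num_queries=100,
        dataset="c4",
        spoofed_only=False,
    )
    print(" ".join(cmd))
    # Uncomment to actually launch the pipeline from the repository root:
    # subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell-quoting problems if the config path contains spaces.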

This generates text with both the Reprompting and Normal methods. To then compute the p-values, run:

python generate_pvalues.py --cfg_path "path to your config" --reprompting "Y/N" --dataset "c4/dolly" --token_target "value for T"

The generated p-values are written to .csv files in the data/pvalues folder.
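To check quickly what was produced, the output folder can be scanned with the standard library. This is a sketch under assumptions: the column layout of the generated files is not documented here, so the code only reports each file's first row (assumed to be a header) and the number of remaining rows. The helper name `summarize_pvalue_files` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_pvalue_files(folder="data/pvalues"):
    """Return {file name: (first row, number of remaining rows)} for each CSV in folder."""
    summaries = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as f:
            rows = list(csv.reader(f))
        # Assumption: the first row is a header; an empty file yields ([], 0).
        header = rows[0] if rows else []
        summaries[path.name] = (header, max(len(rows) - 1, 0))
    return summaries


if __name__ == "__main__":
    for name, (header, n_rows) in summarize_pvalue_files().items():
        print(f"{name}: columns={header}, rows={n_rows}")
```

If the folder does not exist yet, the function returns an empty dictionary rather than raising.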

All the config files used for the experiments in the paper can be found in configs/generated.

Additionally, a configuration file generator is available in generate_configs.ipynb.

Contact

Thibaud Gloaguen
Nikola Jovanović
Robin Staab
Martin Vechev

Citation

If you use our code, please cite the following:

@misc{gloaguen2025discoveringspoofingattemptslanguage,
      title={Discovering Spoofing Attempts on Language Model Watermarks}, 
      author={Thibaud Gloaguen and Nikola Jovanović and Robin Staab and Martin Vechev},
      year={2025},
      eprint={2410.02693},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2410.02693}, 
}
