Set up the environment (the PyTorch installation may differ depending on your GPU setup):
conda env create --file=env.yaml
conda activate spoofDetect
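If the PyTorch build in the environment does not match your GPU, you can reinstall it for your CUDA version. A minimal example, assuming CUDA 12.1 (adjust the wheel index URL to your driver):
pip install torch --index-url https://download.pytorch.org/whl/cu121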
You also need to install watermark-stealing as a pip package (as instructed below).
The code for watermark-stealing was forked from https://github.com/eth-sri/watermark-stealing/tree/main.
To use a watermark-stealing model, additional files need to be downloaded; refer to watermark-stealing/README.md.
cd watermark-stealing
pip install -e .
Optionally, you may install flash attention (the flash-attn pip package):
pip install flash-attn --no-build-isolation
To reproduce the experiments from the paper, you need to set up a config file for your model, then run:
bash reprompting_pipeline.sh "path to your config" "Y if Learning, N if Stealing" "number of queries" "dataset (either c4 or dolly)" "Y if generating only spoofed text, N if generating both spoofed and xi-watermarked text"
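For example, a hypothetical invocation in the Stealing setting with 1000 queries on c4, generating both spoofed and xi-watermarked text (the config path is illustrative; substitute your own):
bash reprompting_pipeline.sh configs/generated/example.yaml N 1000 c4 N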
This will generate text for both the Reprompting and Normal methods. To then compute p-values, run:
python generate_pvalues.py --cfg_path "path to your config" --reprompting "Y/N" --dataset "c4/dolly" --token_target "value for T"
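For example, using the same illustrative config, with Reprompting enabled on c4 and a token target T of 800 (both the config path and the T value are placeholders):
python generate_pvalues.py --cfg_path configs/generated/example.yaml --reprompting Y --dataset c4 --token_target 800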
The generated p-values are saved as .csv files in the data/pvalues folder.
All the config files used for the experiments in the paper can be found in configs/generated.
Additionally, a configuration file generator is available in generate_configs.ipynb.
Thibaud Gloaguen, [email protected]
Nikola Jovanović, [email protected]
Robin Staab, [email protected]
Martin Vechev
If you use our code, please cite the following:
@misc{gloaguen2025discoveringspoofingattemptslanguage,
  title={Discovering Spoofing Attempts on Language Model Watermarks},
  author={Thibaud Gloaguen and Nikola Jovanović and Robin Staab and Martin Vechev},
  year={2025},
  eprint={2410.02693},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2410.02693},
}