This is the official implementation of DBSA (Auditing language models with distribution-based sensitivity analysis). We are interested in the question: can we understand how the outputs of black-box LLMs depend on arbitrary perturbations of their inputs?
This repository includes the code, prompts, and collected responses needed to reproduce the experiments and results presented in the AISTATS paper.
```bash
pip install -e .
```
Also see `requirements.txt` for the full list of dependencies.
To use this repository, you will need API access to an LLM. The simplest way to set this up is to create a Python file named `src/utils/openai_config.py` with the following structure:
```python
def get_openai_config():
    # Configuration for the LLM used to generate responses.
    openai_config = {
        "api_key": "<api_key>",
        "api_version": "<api_version>",
        "api_endpoint": "<api_endpoint>",
        "model_deployment_id": "<model_deployment_id>",
    }
    return openai_config


def get_embedding_config():
    # Configuration for the embedding model.
    ada_config = {
        "api_key": "<api_key>",
        "api_version": "<api_version>",
        "api_endpoint": "<api_endpoint>",
        "embedding_model_deployment_id": "<embedding_model_deployment_id>",
    }
    return ada_config
```
Replace the placeholders `api_key`, `api_version`, `api_endpoint`, `model_deployment_id`, and `embedding_model_deployment_id` with your actual configuration values.
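
For reference, here is a minimal sketch of how such a config could be consumed, assuming an Azure OpenAI deployment (which the `api_version` and `model_deployment_id` fields suggest), the `openai>=1.0` Python client, and that `src/utils/openai_config.py` is importable as shown. The actual client setup used by the repository is in `src/utils/setup_llm.py` and may differ.

```python
# Minimal sketch (assumes an Azure OpenAI deployment and openai>=1.0);
# the repository's actual setup code is in src/utils/setup_llm.py.
from openai import AzureOpenAI

from src.utils.openai_config import get_openai_config  # import path assumed from the file layout

config = get_openai_config()
client = AzureOpenAI(
    api_key=config["api_key"],
    api_version=config["api_version"],
    azure_endpoint=config["api_endpoint"],
)

response = client.chat.completions.create(
    model=config["model_deployment_id"],  # deployment name, not the base model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```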
The core component of the repository is the `src` folder. Under `src`, there are three subdirectories:

- `data` contains the code to generate data for the experiments. For this paper, we generate synthetic sentences and focus on perturbing the immediate neighbors of each word in the sentence (an illustrative sketch follows this list).
- `model` contains the core code to compute the distance between the original response and the perturbed response. For this paper, we provide two methods to approximate the distance: Jensen-Shannon divergence (JSD) and energy distance (see the sketch after this list).
- `utils` contains the code to set up and query the LLM and embedding models.
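
To make the neighbor perturbation in `data` concrete, here is a toy sketch of one possible scheme (dropping the immediate left/right neighbors of a target word). This is purely illustrative: the perturbation logic actually used in the paper lives in `src/data` and may differ.

```python
# Toy illustration only: drop the immediate neighbors of a target word.
# The perturbation scheme actually used in the paper is implemented in src/data.
def perturb_neighbors(sentence: str, target_idx: int) -> str:
    words = sentence.split()
    kept = [
        w
        for i, w in enumerate(words)
        if i == target_idx or abs(i - target_idx) > 1  # remove positions adjacent to the target
    ]
    return " ".join(kept)


original = "the quick brown fox jumps over the lazy dog"
print(perturb_neighbors(original, target_idx=3))  # neighbors of "fox" are removed
```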
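
For the `model` component, the sketch below shows how the two distances named above could be computed on samples of responses: energy distance on response embeddings, and JSD on discrete distributions such as token frequencies. The function names and inputs are illustrative assumptions; the repository's own implementations in `src/model` are the reference.

```python
import numpy as np
from scipy.spatial.distance import cdist, jensenshannon


def energy_distance(X: np.ndarray, Y: np.ndarray) -> float:
    """Energy distance between two samples of embedding vectors (one vector per row)."""
    d_xy = cdist(X, Y).mean()  # mean pairwise distance across the two samples
    d_xx = cdist(X, X).mean()  # mean pairwise distance within X
    d_yy = cdist(Y, Y).mean()  # mean pairwise distance within Y
    return 2.0 * d_xy - d_xx - d_yy


def jsd(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence between two discrete distributions."""
    # scipy's jensenshannon returns the JS *distance*, i.e. the square root of the divergence
    return jensenshannon(p, q) ** 2


# Toy usage: embeddings of original vs. perturbed responses.
rng = np.random.default_rng(0)
original_embeddings = rng.normal(size=(50, 8))
perturbed_embeddings = rng.normal(loc=0.5, size=(50, 8))
print(energy_distance(original_embeddings, perturbed_embeddings))
```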
Finally, `exp` contains all the sampled LLM responses for the experiments in the paper. The raw LLM responses are stored under the `responses` folder, and the processed responses, i.e., the computed distances between the original and perturbed responses, are stored under the `scores` folder. To generate the responses from scratch, run `run.py`. Alternatively, you can use only the plotting portion of `run.py` to generate the plots in the paper directly from the precomputed scores.
- Go through the prerequisites and set up your API config. Crucially, make sure you can run `src/utils/openai_config.py` and `src/utils/setup_llm.py` before moving on to the next step.
- If you wish to run the entire experiment, including LLM generation, go to the corresponding experiment folder and run `run.py`.
- To save time, you can instead use the plotting portion of `run.py` to generate the plots in the paper directly from the precomputed scores.
```bibtex
@inproceedings{rauba2025auditing,
  title     = {Auditing language models with distribution-based sensitivity analysis},
  author    = {Paulius Rauba and Qiyao Wei and Mihaela van der Schaar},
  booktitle = {The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  url       = {https://openreview.net/forum?id=ilNQ2m4GTy}
}
```