GitHub - ur-whitelab/chem-dual-use

DUAL USE IN CHEM: exploring ways to censor chemical data to mitigate dual use risks.

As we explore strategies to mitigate dual use risks in predictive chemistry (DURPC), we present our data-level mitigation strategy: Selective Noise Addition. To make public distribution of chemical data safer, we add noise only to the subset of data points whose labels are identified as sensitive and leave the rest of the dataset unchanged. We test this method with three models:

1-D Polynomial Regression
Multilayer Perceptron (MLP)
Graph Convolutional Network (GCN) predicting lipophilicity
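
The idea behind Selective Noise Addition can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation. It assumes that "sensitive" is a boolean mask over data points (here, labels above an arbitrary threshold of 2.0) and that the noise is Gaussian and added to the labels. The helper name `selective_noise` and its parameters are hypothetical, and the repository's censoring types may perturb the inputs instead of, or as well as, the labels.

```python
import numpy as np


def selective_noise(y, sensitive_mask, noise_scale, rng=None):
    """Return a copy of y with Gaussian noise added only where sensitive_mask is True.

    Hypothetical helper for illustration; not the repository's API.
    """
    rng = np.random.default_rng() if rng is None else rng
    censored = np.asarray(y, dtype=float).copy()
    n_sensitive = int(np.count_nonzero(sensitive_mask))
    # Perturb only the flagged labels; every other label is left exactly as-is
    censored[sensitive_mask] += rng.normal(0.0, noise_scale, size=n_sensitive)
    return censored


# Example: treat labels above an assumed threshold of 2.0 as sensitive
rng = np.random.default_rng(0)
y = np.linspace(-3.0, 3.0, 13)
mask = y > 2.0
y_censored = selective_noise(y, mask, noise_scale=1.0, rng=rng)
print(np.allclose(y_censored[~mask], y[~mask]))  # True
```

Because only the masked entries change, the non-sensitive part of the dataset can be shared unmodified while the sensitive labels become less reliable as training targets.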

Read the paper (arXiv:2304.10510)

Setup

conda env create -f environment.yml
conda activate dualusage
python -m ipykernel install --user --name dualusage  # registers the env as a Jupyter kernel; may not be needed

Usage

1-D Polynomial Regression

Open and run SimplePolynomial.ipynb in the root directory.
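
For readers without the notebook at hand, here is a self-contained sketch of the kind of experiment a 1-D polynomial setup allows. The ground-truth cubic, the 80th-percentile sensitivity threshold, and the noise scale of 0.5 are all assumptions chosen for illustration, not values from `SimplePolynomial.ipynb`. The sketch fits `np.polyfit` to clean and to selectively noised labels and reports the error inside the sensitive region.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed ground truth: an exact cubic, so a degree-3 fit to clean data is exact
x = np.linspace(-1.0, 1.0, 200)
y = 2.0 * x**3 - x + 0.5

# Assumption: labels above the 80th percentile are the "sensitive" ones
threshold = np.quantile(y, 0.8)
sensitive = y > threshold

# Selectively add Gaussian noise to the sensitive labels only
y_censored = y.copy()
y_censored[sensitive] += rng.normal(0.0, 0.5, size=int(sensitive.sum()))


def sensitive_mse(coeffs):
    """Mean squared error of a fitted polynomial, restricted to the sensitive region."""
    pred = np.polyval(coeffs, x[sensitive])
    return float(np.mean((pred - y[sensitive]) ** 2))


clean_fit = np.polyfit(x, y, deg=3)
censored_fit = np.polyfit(x, y_censored, deg=3)

print(f"sensitive-region MSE, clean data:    {sensitive_mse(clean_fit):.2e}")
print(f"sensitive-region MSE, censored data: {sensitive_mse(censored_fit):.2e}")
```

The error is measured against the true labels, so it reflects how much the noise degrades what a model can learn about the sensitive region rather than how well it reproduces the noisy targets.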

MLP — quick comparison across censoring types

Open and run mlp_task/mlp_quick_comparison_run.ipynb.
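
A dependency-light stand-in for a quick MLP comparison might look like the following. It trains a tiny one-hidden-layer NumPy MLP on data censored in different ways and reports the error against the true labels in the sensitive region. The censoring types `label_noise` and `input_noise`, the `sin(3x)` target, and the `y > 0.8` sensitivity rule are hypothetical choices for illustration. The notebook's actual model, data, and censoring types may differ.

```python
import numpy as np


def train_mlp(x, y, hidden=16, lr=0.05, epochs=3000, seed=0):
    """Fit a one-hidden-layer tanh MLP to 1-D data by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    X, Y = x.reshape(-1, 1), y.reshape(-1, 1)
    W1 = rng.normal(0.0, 1.0, size=(1, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    n = X.shape[0]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden activations, (n, hidden)
        pred = H @ W2 + b2                    # predictions, (n, 1)
        g_pred = 2.0 * (pred - Y) / n         # d(MSE)/d(pred)
        g_H = (g_pred @ W2.T) * (1.0 - H**2)  # backprop through tanh
        W2 -= lr * (H.T @ g_pred)
        b2 -= lr * g_pred.sum(axis=0)
        W1 -= lr * (X.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)

    def predict(xq):
        return (np.tanh(xq.reshape(-1, 1) @ W1 + b1) @ W2 + b2).ravel()

    return predict


def censor(x, y, sensitive, kind, scale, rng):
    """Apply a hypothetical censoring type to the sensitive points only."""
    x, y = x.copy(), y.copy()
    n = int(sensitive.sum())
    if kind == "label_noise":
        y[sensitive] += rng.normal(0.0, scale, size=n)
    elif kind == "input_noise":
        x[sensitive] += rng.normal(0.0, scale, size=n)
    elif kind != "none":
        raise ValueError(f"unknown censoring type: {kind}")
    return x, y


rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x)
sensitive = y > 0.8  # assumed sensitive label range

results = {}
for kind in ["none", "label_noise", "input_noise"]:
    xc, yc = censor(x, y, sensitive, kind, scale=0.5, rng=rng)
    model = train_mlp(xc, yc)
    # Evaluate against the true (uncensored) labels in the sensitive region
    err = model(x[sensitive]) - y[sensitive]
    results[kind] = float(np.mean(err**2))
    print(f"{kind:12s} sensitive-region MSE: {results[kind]:.4f}")
```

Training every model with the same seed keeps the initial weights identical across censoring types, so differences in the printed errors come from the censoring rather than from initialization.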

MLP — full sweep

cd mlp_task
python papermill_run.py

This runs every censoring type at each intensity level. Executed notebooks are written to mlp_task/OUTPUTS/notebooks/, and the notebook that generates the results figures is written to mlp_task/OUTPUTS/mlp_main_plots.ipynb.
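
The sweep scripts are not reproduced here. As a rough sketch of what a papermill-driven sweep does, the code below builds one job per (censoring type, intensity) pair and hands each to `papermill.execute_notebook`, which injects the parameters after the notebook cell tagged `parameters`. The template name `mlp_template.ipynb`, the parameter names `censor_type` and `intensity`, and the value grids are hypothetical, not taken from `papermill_run.py`.

```python
from itertools import product
from pathlib import Path

# Hypothetical values: the real papermill_run.py may use a different
# template notebook, parameter names, and value grids.
TEMPLATE = "mlp_template.ipynb"
OUT_DIR = Path("OUTPUTS/notebooks")
CENSOR_TYPES = ["label_noise", "input_noise"]
INTENSITIES = [0.0, 0.5, 1.0, 2.0]


def build_jobs(template, out_dir, censor_types, intensities):
    """Return (input, output, parameters) triples, one per sweep point."""
    jobs = []
    for kind, level in product(censor_types, intensities):
        out = Path(out_dir) / f"{kind}_{level}.ipynb"
        jobs.append((template, str(out), {"censor_type": kind, "intensity": level}))
    return jobs


def run_sweep(jobs, dry_run=True):
    """Execute each job with papermill; with dry_run=True only list the jobs."""
    if not dry_run:
        import papermill as pm  # imported lazily so a dry run needs no papermill
    for input_path, output_path, params in jobs:
        print(f"{input_path} -> {output_path} {params}")
        if not dry_run:
            Path(output_path).parent.mkdir(parents=True, exist_ok=True)
            pm.execute_notebook(input_path, output_path, parameters=params)


jobs = build_jobs(TEMPLATE, OUT_DIR, CENSOR_TYPES, INTENSITIES)
run_sweep(jobs)  # pass dry_run=False to actually execute the notebooks
```

Writing one executed notebook per sweep point keeps every run's outputs and plots inspectable, which is why a separate postprocessing or plotting notebook can then collect the results.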

GCN — full sweep

cd gcn_task
python papermill_run.py

This tests all censoring types across intensity levels. Executed notebooks are written to gcn_task/OUTPUTS/notebooks/ and postprocessing notebooks to gcn_task/OUTPUTS/postprocess_stuff/.
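
To make the GCN task concrete, here is a generic graph-convolution layer in NumPy using the symmetric-normalized propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by a mean-pooling readout to a single scalar such as a lipophilicity value. This is a sketch under stated assumptions, not the repository's GCN, which may use a different library, layer type, and readout. The toy three-atom graph, its one-hot features, and the random weights are made up.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weight):
    """One graph-convolution step: aggregate neighbours, transform, apply ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


def predict_property(adj, features, weights, readout):
    """Stack GCN layers, mean-pool node embeddings, and map to one scalar."""
    adj_norm = normalized_adjacency(adj)
    h = features
    for w in weights:
        h = gcn_layer(adj_norm, h, w)
    graph_embedding = h.mean(axis=0)  # permutation-invariant readout
    return float(graph_embedding @ readout)


# Toy three-atom chain (e.g. heavy atoms C-C-O) with made-up one-hot features
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)  # columns: C, O

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 8)), rng.normal(size=(8, 8))]
readout = rng.normal(size=8)
print(predict_property(adj, features, weights, readout))
```

Mean pooling makes the prediction independent of atom ordering, which matters because the same molecule can be written with its atoms in any order.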

Citation

@article{campbell2023censoring,
      title={Censoring chemical data to mitigate dual use risk}, 
      author={Quintina L. Campbell and Jonathan Herington and Andrew D. White},
      year={2023},
      eprint={2304.10510},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
