Commit e70a29b

Added precommit hooks (#63)
* Implemented new pre-commit hooks and removed the paper directory
  The paper directory will be added in another PR after cleaning up
* Added tests to the dev branch
* Skipping tests until get the correct secrets to the repo
1 parent 17ad815 commit e70a29b

66 files changed

Lines changed: 564 additions & 822046 deletions

.github/workflows/build.yml

Lines changed: 20 additions & 22 deletions
@@ -1,4 +1,3 @@
-
 name: publish
 
 on:
@@ -9,27 +8,26 @@ on:
 
 jobs:
   publish:
-
     runs-on: ubuntu-latest
 
     steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python "3.10"
-      uses: actions/setup-python@v2
-      with:
-        python-version: "3.10"
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install flake8 pytest build
-        if [ -f dev-requirements.txt ]; then pip install -r dev-requirements.txt; fi
-    - name: Install
-      run: |
-        pip install .[gpr]
-    - name: Build a binary wheel and a source tarball
-      run: |
-        python -m build --sdist --wheel --outdir dist/ .
-    - name: Publish distribution 📦 to PyPI
-      uses: pypa/gh-action-pypi-publish@master
-      with:
-        password: ${{ secrets.PYPI_API_TOKEN }}
+      - uses: actions/checkout@v2
+      - name: Set up Python "3.10"
+        uses: actions/setup-python@v2
+        with:
+          python-version: "3.10"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install flake8 pytest build
+          if [ -f dev-requirements.txt ]; then pip install -r dev-requirements.txt; fi
+      - name: Install
+        run: |
+          pip install .[gpr]
+      - name: Build a binary wheel and a source tarball
+        run: |
+          python -m build --sdist --wheel --outdir dist/ .
+      - name: Publish distribution 📦 to PyPI
+        uses: pypa/gh-action-pypi-publish@master
+        with:
+          password: ${{ secrets.PYPI_API_TOKEN }}

.github/workflows/tests.yml

Lines changed: 52 additions & 36 deletions
@@ -1,36 +1,52 @@
-
-name: test
-
-on:
-  push:
-    branches: [ main ]
-  pull_request:
-    branches: [ main ]
-
-jobs:
-  test:
-
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        python-version: [3.9, "3.10", 3.11]
-
-    steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install pytest build
-        if [ -f dev-requirements.txt ]; then pip install -r dev-requirements.txt; fi
-    - name: Install
-      run: |
-        pip install .[gpr]
-    - name: Run Test
-      env:
-        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-      run: |
-        pytest tests
+name: test
+
+on:
+  push:
+    branches: [main, dev]
+  pull_request:
+    branches: [main, dev]
+
+jobs:
+  pre-commit:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - name: Set up Python
+        uses: actions/setup-python@v2
+        with:
+          python-version: "3.10"
+      - name: Install pre-commit
+        run: |
+          python -m pip install --upgrade pip
+          pip install pre-commit
+      - name: Run pre-commit
+        run: |
+          pre-commit run --all-files
+
+  test:
+    # Skip this job for now - will be re-enabled later
+    if: false
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: [3.11]
+
+    steps:
+      - uses: actions/checkout@v2
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install pytest build
+          if [ -f dev-requirements.txt ]; then pip install -r dev-requirements.txt; fi
+      - name: Install
+        run: |
+          pip install .[gpr]
+      - name: Run Test
+        env:
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        run: |
+          pytest tests

.pre-commit-config.yaml

Lines changed: 39 additions & 19 deletions
@@ -1,19 +1,39 @@
-repos:
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v2.2.3
-    hooks:
-      - id: trailing-whitespace
-      - id: check-yaml
-      - id: end-of-file-fixer
-      - id: mixed-line-ending
-  - repo: https://github.com/psf/black
-    rev: "22.3.0"
-    hooks:
-      - id: black
-  - repo: https://github.com/tomcatling/black-nb
-    rev: "0.7"
-    hooks:
-      - id: black-nb
-        description: strip output and black source
-        additional_dependencies: ['black[jupyter]']
-        args: ["--clear-output"]
+default_language_version:
+  python: python3
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: check-added-large-files
+      - id: fix-byte-order-marker
+      - id: check-case-conflict
+      - id: check-merge-conflict
+      - id: check-shebang-scripts-are-executable
+      - id: check-symlinks
+      - id: check-toml
+      - id: check-yaml
+      - id: debug-statements
+      - id: detect-private-key
+      - id: end-of-file-fixer
+      - id: mixed-line-ending
+      - id: trailing-whitespace
+  - repo: https://github.com/psf/black
+    rev: "22.3.0"
+    hooks:
+      - id: black
+  - repo: https://github.com/srstevenson/nb-clean
+    rev: 4.0.1
+    hooks:
+      - id: nb-clean
+        args: [--preserve-cell-outputs, --remove-empty-cells]
+  - repo: https://github.com/rbubley/mirrors-prettier
+    rev: v3.4.2
+    hooks:
+      - id: prettier
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.3.0
+    hooks:
+      - id: codespell
+        additional_dependencies: [".[toml]"]
+        exclude_types: [jupyter]
+        args: ["-L formate,Nd"]

README.md

Lines changed: 12 additions & 3 deletions
@@ -1,17 +1,16 @@
 # 🤖 BO-LIFT: Bayesian Optimization using in-context learning
 
-
 ![version](https://img.shields.io/badge/version-0.0.1-brightgreen)
 [![paper](https://img.shields.io/badge/paper-arXiv-red)](https://arxiv.org/abs/2304.05341)
 [![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)](https://lbesson.mit-license.org/)
 
-
 BO-LIFT does regression with uncertainties using frozen Large Language Models by using token probabilities.
 It uses LangChain to select examples to create in-context learning prompts from training data.
 By selecting examples, it can consider more training data than it fits in the model's context window.
 Being able to predict uncertainty, allow the employment of interesting techniques such as Bayesian Optimization.
 
 ## Table of content
+
 - [BO-LIFT](#-bo-lift-bayesian-optimization-using-in-context-learning)
 - [Install](#install-)
 - [Usage](#usage-)
@@ -48,6 +47,7 @@ os.environ["OPENAI_API_KEY"] = "<your-key-here>"
 ### Quickstart 🔥
 
 `bolift` provides a simple interface to use the model.
+
 ```py
 # Create the model object
 asktell = bolift.AskTellFewShotTopk()
@@ -62,9 +62,11 @@ asktell.tell("1-bromonaphthalene", -4.35)
 yhat = asktell.predict("1-bromobutane")
 print(yhat.mean(), yhat.std())
 ```
+
 This prediction returns $-2.92 \pm 1.27$.
 
 Further improvements can be done by using Bayesian Optimization.
+
 ```py
 # Create a list of examples
 pool_list = [
@@ -84,9 +86,11 @@ asktell.ask(pool)
 (['1-bromo-2-methylpropane'], [-1.284916344093158], [-1.92])
 
 ```
+
 Where the first value is the selected point, the second value is the value of the acquisition function, and the third value is the predicted mean.
 
 Let's tell this point to the model with its correct label and make a prediction:
+
 ```py
 asktell.tell("1-bromo-2-methylpropane", -2.430)
 
@@ -113,8 +117,10 @@ asktell = bolift.AskTellFewShotTopk(
   temperature=0.7,
 )
 ```
+
 Other arguments can be used to customize the prompt (`prefix`, `prompt_template`, `suffix`) and the in-context learning procedure (`use_quantiles`, `n_quantiles`).
 Additionally, we implemented other models. A brief list can be seen below:
+
 - AskTellFewShotMulti;
 - AskTellFewShotTopk;
 - AskTellFinetuning;
@@ -149,18 +155,21 @@ for i in range(n):
 
 asktell.inv_predict(20.0)
 ```
+
 The data for that is available in the paper directory.
 This generated the following procedure:
+
 ```
 the synthesis procedure:"A 30 wt% tungsten carbide catalyst was prepared with Cu dopant metal at 5 wt% and carburized at 835 C. The reaction was run at 350 ºC"
 ```
 
 ### Citation
 
 Please, cite [Ramos et al.](https://arxiv.org/abs/2304.05341):
+
 ```
 @misc{ramos2023bayesian,
-title={Bayesian Optimization of Catalysts With In-context Learning},
+title={Bayesian Optimization of Catalysts With In-context Learning},
 author={Mayk Caldas Ramos and Shane S. Michtavy and Marc D. Porosoff and Andrew D. White},
 year={2023},
 eprint={2304.05341},

bolift/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -19,5 +19,5 @@
     "AskTellRidgeKernelRegression",
     "AskTellNearestNeighbor",
     "Pool",
-    "BOLiftTool"
-]
+    "BOLiftTool",
+]

bolift/aqfxns.py

Lines changed: 17 additions & 10 deletions
@@ -10,13 +10,15 @@ def expected_improvement(dist, best):
     elif isinstance(dist, GaussDist):
         return expected_improvement_g(dist.mean(), dist.std(), best)
 
+
 def log_expected_improvement(dist, best):
     """Log Expected improvement for the given discrete distribution"""
     if isinstance(dist, DiscreteDist):
         return log_expected_improvement_d(dist.probs, dist.values, best)
     elif isinstance(dist, GaussDist):
         return log_expected_improvement_g(dist.mean(), dist.std(), best)
-
+
+
 # I think it's just taking the log of the final EI computation. Will test this later
 # def log_expected_improvement(dist, best):
 # """Log Expected improvement for the given discrete distribution"""
@@ -25,6 +27,7 @@ def log_expected_improvement(dist, best):
 # elif isinstance(dist, GaussDist):
 # return np.log(expected_improvement_g(dist.mean(), dist.std(), best))
 
+
 def probability_of_improvement(dist, best):
     """Probability of improvement for the given discrete distribution"""
     if isinstance(dist, DiscreteDist):
@@ -54,12 +57,14 @@ def expected_improvement_d(probs, values, best):
     ei = np.sum(np.maximum(values - best, 0) * probs)
     return ei
 
+
 def log_expected_improvement_d(probs, values, best):
     """Log Expected improvement for the given discrete distribution"""
     # ei = np.sum(np.maximum(values - best, 0) * probs)
-    log_ei = np.log(np.sum(np.maximum(values - best, 0) * probs)+1e-15)
+    log_ei = np.log(np.sum(np.maximum(values - best, 0) * probs) + 1e-15)
     return log_ei
 
+
 def probability_of_improvement_d(probs, values, best):
     """Probability of improvement for the given discrete distribution"""
     pi = np.sum(np.cast[float](values > best) * probs)
@@ -80,32 +85,34 @@ def greedy_d(probs, values, best):
 
 def expected_improvement_g(mean, std, best):
     """Expected improvement for the given Gaussian distribution"""
-    eps=1e-15
-    z = (mean - best) / (std+eps)
+    eps = 1e-15
+    z = (mean - best) / (std + eps)
     ei = (mean - best) * norm.cdf(z) + std * norm.pdf(z)
     return ei
 
+
 def log_expected_improvement_g(mean, std, best):
     """Log Expected improvement for the given Gaussian distribution"""
-    eps=1e-15
-    z = (mean - best) / (std+eps)
+    eps = 1e-15
+    z = (mean - best) / (std + eps)
     # ei = std * h(z)
     # ei = std * (norm.pdf(z) + z * norm.cdf(z))
     log_ei = np.log(std) + np.log((norm.pdf(z) + z * norm.cdf(z)))
     return log_ei
 
+
 def probability_of_improvement_g(mean, std, best):
     """Probability of improvement for the given Gaussian distribution"""
-    eps=1e-15
-    z = (mean - best) / (std+eps)
+    eps = 1e-15
+    z = (mean - best) / (std + eps)
     pi = norm.cdf(z)
     return pi
 
 
 def upper_confidence_bound_g(mean, std, best, _lambda):
     """Upper confidence bound for the given Gaussian distribution"""
-    eps=1e-15
-    return mean + _lambda * (std+eps)
+    eps = 1e-15
+    return mean + _lambda * (std + eps)
 
 
 def greedy_g(mean, std, best):
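
The commented-out note in `bolift/aqfxns.py` guesses that the log form is just the log of the final EI value. A minimal sketch to check that for the Gaussian case is given here. It is not the repository's code: it assumes the two formulas shown in the diff, and it swaps SciPy's `norm.pdf` and `norm.cdf` for standard-library helpers (`norm_pdf`, `norm_cdf`, and `EPS` are illustrative names, not part of `bolift`).

```python
import math

EPS = 1e-15  # same guard value as `eps` in the diff


def norm_pdf(z):
    """Standard normal density (stand-in for scipy.stats.norm.pdf)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF (stand-in for scipy.stats.norm.cdf)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement_g(mean, std, best):
    """EI for a Gaussian prediction, following the formula in the diff."""
    z = (mean - best) / (std + EPS)
    return (mean - best) * norm_cdf(z) + std * norm_pdf(z)


def log_expected_improvement_g(mean, std, best):
    """Log EI, using EI = std * (pdf(z) + z * cdf(z))."""
    z = (mean - best) / (std + EPS)
    return math.log(std) + math.log(norm_pdf(z) + z * norm_cdf(z))


# Illustrative inputs only; they are not taken from the repository.
ei = expected_improvement_g(mean=-2.0, std=1.0, best=-2.5)
log_ei = log_expected_improvement_g(mean=-2.0, std=1.0, best=-2.5)
print(ei, math.exp(log_ei))  # the two values should agree closely
```

For these inputs the two forms agree to within floating-point error, which is consistent with the note's guess. The agreement is only checked where EI is comfortably positive: for strongly negative `z` the term `pdf(z) + z * cdf(z)` can underflow to zero, and the logarithm then fails or diverges.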
