chinese_restaurant_process is a Python package that provides an easy-to-use interface for simulating the Chinese Restaurant Process (CRP), a popular model in Bayesian nonparametrics.
You can install Chinese Restaurant Process using pip:
pip install chinese_restaurant_process
You can install the latest version directly from the GitHub repository using pip:
pip install git+https://github.com/jhaberbe/chinese_restaurant_process
Or, if you want to install the package from source, clone the repository and install from inside it:
git clone https://github.com/jhaberbe/chinese_restaurant_process.git
cd chinese_restaurant_process
pip install .
To perform the initial inference of classes:
import numpy as np
from crp.process import ChineseRestaurantProcess
# Your data, (n_samples, n_features)
X = np.random.randint(1, 100, size=(1000, 10))
# Run inference on the training data.
crp = ChineseRestaurantProcess(X, expected_number_of_classes=1)
crp.run(epochs=1)
After training, you can predict the class of new data points:
# New data, shape (n_samples, n_features)
X_new = np.random.randint(1, 100, size=(1000, 10))
# min_membership = 0.01 is the usual recommendation.
# This data is random, so we set it to 0 here.
labels = crp.predict(X_new, min_membership=0.0)
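To get a quick look at the result, you can count how many points landed in each class. This is a minimal sketch that uses only NumPy. It assumes `labels` is a 1-D array-like with one class label per row of `X_new`, which this README does not state explicitly, and `summarize_labels` is a hypothetical helper, not part of the package.

```python
import numpy as np

def summarize_labels(labels):
    """Return {class_label: count}, largest class first.

    Assumes `labels` is a 1-D array-like with one label per sample.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    order = np.argsort(-counts)  # sort classes by descending size
    return {c.item(): int(n) for c, n in zip(classes[order], counts[order])}

# Stand-in labels for illustration; in practice, pass the output of crp.predict.
example_labels = [0, 0, 1, 2, 2, 2]
print(summarize_labels(example_labels))
```

If many classes hold only a handful of points, that may be a sign that a non-zero `min_membership` is worth trying.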
We have a few tutorials in the notebook/ folder. They cover basic usage and explain how inference is performed.
Recommended reading order is:
- Usage.ipynb
- What-The-Heck-Is-Collapsed-Gibbs-Sampling.ipynb
- Explaining-Class-Structure.ipynb
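For intuition before opening the notebooks, here is a small sketch of the CRP prior itself: customers arrive one at a time and either join an existing table with probability proportional to its size, or open a new table with probability proportional to a concentration parameter. This is plain NumPy written for illustration. It is not the package's implementation, it does not look at any data, and the names `simulate_crp` and `alpha` are assumptions of this sketch.

```python
import numpy as np

def simulate_crp(n_customers, alpha, seed=None):
    """Seat customers one at a time under the CRP prior.

    Returns (assignments, table_sizes). `alpha` must be > 0.
    """
    rng = np.random.default_rng(seed)
    assignments = np.empty(n_customers, dtype=int)
    table_sizes = []  # table_sizes[k] = number of customers at table k
    for i in range(n_customers):
        # Existing table k has weight size_k; a new table has weight alpha.
        # The weights sum to i + alpha, so normalizing gives the CRP probabilities.
        weights = np.array(table_sizes + [alpha], dtype=float)
        table = rng.choice(len(weights), p=weights / weights.sum())
        if table == len(table_sizes):
            table_sizes.append(1)  # open a new table
        else:
            table_sizes[table] += 1
        assignments[i] = table
    return assignments, table_sizes

assignments, table_sizes = simulate_crp(n_customers=200, alpha=1.0, seed=0)
print(len(table_sizes), "tables; sizes:", table_sizes)
```

Larger values of `alpha` tend to produce more tables. Inference in a CRP mixture model combines this prior with a likelihood for the data at each table, which is what the collapsed Gibbs sampling notebook walks through.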
We welcome contributions. If you'd like to contribute, please follow these steps:
- Fork the repository on GitHub.
- Create a new branch for your changes.
- Make the changes to your branch.
- Commit your changes with a meaningful commit message.
- Create a pull request against the main branch.
Planned improvements:
- Improve sampling of initial hyperparameters. Right now the hyperparameters are fixed, which works well for most cases, but a more robust sampling method would be nice.
- Infinitely Nested Chinese Restaurant Process, as described by Blei et al. (2010). Some work is already done, but it is not yet ready for use (notebook/In Progress/Nested-Chinese-Restaurant-Process.ipynb).
- [1/2] More distributions (Gaussian, Gamma, etc.)
- Plotting utilities to visualize the class structure of the CRP.
- Utilities to extract out which features are most important for each class.
- Testing (I've never done testing before, so this will be a learning experience for me).
Chinese Restaurant Process is released under the GPL v3 license. See the LICENSE file for more information.