
selvinsource/hazelcast-jet-ml


Hazelcast JET ML

Machine learning algorithms built on Hazelcast JET, a distributed computing platform.

See JetMLDemo for an example of how to use the Jet ML Pipeline.

Installation

git clone https://github.com/selvinsource/hazelcast-jet-ml.git
cd hazelcast-jet-ml
mvn clean compile test assembly:single

Documentation

The Jet ML Pipeline lets you chain Estimators and Transformers.

  • An Estimator is an algorithm that, given a dataset to fit, returns a Transformer
  • A Transformer is an ML model that transforms one dataset into another
  • A dataset is represented by a Hazelcast IListJet (which is not distributed; a future version will switch to a distributed IMapJet)
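To make the Estimator/Transformer contract concrete, here is a minimal sketch. The interfaces and the `MeanCentering` estimator below are hypothetical illustrations of the pattern, not this repository's actual types, and `java.util.List` stands in for `IListJet` so the example runs without a Jet cluster. Calling `fit` on training data yields a Transformer, which is then applied to new data with `transform`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interfaces mirroring the Estimator/Transformer concepts.
// java.util.List stands in for Hazelcast's IListJet so the sketch runs standalone.
interface Transformer<T> {
    // Transforms one dataset into another (e.g. a fitted model labelling records).
    List<T> transform(List<T> dataset);
}

interface Estimator<T> {
    // Fits the algorithm to a dataset and returns the resulting Transformer (the model).
    Transformer<T> fit(List<T> dataset);
}

// Hypothetical toy Estimator: learns the per-feature mean of the training data and
// returns a Transformer that centres each record by subtracting that mean.
class MeanCentering implements Estimator<double[]> {
    @Override
    public Transformer<double[]> fit(List<double[]> dataset) {
        int dims = dataset.get(0).length;
        double[] mean = new double[dims];
        for (double[] row : dataset) {
            for (int d = 0; d < dims; d++) {
                mean[d] += row[d] / dataset.size();
            }
        }
        // The returned lambda is the fitted model: it captures the learned mean.
        return data -> {
            List<double[]> out = new ArrayList<>();
            for (double[] row : data) {
                double[] centred = new double[dims];
                for (int d = 0; d < dims; d++) {
                    centred[d] = row[d] - mean[d];
                }
                out.add(centred);
            }
            return out;
        };
    }
}

public class PipelineSketch {
    public static void main(String[] args) {
        List<double[]> train = List.of(new double[]{1, 2}, new double[]{3, 4});
        List<double[]> test = List.of(new double[]{2, 3});
        // fit(...) yields a Transformer; transform(...) applies it to unseen data.
        List<double[]> output = new MeanCentering().fit(train).transform(test);
        System.out.println(Arrays.toString(output.get(0))); // prints [0.0, 0.0]
    }
}
```

Because a fitted Transformer is just a value, an Estimator's output can be passed straight into the next stage, which is what makes chaining possible.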

The design is inspired by scikit-learn (see paper).

Datasets

The following datasets have been used:

K-Means Clustering Examples

Train a model and show identified clusters

// Create two Jet members
JetInstance instance1 = Jet.newJetInstance();
Jet.newJetInstance();

// Get a training dataset (it is assumed this is already populated, e.g. from a file)
IListJet<double[]> trainDataset = instance1.getList("trainDataset");

// Train a model on the training dataset with k = 3 and maxIter = 20:
// k = 3 is the number of desired clusters
// maxIter = 20 is the maximum number of iterations if the algorithm does not converge
KMeans kMeans = new KMeans(3, 20);
KMeansModel model = kMeans.fit(trainDataset);

// Show the identified centroids (LOGGER is assumed to be a logger defined in the enclosing class)
LOGGER.info("Centroids:");
model.getCentroids().stream().forEach(c -> LOGGER.info(Arrays.toString(c)));

Jet.shutdownAll();
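For intuition about what `k` and `maxIter` control, the sketch below implements standard k-means (Lloyd's algorithm) on a single machine. It is not the distributed implementation in this repository. Two choices are assumptions made for simplicity: the first `k` points serve as the initial centroids, and an empty cluster keeps its previous centroid. The loop stops early once no point changes cluster, otherwise after `maxIter` iterations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Single-machine sketch of standard k-means (Lloyd's algorithm). NOT the repository's
// distributed KMeans; it only illustrates what k and maxIter control.
public class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, List<double[]> centroids) {
        int best = 0;
        for (int c = 1; c < centroids.size(); c++) {
            if (squaredDistance(point, centroids.get(c)) < squaredDistance(point, centroids.get(best))) {
                best = c;
            }
        }
        return best;
    }

    // Runs at most maxIter iterations and stops early when assignments stop changing.
    // Assumption: the first k points are the initial centroids (requires data.size() >= k).
    static List<double[]> fit(List<double[]> data, int k, int maxIter) {
        List<double[]> centroids = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            centroids.add(data.get(c).clone());
        }
        int[] assignment = new int[data.size()];
        Arrays.fill(assignment, -1);
        int dims = data.get(0).length;
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < data.size(); i++) {
                int c = nearest(data.get(i), centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point moved to a different cluster
            }
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < data.size(); i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += data.get(i)[d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its previous centroid
                    for (int d = 0; d < dims; d++) {
                        sums[c][d] /= counts[c];
                    }
                    centroids.set(c, sums[c]);
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> data = List.of(
                new double[]{0, 0}, new double[]{10, 10}, new double[]{0, 1},
                new double[]{10, 11}, new double[]{1, 0}, new double[]{11, 10});
        fit(data, 2, 20).forEach(c -> System.out.println(Arrays.toString(c)));
    }
}
```

On this toy data the two groups are separated after one update, and the second pass changes no assignment, so the loop exits well before `maxIter`.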

Train a model and predict test data using the Jet ML Pipeline

// Create two Jet members
JetInstance instance1 = Jet.newJetInstance();
Jet.newJetInstance();
 
// Get datasets to train the model and then test it
IListJet<double[]> trainDataset = instance1.getList("trainDataset");
IListJet<double[]> testDataset = instance1.getList("testDataset");

// Create a KMeans estimator
Estimator<double[]> estimator = new KMeans(3, 20);

// Hazelcast Jet ML Pipeline: given a training dataset, the estimator (KMeans) returns
// a transformer (KMeansModel) that assigns a cluster to each instance of the test dataset
IListJet<double[]> outputDataset = estimator.fit(trainDataset).transform(testDataset);

Jet.shutdownAll();
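The `transform` step of a fitted k-means model amounts to nearest-centroid assignment. The sketch below shows that idea on plain Java lists; the class and method names are hypothetical. It returns bare cluster indices because how `KMeansModel` encodes the assigned cluster inside its `IListJet<double[]>` output is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a k-means "transform": label each test point with the index
// of its nearest centroid, using squared Euclidean distance.
public class NearestCentroidSketch {

    // Returns the index of the closest centroid (ties go to the lower index).
    static int nearest(double[] point, List<double[]> centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.size(); c++) {
            double dist = 0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids.get(c)[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Maps every test point to a cluster index, preserving input order.
    static List<Integer> assign(List<double[]> testData, List<double[]> centroids) {
        List<Integer> labels = new ArrayList<>();
        for (double[] point : testData) {
            labels.add(nearest(point, centroids));
        }
        return labels;
    }

    public static void main(String[] args) {
        List<double[]> centroids = List.of(new double[]{0, 0}, new double[]{10, 10});
        List<double[]> test = List.of(new double[]{1, 2}, new double[]{9, 8});
        System.out.println(assign(test, centroids)); // prints [0, 1]
    }
}
```

Squared distance is used because it ranks centroids the same way as Euclidean distance while avoiding a square root per comparison.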

K-Means Clustering Demo

java -jar target/hazelcast-jet-ml-0.6.1-jar-with-dependencies.jar KMeans

See the full demo code.
