Smile

Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-art performance. Smile is well documented and please check out the project website for programming guides and more information.

Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

Smile implements the following major machine learning algorithms:

Classification: Support Vector Machines, Decision Trees, AdaBoost, Gradient Boosting, Random Forest, Logistic Regression, Neural Networks, RBF Networks, Maximum Entropy Classifier, KNN, Naïve Bayesian, Fisher/Linear/Quadratic/Regularized Discriminant Analysis.
Regression: Support Vector Regression, Gaussian Process, Regression Trees, Gradient Boosting, Random Forest, RBF Networks, OLS, LASSO, ElasticNet, Ridge Regression.
Feature Selection: Genetic Algorithm based Feature Selection, Ensemble Learning based Feature Selection, TreeSHAP, Signal Noise ratio, Sum Squares ratio.
Clustering: BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical Clustering, Sequential Information Bottleneck, Self-Organizing Maps, Spectral Clustering, Minimum Entropy Clustering.
Association Rule & Frequent Itemset Mining: FP-growth mining algorithm.
Manifold Learning: IsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, ICA.
Multi-Dimensional Scaling: Classical MDS, Isotonic MDS, Sammon Mapping.
Nearest Neighbor Search: BK-Tree, Cover Tree, KD-Tree, SimHash, LSH.
Sequence Learning: Hidden Markov Model, Conditional Random Field.
Natural Language Processing: Sentence Splitter and Tokenizer, Bigram Statistical Test, Phrase Extractor, Keyword Extractor, Stemmer, POS Tagging, Relevance Ranking

You can use the libraries through Maven central repository by adding the following to your project pom.xml file.

    <dependency>
      <groupId>com.github.haifengl</groupId>
      <artifactId>smile-core</artifactId>
      <version>2.4.0</version>
    </dependency>

For NLP, use the artifactId smile-nlp.

For Scala API, please use

    libraryDependencies += "com.github.haifengl" %% "smile-scala" % "2.4.0"

For Kotlin API, add the below into the dependencies section of Gradle build script.

    implementation("com.github.haifengl:smile-kotlin:2.4.0")

For Clojure API, add the following dependency to your project or build file:

    [org.clojars.haifengl/smile "2.4.0"]

To enable machine optimized matrix computation, the users should make their machine-optimized libblas3 (CBLAS) and liblapack3 (Fortran) available as shared libraries at runtime. This module employs the highly efficient netlib-java library.

OS X

Apple OS X requires no further setup as it ships with the veclib framework.

Linux

Generically-tuned ATLAS and OpenBLAS are available with most distributions and must be enabled explicitly using the package-manager. For example,

sudo apt-get install libatlas3-base libopenblas-base
sudo update-alternatives --config libblas.so
sudo update-alternatives --config libblas.so.3
sudo update-alternatives --config liblapack.so
sudo update-alternatives --config liblapack.so.3

However, these are only generic pre-tuned builds. If you have [Intel MKL] (https://software.intel.com/en-us/mkl) installed, you could also create symbolic links from libblas.so.3 and liblapack.so.3 to libmkl_rt.so or use Debian's alternatives system.

Windows

The native_system builds expect to find libblas3.dll and liblapack3.dll on the %PATH% (or current working directory). Smile prebuilt package ships Intel MKL, which features highly optimized, threaded, and vectorized math functions that maximize performance on each processor family.

Shell

Smile comes with interactive shells for Java, Scala and Kotlin. Download pre-packaged Smile from the releases page. In the home directory of Smile, type

    ./bin/smile

to enter the Scala shell. You can run any valid Scala expressions in the shell. In the simplest case, you can use it as a calculator. Besides, all high-level Smile operators are predefined in the shell. By default, the shell uses up to 75% memory. If you need more memory to handle large data, use the option -J-Xmx or -XX:MaxRAMPercentage. For example,

    ./bin/smile -J-Xmx30G

You can also modify the configuration file ./conf/smile.ini for the memory and other JVM settings.

To use Java's JShell, type

    ./bin/jshell.sh

which has Smile's jars in the classpath. Similarly, run

    ./bin/kotlin.sh

to enter Kotlin REPL.

Model Serialization

Most models support the Java Serializable interface (all classifiers do support Serializable interface) so that you can use them in Spark. For reading/writing the models in non-Java code, we suggest [XStream] (https://github.com/x-stream/xstream) to serialize the trained models. XStream is a simple library to serialize objects to XML and back again. XStream is easy to use and doesn't require mappings (actually requires no modifications to objects). Protostuff is a nice alternative that supports forward-backward compatibility (schema evolution) and validation. Beyond XML, Protostuff supports many other formats such as JSON, YAML, protobuf, etc.

Visualization

Smile provides a Swing-based data visualization library SmilePlot, which provides scatter plot, line plot, staircase plot, bar plot, box plot, histogram, 3D histogram, dendrogram, heatmap, hexmap, QQ plot, contour plot, surface, and wireframe.

To use SmilePlot, add the following to dependencies

    <dependency>
      <groupId>com.github.haifengl</groupId>
      <artifactId>smile-plot</artifactId>
      <version>2.4.0</version>
    </dependency>

Smile also support data visualization in declarative approach. With smile.plot.vega package, we can create a specification that describes visualizations as mappings from data to properties of graphical marks (e.g., points or bars). The specification is based on Vega-Lite. The Vega-Lite compiler automatically produces visualization components including axes, legends, and scales. It then determines properties of these components based on a set of carefully designed rules.

Gallery

Kernel PCA	IsoMap
Multi-Dimensional Scaling	SOM
Neural Network	SVM
Agglomerative Clustering	X-Means
DBSCAN	Neural Gas
Wavelet	Exponential Family Mixture

Name		Name	Last commit message	Last commit date
Latest commit History 2,164 Commits
.github		.github
benchmark		benchmark
binder		binder
clojure		clojure
core		core
data		data
demo		demo
doc		doc
graph		graph
interpolation		interpolation
io		io
json		json
kotlin		kotlin
math		math
nd4j		nd4j
netlib		netlib
nlp		nlp
plot		plot
project		project
scala		scala
shell		shell
spark		spark
web		web
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
COPYING		COPYING
COPYING.LESSER		COPYING.LESSER
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt
package.sh		package.sh
smile.sh		smile.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Uh oh!

Repository files navigation

Smile

OS X

Linux

Windows

Shell

Model Serialization

Visualization

Gallery

Kernel PCA

IsoMap

Multi-Dimensional Scaling

SOM

Neural Network

SVM

Agglomerative Clustering

X-Means

DBSCAN

Neural Gas

Wavelet

Exponential Family Mixture

About

Licenses found

Uh oh!

Releases

Packages

Languages

License

Licenses found

tomassvensson/smile

Folders and files

Latest commit

History

Repository files navigation

Smile

OS X

Linux

Windows

Shell

Model Serialization

Visualization

Gallery

Kernel PCA

IsoMap

Multi-Dimensional Scaling

SOM

Neural Network

SVM

Agglomerative Clustering

X-Means

DBSCAN

Neural Gas

Wavelet

Exponential Family Mixture

About

Resources

License

Licenses found

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages