machine learning tutorial #36

Draft. Wants to merge 1 commit into base: main.
55 changes: 55 additions & 0 deletions book/basics/machine_learning.nim
@@ -0,0 +1,55 @@
import nimib, nimibook

nbInit(theme=useNimibook)

nbText: md"""
# 🤖👩‍🎓 Machine Learning

In this tutorial we will cover the basics of [Machine Learning]
in Nim by showing examples of the three classical tasks of
[Clustering], [Classification], and [Regression].

As a reference dataset we will use the [penguins dataset].
An exploration of this dataset in Nim is available as an [example notebook] of nimib.
Using this dataset we will:
- cluster penguins' features and see how well the clusters match the species using [k-means clustering]
- classify penguins' sex and species starting from various sets of features using [logistic regression]
- predict the weight of penguins based on their bill size using [linear regression]

The main concern of machine learning is to build algorithms for automated, data-driven prediction.
The success of such a modelling activity is usually measured by a performance metric
computed on specific subsets of the data, called the [training, validation and test] sets.
In our examples we will:
- split the dataset into training and test sets
- process features appropriately
- fit the model on the training set
- predict the test set using the trained model
- compute various metrics on predictions, appropriate for each of the above tasks
- validate our modelling approaches through [cross-validation]
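
As a rough preview of the first and fifth steps above, here is a minimal sketch
that uses only the Nim standard library. The helpers `trainTestSplit` and `accuracy`
are hypothetical names introduced for illustration; they are not part of arraymancer
or datamancer, and the examples later in the tutorial may do this differently.

```nim
import std/[random, sequtils]

# Hypothetical helper: shuffle the row indices 0 ..< n and split them
# into a training part and a test part.
proc trainTestSplit(n: int, testFraction: float,
                    seed: int64): tuple[train, test: seq[int]] =
  var rng = initRand(seed)          # fixed seed makes the split reproducible
  var idx = toSeq(0 ..< n)
  rng.shuffle(idx)
  let nTest = int(float(n) * testFraction)
  result.test = idx[0 ..< nTest]
  result.train = idx[nTest ..< n]

# Hypothetical helper: fraction of predictions equal to the true labels.
proc accuracy(yTrue, yPred: seq[int]): float =
  doAssert yTrue.len == yPred.len and yTrue.len > 0
  var hits = 0
  for i in 0 ..< yTrue.len:
    if yTrue[i] == yPred[i]:
      inc hits
  hits / yTrue.len

let (train, test) = trainTestSplit(20, 0.25, seed = 42)
echo "train rows: ", train.len, ", test rows: ", test.len
echo "accuracy: ", accuracy(@[0, 1, 1, 0], @[0, 1, 0, 0])
```

Splitting indices rather than rows keeps the sketch independent of how the data is stored;
the same indices could then be used to select rows from a dataframe or a tensor.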

We will be using [arraymancer] for the implementation of machine learning algorithms,
[datamancer] for manipulating data, and [ggplotnim] for visualization.
In Nim there is not (yet) a specific library that encapsulates
machine learning concepts such as the `Estimator` of [scikit-learn].
In this tutorial we will also try to build a simple API
that could be a seed for a future [scinim/learn] library.

[Machine Learning]: https://en.wikipedia.org/wiki/Machine_learning
[Clustering]: https://en.wikipedia.org/wiki/Cluster_analysis
[Classification]: https://en.wikipedia.org/wiki/Statistical_classification
[Regression]: https://en.wikipedia.org/wiki/Regression_analysis
[penguins dataset]: https://allisonhorst.github.io/palmerpenguins/
[example notebook]: https://pietroppeter.github.io/nimib/penguins.html
[logistic regression]: https://en.wikipedia.org/wiki/Logistic_regression
[k-means clustering]: https://en.wikipedia.org/wiki/K-means_clustering
[linear regression]: https://en.wikipedia.org/wiki/Linear_regression
[training, validation and test]: https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets
[cross-validation]: https://en.wikipedia.org/wiki/Cross-validation_(statistics)
[arraymancer]: https://github.com/mratsim/Arraymancer
[datamancer]: https://github.com/scinim/datamancer
[ggplotnim]: https://github.com/vindaar/ggplotnim
[scikit-learn]: https://scikit-learn.org/stable/
[scinim/learn]: https://github.com/scinim/learn
"""

nbSave
1 change: 1 addition & 0 deletions getting_started.nim
@@ -6,6 +6,7 @@ var book = newBookFromToc("SciNim Getting Started", "book"):
entry("Common datatypes", "common_datatypes")
entry("Data wrangling with dataframes", "data_wrangling")
entry("Plotting", "basic_plotting")
entry("Machine Learning", "machine_learning")
entry("Units", "units_basics")
section("Numerical methods", "numerical_methods/index"):
entry("Curve fitting", "curve_fitting")