This tutorial contains two Jupyter notebooks in the notebooks directory.
Both of them demonstrate a simple application of the random forest algorithm to relevant problems in astronomy, using the scikit-learn machine learning package for Python. The classification notebook describes all steps in more detail and contains links to the appropriate scikit-learn documentation.
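For readers who want a feel for the workflow before opening the notebooks, here is a minimal sketch of a random forest classifier in scikit-learn. It is not taken from the notebooks: the data are synthetic (generated with `make_classification`), and the sample size, number of features, and hyperparameters are arbitrary assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a galaxy table (assumption: 1000 objects, 5 numeric features).
X, y = make_classification(
    n_samples=1000,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Hold out half of the sample to test the trained model on unseen objects.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Train a forest of 100 decision trees.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Fraction of correctly classified test objects, and the relative
# importance of each feature (the importances sum to 1).
accuracy = clf.score(X_test, y_test)
importances = clf.feature_importances_

print("test accuracy:", accuracy)
print("feature importances:", importances)
```

The feature importances are often the most interesting output in a science context, since they indicate which input parameters the forest relies on most when separating the classes.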
The DATA directory contains a single text file with 6 entries for each of the ~200 000 central galaxies observed by the Sloan Digital Sky Survey (SDSS). The structure of the file is explained in detail in the classification notebook.
The compiled data uses the following publicly available catalogues:
- star formation rate (SFR) and stellar mass (M*) estimates from the MPA-JHU release of spectrum measurements
- dark matter halo mass estimates from the Yang Catalogue
- stellar velocity dispersions (σdisp) from the NYU Value-Added Galaxy Catalogue
- supermassive black hole masses (MBH), estimated using the MBH-σdisp relation in Saglia et al. 2016
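The last item relies on a scaling relation between black hole mass and stellar velocity dispersion. Relations of this kind are usually written as a power law, which is a straight line in log space. The sketch below assumes that log-linear form; the function name `log_mbh_from_sigma` and its `intercept` and `slope` arguments are hypothetical, and the values in the usage lines are placeholders, not the coefficients published in Saglia et al. 2016.

```python
import numpy as np


def log_mbh_from_sigma(sigma_kms, intercept, slope):
    """Return log10(MBH / Msun) for a velocity dispersion in km/s.

    Assumes a log-linear relation:
        log10(MBH) = intercept + slope * log10(sigma)
    The coefficients must be supplied by the caller.
    """
    sigma_kms = np.asarray(sigma_kms, dtype=float)
    return intercept + slope * np.log10(sigma_kms)


# Placeholder coefficients for illustration only (NOT the published fit).
sigmas = np.array([100.0, 200.0])
log_mbh = log_mbh_from_sigma(sigmas, intercept=0.0, slope=4.0)
print(log_mbh)
```

To reproduce the values in the data file, substitute the intercept and slope of the specific fit adopted from the paper, taking care to match its normalisation of σdisp.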
Finally, the scripts directory contains a single script, support_tools, which defines my favourite configure_plots function.
I hope you enjoy these two notebooks and find them useful. If you have any questions, comments, or requests, please feel free to reach out to me directly!
I would like to thank my supervisor Asa Bluck for introducing me to the joy of machine learning and inspiring this tutorial.