For course notes, handouts, and examples, see the Class-Material folder. For homework, paper, and project information, see the Assignments folder. For general course information, see the syllabus.
This course introduces the core concepts and practical applications of machine learning (ML) for public health, biostatistics, and related disciplines (e.g., epidemiology, psychology, neuroscience, genetics). Working in R, you will learn how to prepare data, select and implement appropriate ML methods, and evaluate model performance. The emphasis is on conceptual understanding, hands-on implementation, and interpretation, not on mathematical derivations. By the end of the course, you will be ready to incorporate ML techniques into your own research, with particular attention to the unique challenges and ethical considerations of health data.
The main topics we will cover in this course (among others) are:
- Penalized Regression (Shrinkage Methods)
- Tree-Based Methods: CART and Ensemble Techniques
- Support Vector Machines
- Neural Networks
- Basics of Large Language Models
- Dimension Reduction
- Clustering