---
layout: page
title: Machine Learning and Statistics
permalink: /ml-and-stats-overview/
---

Being a somewhat structured person, it's important for me to see how topics relate to each other. With machine learning and statistics being such expansive topics, I struggled (and still do) to understand how all the different methods relate to one another. This page is my attempt to structure the various topics and methods of machine learning and statistics. My ultimate goal is to have a post on each of these topics.

Warning!

This page is very much a work in progress.

Probability and Statistics

  • There are 2 major branches of statistics:
    1. Descriptive statistics: Summarizing and presenting key characteristics of some data.
    2. Inferential statistics: Inferring characteristics of a population from a small sample of it. In this branch, you typically start with a hypothesis and then test whether your sample is consistent with it.
  • Probability and statistics both deal with questions involving populations and samples, but do so in an "inverse manner" to one another.
    • In a probability problem, properties of the population under study are known (e.g. a specified distribution of the population), and questions regarding a sample taken from the population are posed and answered.
    • In a statistics problem, characteristics of the sample are known and properties of the population are inferred.
  • We study probability before statistics because we need to understand the uncertainty associated with taking a sample from a population. Only then can we start to understand what a particular sample can tell us about the population.
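
The "inverse" relationship between probability and statistics can be sketched with a small simulation. This is only an illustration: the population (normally distributed with mean 170 and standard deviation 10) and the sample size of 500 are made-up assumptions, not values from any real data set.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical population (assumed for this example): Normal(mean=170, sd=10).
population = NormalDist(mu=170, sigma=10)

# Probability question: the population is known, so we can ask about a draw.
# Here: how likely is a randomly drawn value to exceed 180?
p_above_180 = 1 - population.cdf(180)

# Statistics question: only a sample is known, and we infer the population.
rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [rng.gauss(170, 10) for _ in range(500)]
estimated_mean = statistics.fmean(sample)
estimated_sd = statistics.stdev(sample)

print(f"P(X > 180) from the known population: {p_above_180:.3f}")
print(f"Population mean estimated from the sample: {estimated_mean:.1f}")
print(f"Population sd estimated from the sample: {estimated_sd:.1f}")
```

The first result follows directly from the assumed distribution, while the last two are estimates that would change with a different sample. That sampling uncertainty is what inferential statistics tries to quantify.
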
  1. Descriptive Statistics
    • Correlation analysis
      • Parametric (Pearson)
      • Non-Parametric (Kendall and Spearman)
  2. [Random Variables]({% post_url 2016-02-26-random-variables %})
  3. Probabilities
    • [Joint, Marginal, and Conditional Probabilities]({% post_url 2016-03-20-basic-prob %})
    • [Bayes' Rule]({% post_url 2016-04-21-bayes-rule %})
  4. [Probability Distributions]({% post_url 2016-03-17-prob-distr %})
    • Continuous
      • Gaussian (Normal) Distribution
      • Dirichlet Distribution
      • Exponential Distribution
      • Chi-Square Distribution
      • Weibull Distribution
        • Exponential Distribution (the special case with shape parameter 1)
      • Beta Distribution
    • Discrete
      • Bernoulli Distribution
      • Binomial Distribution (Sum of n independent Bernoulli trials)
      • Multinomial Distribution
      • Multivariate Hypergeometric Distribution
        • The counterpart of the multinomial distribution when sampling is done without replacement.
      • Poisson Distribution
      • Negative Binomial Distribution
      • Beta-Binomial Distribution
  5. Hypothesis Testing
  6. Other
    • [Confidence Intervals]({% post_url 2015-08-25-how-to-interpret-a-CI %})
    • Power Analysis
  7. Survival Analysis
    • [The Basics of Survival Analysis]({% post_url 2016-05-12-survival-analysis %})
    • Kaplan-Meier Curves and the Log-rank Test
    • Cox Regression
    • Survival Analysis Study Design Considerations
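
The outline above describes the binomial distribution as the sum of n independent Bernoulli trials. A minimal sketch of that relationship follows, using only the Python standard library. The helper names (`bernoulli`, `binomial_draw`, `binomial_pmf`) and the parameters (n = 10, p = 0.3) are hypothetical choices for illustration.

```python
import random
from math import comb

def bernoulli(p, rng):
    """One Bernoulli trial: returns 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

def binomial_draw(n, p, rng):
    """One binomial draw, built as the sum of n independent Bernoulli trials."""
    return sum(bernoulli(p, rng) for _ in range(n))

def binomial_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

rng = random.Random(42)  # fixed seed so the run is reproducible
n, p, draws = 10, 0.3, 20_000  # illustrative values, not from the source
samples = [binomial_draw(n, p, rng) for _ in range(draws)]

# Compare the simulated frequency of each outcome with the exact pmf.
for k in range(n + 1):
    simulated = samples.count(k) / draws
    print(f"k={k:2d}  simulated={simulated:.3f}  exact={binomial_pmf(k, n, p):.3f}")
```

With enough draws, the simulated frequencies should sit close to the exact probabilities, and the sample mean should be close to n * p.
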

Bayesian Statistics

  1. [Bayesian Inference 101]({% post_url 2017-03-08-how-to-bayesian-infer-101 %})
  2. Classical Frequentist vs. Bayesian
  3. Markov Chain Monte Carlo (MCMC) methods
    • Metropolis algorithm
    • Gibbs Sampling (special case of Metropolis-Hastings, a generalization of the Metropolis algorithm)
    • BUGS, JAGS, Stan
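
As a rough sketch of the Metropolis algorithm listed above, the following random-walk sampler draws from a standard normal target using only the standard library. The target density, step size, chain length, and burn-in length are all assumptions chosen for illustration. Tools such as BUGS, JAGS, and Stan implement far more sophisticated samplers.

```python
import math
import random
import statistics

def log_target(x):
    """Log of an unnormalized standard normal density (the example target)."""
    return -0.5 * x * x

def metropolis(log_density, n_samples, step=1.0, start=0.0, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples

draws = metropolis(log_target, n_samples=50_000)
kept = draws[5_000:]  # discard an arbitrarily chosen burn-in period
print(f"mean = {statistics.fmean(kept):.2f}, sd = {statistics.pstdev(kept):.2f}")
```

Because the proposal is symmetric, the acceptance ratio only needs the target density up to a constant, which is why the normalizing constant is left out of `log_target`.
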

Machine Learning / Statistical Learning

![Machine Learning Algorithm Cheat Sheet]({{ site.url }}/assets/microsoft-machine-learning-algorithm-cheat-sheet-v2.png)

  1. Supervised Learning
    • Regression (predict continuous values)
      • Linear Regression
      • Artificial Neural Networks (ANN)
        • Can also be used for classification.
    • Classification (predict discrete, categorical values, i.e. the class a data point belongs to)
      • Logistic Regression
      • Linear Discriminant Analysis (LDA)
        • Naive Bayes Classifier
      • Support Vector Machines (SVM)
      • Random Forest
      • ANN
        • Can also be used for regression.
  2. Unsupervised Learning
    • Dimensionality Reduction
      • Principal Component Analysis (PCA)
      • t-SNE
    • Cluster Analysis
      • Hierarchical clustering
      • DBSCAN
      • K-means
      • [Mixture Models]({% post_url 2015-10-13-mixture-model %})
      • Topic Modeling
        • Latent Dirichlet Allocation (LDA)
        • Non-negative Matrix Factorization (NMF)
  3. Feature Selection
  4. Other
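
To make the unsupervised branch above a little more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions rather than a reference implementation: the data are two synthetic blobs, and the farthest-first initialization is a simple choice made for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with a simple farthest-first initialization."""
    # Initialization (an assumption of this sketch): start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Synthetic data: two made-up blobs centred near (0, 0) and (5, 5).
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
centroids, clusters = kmeans(blob_a + blob_b, k=2)
print(sorted(centroids))
```

No labels are passed to `kmeans`: the grouping comes only from distances between points, which is what separates this from the supervised methods listed earlier.
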
