Commit a033aa1

docs: add November 2020 data science posts
1 parent 434c25a commit a033aa1

5 files changed

+196
-0
lines changed
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
---
2+
author_profile: false
3+
categories:
4+
- Statistics
5+
classes: wide
6+
date: '2020-11-05'
7+
excerpt: An introduction to probability theory concepts every data scientist should know.
8+
header:
9+
image: /assets/images/data_science_10.jpg
10+
og_image: /assets/images/data_science_10.jpg
11+
overlay_image: /assets/images/data_science_10.jpg
12+
show_overlay_excerpt: false
13+
teaser: /assets/images/data_science_10.jpg
14+
twitter_image: /assets/images/data_science_10.jpg
15+
keywords:
16+
- Probability theory
17+
- Random variables
18+
- Distributions
19+
- Data science
20+
seo_description: Learn the core principles of probability theory, from random variables to common distributions, with practical examples for data science.
21+
seo_title: 'Probability Theory Basics for Data Science'
22+
seo_type: article
23+
summary: This post reviews essential probability concepts like random variables, expectation, and common distributions, illustrating how they underpin data science workflows.
24+
tags:
25+
- Probability
26+
- Statistics
27+
- Data science
28+
title: 'Probability Theory Basics for Data Science'
29+
---
30+
31+
Probability theory provides the mathematical foundation for modeling uncertainty. By understanding random variables and probability distributions, data scientists can quantify risks and make informed decisions.
32+
33+
## Random Variables and Distributions
34+
35+
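A random variable assigns numerical values to outcomes in a sample space. Key distributions describe how probabilities are spread across possible outcomes: the binomial models the number of successes in a fixed number of independent trials, the Poisson models counts of events in a fixed interval, and the normal models continuous measurements that cluster symmetrically around a mean. Knowing these distributions helps in selecting appropriate models and estimating parameters.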
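
As a minimal sketch of how a distribution spreads probability across outcomes, the snippet below evaluates the binomial probability mass function with Python's standard library. The function name `binomial_pmf` and the parameters `n = 10`, `p = 0.3` are illustrative assumptions, not part of the original post.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: 10 independent trials, each succeeding with probability 0.3.
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over all possible outcomes (0..n successes) sum to 1.
total = sum(pmf)
print(f"P(X = 3) = {pmf[3]:.4f}, total probability = {total:.4f}")
```

Summing the list confirms that the probabilities of all possible outcomes add up to one, which is the defining property of any probability distribution.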
36+
37+
## Expectation and Variance
38+
39+
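Two fundamental measures of a random variable are its **expected value** and **variance**. The expected value is the probability-weighted average of the outcomes, which the sample mean approaches over many repeated draws. The variance is the expected squared deviation from that average and so measures how spread out the outcomes are. Both appear throughout model evaluation: mean squared error, for example, is an average of squared deviations between predictions and observations.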
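
To make the two definitions concrete, here is a small sketch that computes the expected value and variance of a fair six-sided die directly from its probabilities. The die is an assumed example chosen for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
outcomes = range(1, 7)
prob = Fraction(1, 6)

# Expected value: probability-weighted average of the outcomes.
expected_value = sum(x * prob for x in outcomes)

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - expected_value) ** 2 * prob for x in outcomes)

print(expected_value, variance)  # 7/2 35/12
```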
40+
41+
Mastering probability theory enables data scientists to better interpret model outputs and reason about uncertainty in real-world applications.
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
---
2+
author_profile: false
3+
categories:
4+
- Data Science
5+
classes: wide
6+
date: '2020-11-10'
7+
excerpt: Understand how simple linear regression models the relationship between two variables using a single predictor.
8+
header:
9+
image: /assets/images/data_science_11.jpg
10+
og_image: /assets/images/data_science_11.jpg
11+
overlay_image: /assets/images/data_science_11.jpg
12+
show_overlay_excerpt: false
13+
teaser: /assets/images/data_science_11.jpg
14+
twitter_image: /assets/images/data_science_11.jpg
15+
keywords:
16+
- Linear regression
17+
- Least squares
18+
- Data analysis
19+
seo_description: Discover the mechanics of simple linear regression and how to interpret slope and intercept when fitting a straight line to data.
20+
seo_title: 'A Primer on Simple Linear Regression'
21+
seo_type: article
22+
summary: This article introduces simple linear regression and the least squares method, showing how a single predictor explains variation in a response variable.
23+
tags:
24+
- Regression
25+
- Statistics
26+
- Data science
27+
title: 'A Primer on Simple Linear Regression'
28+
---
29+
30+
Simple linear regression is a foundational technique for modeling the relationship between a predictor variable and a response variable. By fitting a straight line, we can quantify how changes in one variable are associated with changes in another.
31+
32+
## The Least Squares Method
33+
34+
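The most common approach to estimating the regression line is **ordinary least squares (OLS)**. OLS finds the line that minimizes the sum of squared residuals, the vertical distances between the observed data points and the line's predictions. The slope gives the direction of the relationship and the expected change in the response for a one-unit increase in the predictor; the strength of the relationship is measured separately, for example by the correlation coefficient or R². The intercept is the expected value of the response when the predictor is zero.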
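
For a single predictor, the least squares solution has a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of the predictor, and the intercept follows from the two means. The sketch below implements this with plain Python; the function name `fit_line` and the toy data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the OLS line for paired data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of the predictor.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Because the toy data has no noise, the fitted line recovers the generating slope and intercept exactly; with real data the residuals would be nonzero and the estimates approximate.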
35+
36+
Understanding simple linear regression is a stepping stone toward more complex modeling techniques. It also builds intuition about correlation, with the caveat that a well-fitting line describes an association and does not by itself establish causation.
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
---
2+
author_profile: false
3+
categories:
4+
- Statistics
5+
classes: wide
6+
date: '2020-11-20'
7+
excerpt: Explore the fundamentals of Bayesian inference and how prior beliefs combine with data to form posterior conclusions.
8+
header:
9+
image: /assets/images/data_science_12.jpg
10+
og_image: /assets/images/data_science_12.jpg
11+
overlay_image: /assets/images/data_science_12.jpg
12+
show_overlay_excerpt: false
13+
teaser: /assets/images/data_science_12.jpg
14+
twitter_image: /assets/images/data_science_12.jpg
15+
keywords:
16+
- Bayesian statistics
17+
- Priors
18+
- Posterior distributions
19+
- Data science
20+
seo_description: An overview of Bayesian inference, demonstrating how to update prior beliefs with new evidence to make data-driven decisions.
21+
seo_title: 'Bayesian Inference Explained'
22+
seo_type: article
23+
summary: Learn how Bayesian inference updates prior beliefs into posterior distributions, providing a flexible framework for reasoning under uncertainty.
24+
tags:
25+
- Bayesian
26+
- Inference
27+
- Statistics
28+
title: 'Bayesian Inference Explained'
29+
---
30+
31+
Bayesian inference offers a powerful perspective on probability, treating unknown quantities as random variables whose distributions are updated when new evidence appears.
32+
33+
## Priors and Posteriors
34+
35+
The process begins with a **prior distribution** that captures our initial beliefs about a parameter. After observing data, we apply Bayes' theorem to obtain the **posterior distribution**, which is proportional to the prior multiplied by the likelihood of the data and reflects how our beliefs should change.
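
One case where the prior-to-posterior update can be written down exactly is the Beta-Binomial model: a Beta(α, β) prior on a success probability, combined with observed successes and failures, yields a Beta(α + successes, β + failures) posterior. The sketch below assumes this conjugate setup; the function name `update_beta`, the uniform Beta(1, 1) prior, and the counts are illustrative choices, not taken from the original post.

```python
def update_beta(alpha: float, beta: float, successes: int, failures: int):
    """Return the Beta posterior parameters after observing binomial data."""
    return alpha + successes, beta + failures


# Assumed prior: Beta(1, 1), i.e. uniform over the success probability.
prior = (1.0, 1.0)

# Hypothetical data: 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)

# The mean of Beta(a, b) is a / (a + b).
posterior_mean = posterior[0] / (posterior[0] + posterior[1])
print(posterior, round(posterior_mean, 4))  # (8.0, 4.0) 0.6667
```

With few observations the prior still pulls the estimate toward 0.5 (the posterior mean is 0.667 rather than the raw proportion 0.7); as more data arrives, the data dominates.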
36+
37+
## Why Use Bayesian Methods?
38+
39+
Bayesian techniques are particularly useful when data is scarce or when incorporating domain knowledge is essential. They provide a coherent approach to uncertainty that can complement or outperform classical methods in many situations.
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
---
2+
author_profile: false
3+
categories:
4+
- Statistics
5+
classes: wide
6+
date: '2020-11-25'
7+
excerpt: See how hypothesis testing helps draw meaningful conclusions from data in practical scenarios.
8+
header:
9+
image: /assets/images/data_science_13.jpg
10+
og_image: /assets/images/data_science_13.jpg
11+
overlay_image: /assets/images/data_science_13.jpg
12+
show_overlay_excerpt: false
13+
teaser: /assets/images/data_science_13.jpg
14+
twitter_image: /assets/images/data_science_13.jpg
15+
keywords:
16+
- Hypothesis testing
17+
- P-values
18+
- Significance
19+
- Data science
20+
seo_description: Learn how to apply hypothesis tests in real-world analyses and avoid common pitfalls when interpreting p-values and confidence levels.
21+
seo_title: 'Applying Hypothesis Testing in the Real World'
22+
seo_type: article
23+
summary: This post walks through frequentist hypothesis testing, showing how to formulate null and alternative hypotheses and interpret the results in practical data science tasks.
24+
tags:
25+
- Hypothesis testing
26+
- Statistics
27+
- Experiments
28+
title: 'Applying Hypothesis Testing in the Real World'
29+
---
30+
31+
Hypothesis testing gives data scientists a principled way to assess whether an observed pattern could plausibly have arisen by chance alone or is better explained by a genuine effect.
32+
33+
## Null vs. Alternative Hypotheses
34+
35+
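Every test starts with a **null hypothesis**, representing the status quo, and an **alternative hypothesis**, representing a potential effect. We choose a significance level before looking at the data, then calculate a p-value: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained. If the p-value falls below the significance level, we reject the null hypothesis; otherwise we fail to reject it.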
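
The decision rule can be illustrated with an exact one-sided binomial test. The sketch below assumes a hypothetical experiment, 9 heads in 10 coin flips, with the null hypothesis that the coin is fair and a significance level of 0.05; the function name `binomial_p_value` is also an assumption for illustration.

```python
from math import comb


def binomial_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided p-value P(X >= k) under H0: X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical experiment: 9 heads observed in 10 flips; H0 says the coin is fair.
alpha = 0.05
p_value = binomial_p_value(9, 10)

# Reject H0 only when the p-value falls below the chosen significance level.
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Here the p-value is 11/1024, about 0.0107, so the null hypothesis of a fair coin is rejected at the 0.05 level. A two-sided alternative would also count outcomes at least as extreme in the other direction.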
36+
37+
## Common Pitfalls
38+
39+
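Misinterpreting p-values or failing to consider effect sizes can lead to misguided conclusions. A p-value is not the probability that the null hypothesis is true, and a statistically significant result can still be too small to matter in practice. Always pair statistical significance with domain context to ensure results are meaningful.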
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
---
2+
author_profile: false
3+
categories:
4+
- Data Science
5+
classes: wide
6+
date: '2020-11-30'
7+
excerpt: Discover best practices for creating clear and compelling data visualizations that communicate insights effectively.
8+
header:
9+
image: /assets/images/data_science_14.jpg
10+
og_image: /assets/images/data_science_14.jpg
11+
overlay_image: /assets/images/data_science_14.jpg
12+
show_overlay_excerpt: false
13+
teaser: /assets/images/data_science_14.jpg
14+
twitter_image: /assets/images/data_science_14.jpg
15+
keywords:
16+
- Data visualization
17+
- Charts
18+
- Communication
19+
- Best practices
20+
seo_description: Guidelines for selecting chart types, choosing colors, and avoiding clutter when visualizing data for stakeholders.
21+
seo_title: 'Data Visualization Best Practices'
22+
seo_type: article
23+
summary: Learn how to design effective visualizations by focusing on clarity, appropriate chart selection, and thoughtful use of color and labels.
24+
tags:
25+
- Visualization
26+
- Data science
27+
- Communication
28+
title: 'Data Visualization Best Practices'
29+
---
30+
31+
Effective data visualization bridges the gap between complex datasets and human understanding. Following proven design principles ensures that your charts highlight the important messages without distractions.
32+
33+
## Choosing the Right Chart
34+
35+
Different data types call for different chart styles. Use bar charts for comparisons, line charts for trends, and scatter plots for relationships. Avoid pie charts when precise comparisons are needed.
36+
37+
## Keep It Simple
38+
39+
Cluttered visuals can obscure the message. Limit the number of colors and remove unnecessary grid lines or 3D effects. Focus the audience's attention on the key insights.
40+
41+
Clear and concise visualizations help stakeholders grasp findings quickly, making your analyses more persuasive and actionable.
