DiogoRibeiro7
diff --git a/.github/workflows/jekyll.yml b/.github/workflows/jekyll.yml
Lines changed: 0 additions & 8 deletions
diff --git a/README.md b/README.md
Lines changed: 5 additions & 8 deletions
diff --git a/_config.yml b/_config.yml
Lines changed: 1 addition & 1 deletion
diff --git a/_includes/head.html b/_includes/head.html
Lines changed: 1 addition & 1 deletion
diff --git a/_posts/2020-11-05-probability_theory_basics.md b/_posts/2020-11-05-probability_theory_basics.md
Lines changed: 41 additions & 0 deletions
diff --git a/_posts/2020-11-10-simple_linear_regression_intro.md b/_posts/2020-11-10-simple_linear_regression_intro.md
Lines changed: 36 additions & 0 deletions
diff --git a/_posts/2020-11-20-bayesian_inference_basics.md b/_posts/2020-11-20-bayesian_inference_basics.md
Lines changed: 39 additions & 0 deletions
diff --git a/_posts/2020-11-25-hypothesis_testing_real_world_applications.md b/_posts/2020-11-25-hypothesis_testing_real_world_applications.md
Lines changed: 39 additions & 0 deletions
diff --git a/_posts/2020-11-30-data_visualization_best_practices.md b/_posts/2020-11-30-data_visualization_best_practices.md
Lines changed: 41 additions & 0 deletions
diff --git a/_posts/2021-10-05-data_preprocessing_pipelines.md b/_posts/2021-10-05-data_preprocessing_pipelines.md
Lines changed: 47 additions & 0 deletions
@@ -39,14 +39,6 @@ jobs:
           ruby-version: '3.3' # Not needed with a .ruby-version file
           bundler-cache: true # runs 'bundle install' and caches installed gems automatically
           cache-version: 0 # Increment this number if you need to re-download cached gems
-      - name: Setup Node
-        uses: actions/setup-node@v4
-        with:
-          node-version: '20'
-      - name: Install Node dependencies
-        run: npm ci
-      - name: Lint CSS
-        run: npm run lint:css
       - name: Setup Pages
         id: pages
         uses: actions/configure-pages@v5
 
@@ -73,22 +73,19 @@ GitHub Actions already runs these commands automatically during deployments.
 
 # ToDo
 
-Have a consistency in the font and font sizes (ideally you want to use 2 fonts. One for the header/subtitle and one for the text. You can use this kind of website https://fontjoy.com/ which allow you to pair fonts).
+~~Have a consistency in the font and font sizes (ideally you want to use 2 fonts. One for the header/subtitle and one for the text. You can use this kind of website https://fontjoy.com/ which allow you to pair fonts).~~
 
 Choose a few main colours for your site (I would suggest black/white/grey but not in solid. You can also use this kind of site: https://coolors.co/palettes/popular/2a4849).
 
-Reduce then size of the homepage top image (ideally you want your first articles to be visible on load and not hidden below the fold).
+~~Reduce then size of the homepage top image (ideally you want your first articles to be visible on load and not hidden below the fold).~~
 
-Restyle your links (ideally the link should be back with no underline and you add a css style on hover)
+~~Restyle your links (ideally the link should be back with no underline and you add a css style on hover)~~
 
-Center pagination
+~~Center pagination~~
 
-Restyle your article detail page breadcrumbs. You want them to be less visible (I would suggest a light grey colour here)
+~~Restyle your article detail page breadcrumbs. You want them to be less visible (I would suggest a light grey colour here)~~
 
 Right now at the top of the detail page, you have your site breadcrumbs, a title then another title and the font sizes are a bit off and it is hard to understand the role of the second title. I would reorganise this to provide a better understanding to the reader
-
 On the detail page, I would suggest you put the `You may also enjoy` straight at the end of the article. Right now it is after comments and you can lose engagement on your site.
-
 I would suggest you remove your description from the detail page. I think having it on the home page is enough. You can have a smaller introduction if needed with a read more button or link that will take the reader to a full page description of yourself and your skillset. That will allow you to tell more about yourself and why you do what you do
-
 I will create card article with a hover animation (add some shape and background colour and ideally a header image for the card. The graphs you show me last week for example.)
@@ -12,7 +12,7 @@
 
 # theme                  : "minimal-mistakes-jekyll"
 remote_theme             : "mmistakes/[email protected]"
-minimal_mistakes_skin    : "dark" #"dark"  "default" # "air", "aqua", "contrast", "dark", "dirt", "neon", "mint", "plum", "sunrise"
+minimal_mistakes_skin    : "air" #"dark"  "default" # "air", "aqua", "contrast", "dark", "dirt", "neon", "mint", "plum", "sunrise"
 
 # Site Settings
 locale                   : "en"
 
@@ -20,7 +20,7 @@
 <!-- Google Fonts -->
 <link rel="preconnect" href="https://fonts.googleapis.com">
 <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
-<link href="https://fonts.googleapis.com/css2?family=Cardo&display=swap" rel="stylesheet">
+<link href="https://fonts.googleapis.com/css2?family=Lora:wght@400;700&family=Roboto:wght@400;700&display=swap" rel="stylesheet">
 
 {% if site.head_scripts %}
   {% for script in site.head_scripts %}
 
@@ -0,0 +1,41 @@
+---
+author_profile: false
+categories:
+- Statistics
+classes: wide
+date: '2020-11-05'
+excerpt: An introduction to probability theory concepts every data scientist should know.
+header:
+  image: /assets/images/data_science_10.jpg
+  og_image: /assets/images/data_science_10.jpg
+  overlay_image: /assets/images/data_science_10.jpg
+  show_overlay_excerpt: false
+  teaser: /assets/images/data_science_10.jpg
+  twitter_image: /assets/images/data_science_10.jpg
+keywords:
+- Probability theory
+- Random variables
+- Distributions
+- Data science
+seo_description: Learn the core principles of probability theory, from random variables to common distributions, with practical examples for data science.
+seo_title: 'Probability Theory Basics for Data Science'
+seo_type: article
+summary: This post reviews essential probability concepts like random variables, expectation, and common distributions, illustrating how they underpin data science workflows.
+tags:
+- Probability
+- Statistics
+- Data science
+title: 'Probability Theory Basics for Data Science'
+---
+
+Probability theory provides the mathematical foundation for modeling uncertainty. By understanding random variables and probability distributions, data scientists can quantify risks and make informed decisions.
+
+## Random Variables and Distributions
+
+A random variable assigns numerical values to outcomes in a sample space. Key distributions such as the binomial, normal, and Poisson describe how probabilities are spread across possible outcomes. Knowing these distributions helps in selecting appropriate models and estimating parameters.
+
+## Expectation and Variance
+
+Two fundamental measures of a random variable are its **expected value** and **variance**. The expected value represents the long-run average, while the variance measures how spread out the outcomes are. These metrics are critical for evaluating models and comparing predictions.
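+
+As a minimal sketch of these two definitions, the snippet below computes the expected value and variance of a discrete random variable directly from its probability mass function. The fair six-sided die and the helper names `expected_value` and `variance` are illustrative assumptions, not part of any library.
+
+```python
+from fractions import Fraction
+
+# Hypothetical example: a fair six-sided die, each face with probability 1/6.
+pmf = {face: Fraction(1, 6) for face in range(1, 7)}
+
+
+def expected_value(pmf):
+    """E[X]: sum of each outcome weighted by its probability."""
+    return sum(x * p for x, p in pmf.items())
+
+
+def variance(pmf):
+    """Var(X): expected squared deviation from the mean."""
+    mu = expected_value(pmf)
+    return sum((x - mu) ** 2 * p for x, p in pmf.items())
+
+
+mean = expected_value(pmf)
+var = variance(pmf)
+print(mean, var)  # 7/2 35/12
+```
+
+Using `Fraction` keeps the arithmetic exact, which makes it easy to check the result against a hand calculation.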
+
+Mastering probability theory enables data scientists to better interpret model outputs and reason about uncertainty in real-world applications.
@@ -0,0 +1,36 @@
+---
+author_profile: false
+categories:
+- Data Science
+classes: wide
+date: '2020-11-10'
+excerpt: Understand how simple linear regression models the relationship between two variables using a single predictor.
+header:
+  image: /assets/images/data_science_11.jpg
+  og_image: /assets/images/data_science_11.jpg
+  overlay_image: /assets/images/data_science_11.jpg
+  show_overlay_excerpt: false
+  teaser: /assets/images/data_science_11.jpg
+  twitter_image: /assets/images/data_science_11.jpg
+keywords:
+- Linear regression
+- Least squares
+- Data analysis
+seo_description: Discover the mechanics of simple linear regression and how to interpret slope and intercept when fitting a straight line to data.
+seo_title: 'A Primer on Simple Linear Regression'
+seo_type: article
+summary: This article introduces simple linear regression and the least squares method, showing how a single predictor explains variation in a response variable.
+tags:
+- Regression
+- Statistics
+- Data science
+title: 'A Primer on Simple Linear Regression'
+---
+
+Simple linear regression is a foundational technique for modeling the relationship between a predictor variable and a response variable. By fitting a straight line, we can quantify how changes in one variable are associated with changes in another.
+
+## The Least Squares Method
+
+The most common approach to estimating the regression line is **ordinary least squares (OLS)**. OLS finds the line that minimizes the sum of squared residuals between the observed data points and the line's predictions. The slope gives the direction of the relationship and the expected change in the response for a one-unit increase in the predictor, while the intercept is the expected response when the predictor is zero.
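+
+For a single predictor, the least squares line has a closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of the predictor, and the intercept follows from the two means. The sketch below implements that formula in plain Python; the function name `fit_simple_ols` and the toy data are assumptions made for illustration.
+
+```python
+def fit_simple_ols(x, y):
+    """Return (slope, intercept) of the least squares line through (x, y)."""
+    n = len(x)
+    x_mean = sum(x) / n
+    y_mean = sum(y) / n
+    # Sum of cross-deviations and sum of squared deviations of x.
+    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
+    sxx = sum((xi - x_mean) ** 2 for xi in x)
+    slope = sxy / sxx
+    intercept = y_mean - slope * x_mean
+    return slope, intercept
+
+
+# Toy data that lies exactly on the line y = 2x + 1.
+x = [1.0, 2.0, 3.0, 4.0, 5.0]
+y = [3.0, 5.0, 7.0, 9.0, 11.0]
+slope, intercept = fit_simple_ols(x, y)
+print(slope, intercept)  # 2.0 1.0
+```
+
+The function assumes the predictor takes at least two distinct values; otherwise `sxx` is zero and the slope is undefined.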
+
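+Understanding simple linear regression is a stepping stone toward more complex modeling techniques, providing intuition about how the association between two variables is measured. A fitted line describes correlation; on its own it does not establish causation.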
@@ -0,0 +1,39 @@
+---
+author_profile: false
+categories:
+- Statistics
+classes: wide
+date: '2020-11-20'
+excerpt: Explore the fundamentals of Bayesian inference and how prior beliefs combine with data to form posterior conclusions.
+header:
+  image: /assets/images/data_science_12.jpg
+  og_image: /assets/images/data_science_12.jpg
+  overlay_image: /assets/images/data_science_12.jpg
+  show_overlay_excerpt: false
+  teaser: /assets/images/data_science_12.jpg
+  twitter_image: /assets/images/data_science_12.jpg
+keywords:
+- Bayesian statistics
+- Priors
+- Posterior distributions
+- Data science
+seo_description: An overview of Bayesian inference, demonstrating how to update prior beliefs with new evidence to make data-driven decisions.
+seo_title: 'Bayesian Inference Explained'
+seo_type: article
+summary: Learn how Bayesian inference updates prior beliefs into posterior distributions, providing a flexible framework for reasoning under uncertainty.
+tags:
+- Bayesian
+- Inference
+- Statistics
+title: 'Bayesian Inference Explained'
+---
+
+Bayesian inference offers a powerful perspective on probability, treating unknown quantities as random variables whose distributions are updated when new evidence appears.
+
+## Priors and Posteriors
+
+The process begins with a **prior distribution** that captures our initial beliefs about a parameter. After observing data, we apply Bayes' theorem to obtain the **posterior distribution**, reflecting how our beliefs should change.
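+
+One case where this update can be written down exactly is a Beta prior on a success probability combined with binomial data: the posterior is again a Beta distribution, with the observed successes and failures added to the prior parameters. The sketch below assumes that conjugate setup; the prior `Beta(2, 2)`, the data, and the helper names are hypothetical choices for illustration.
+
+```python
+from fractions import Fraction
+
+
+def update_beta(prior_a, prior_b, successes, failures):
+    """Posterior Beta parameters after observing binomial data."""
+    return prior_a + successes, prior_b + failures
+
+
+def beta_mean(a, b):
+    """Mean of a Beta(a, b) distribution."""
+    return Fraction(a, a + b)
+
+
+# Hypothetical prior: Beta(2, 2), a mild belief that the rate is near 0.5.
+prior = (2, 2)
+# Hypothetical data: 7 successes and 3 failures.
+posterior = update_beta(*prior, successes=7, failures=3)
+
+print(beta_mean(*prior), beta_mean(*posterior))  # 1/2 9/14
+```
+
+The posterior mean of 9/14 sits between the prior mean of 1/2 and the observed rate of 7/10, which is the pull between prior belief and data that the paragraph above describes.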
+
+## Why Use Bayesian Methods?
+
+Bayesian techniques are particularly useful when data is scarce or when incorporating domain knowledge is essential. They provide a coherent approach to uncertainty that can complement or outperform classical methods in many situations.
@@ -0,0 +1,39 @@
+---
+author_profile: false
+categories:
+- Statistics
+classes: wide
+date: '2020-11-25'
+excerpt: See how hypothesis testing helps draw meaningful conclusions from data in practical scenarios.
+header:
+  image: /assets/images/data_science_13.jpg
+  og_image: /assets/images/data_science_13.jpg
+  overlay_image: /assets/images/data_science_13.jpg
+  show_overlay_excerpt: false
+  teaser: /assets/images/data_science_13.jpg
+  twitter_image: /assets/images/data_science_13.jpg
+keywords:
+- Hypothesis testing
+- P-values
+- Significance
+- Data science
+seo_description: Learn how to apply hypothesis tests in real-world analyses and avoid common pitfalls when interpreting p-values and confidence levels.
+seo_title: 'Applying Hypothesis Testing in the Real World'
+seo_type: article
+summary: This post walks through frequentist hypothesis testing, showing how to formulate null and alternative hypotheses and interpret the results in practical data science tasks.
+tags:
+- Hypothesis testing
+- Statistics
+- Experiments
+title: 'Applying Hypothesis Testing in the Real World'
+---
+
+Hypothesis testing gives data scientists a principled way to assess whether an observed pattern is plausibly explained by chance alone or points to a genuine effect.
+
+## Null vs. Alternative Hypotheses
+
+Every test starts with a **null hypothesis**, representing the status quo, and an **alternative hypothesis**, representing a potential effect. By choosing a significance level and calculating a p-value, we can decide whether to reject the null hypothesis.
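+
+To make the procedure concrete, here is a small sketch of an exact two-sided binomial test using only the standard library. The scenario (60 heads in 100 coin flips, null hypothesis of a fair coin), the significance level of 0.05, and the function name `binomial_two_sided_p` are assumptions chosen for illustration.
+
+```python
+from math import comb
+
+
+def binomial_two_sided_p(successes, trials, p_null=0.5):
+    """Exact two-sided p-value: total probability, under the null, of all
+    outcomes that are no more likely than the observed one."""
+    probs = [
+        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
+        for k in range(trials + 1)
+    ]
+    observed = probs[successes]
+    # Small tolerance so ties are not lost to floating-point rounding.
+    return sum(p for p in probs if p <= observed * (1 + 1e-9))
+
+
+alpha = 0.05  # significance level chosen before looking at the data
+p_value = binomial_two_sided_p(60, 100)
+reject_null = p_value < alpha
+print(f"p-value = {p_value:.4f}, reject null: {reject_null}")
+```
+
+In this example the p-value lands just above 0.05, so the null hypothesis of a fair coin is not rejected at that level, even though 60 heads may look lopsided at first glance.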
+
+## Common Pitfalls
+
+Misinterpreting p-values or failing to consider effect sizes can lead to misguided conclusions. Always pair statistical significance with domain context to ensure results are meaningful.
@@ -0,0 +1,41 @@
+---
+author_profile: false
+categories:
+- Data Science
+classes: wide
+date: '2020-11-30'
+excerpt: Discover best practices for creating clear and compelling data visualizations that communicate insights effectively.
+header:
+  image: /assets/images/data_science_14.jpg
+  og_image: /assets/images/data_science_14.jpg
+  overlay_image: /assets/images/data_science_14.jpg
+  show_overlay_excerpt: false
+  teaser: /assets/images/data_science_14.jpg
+  twitter_image: /assets/images/data_science_14.jpg
+keywords:
+- Data visualization
+- Charts
+- Communication
+- Best practices
+seo_description: Guidelines for selecting chart types, choosing colors, and avoiding clutter when visualizing data for stakeholders.
+seo_title: 'Data Visualization Best Practices'
+seo_type: article
+summary: Learn how to design effective visualizations by focusing on clarity, appropriate chart selection, and thoughtful use of color and labels.
+tags:
+- Visualization
+- Data science
+- Communication
+title: 'Data Visualization Best Practices'
+---
+
+Effective data visualization bridges the gap between complex datasets and human understanding. Following proven design principles ensures that your charts highlight the important messages without distractions.
+
+## Choosing the Right Chart
+
+Different data types call for different chart styles. Use bar charts for comparisons, line charts for trends, and scatter plots for relationships. Avoid pie charts when precise comparisons are needed.
+
+## Keep It Simple
+
+Cluttered visuals can obscure the message. Limit the number of colors and remove unnecessary grid lines or 3D effects. Focus the audience's attention on the key insights.
+
+Clear and concise visualizations help stakeholders grasp findings quickly, making your analyses more persuasive and actionable.
@@ -0,0 +1,47 @@
+---
+author_profile: false
+categories:
+- Data Science
+classes: wide
+date: '2021-10-05'
+excerpt: Learn how to design robust data preprocessing pipelines that prepare raw data for modeling.
+header:
+  image: /assets/images/data_science_6.jpg
+  og_image: /assets/images/data_science_6.jpg
+  overlay_image: /assets/images/data_science_6.jpg
+  show_overlay_excerpt: false
+  teaser: /assets/images/data_science_6.jpg
+  twitter_image: /assets/images/data_science_6.jpg
+keywords:
+- Data preprocessing
+- Pipelines
+- Data cleaning
+- Feature engineering
+seo_description: Discover best practices for building reusable data preprocessing pipelines that handle missing values, encoding, and feature scaling.
+seo_title: Building Data Preprocessing Pipelines for Reliable Models
+seo_type: article
+summary: This post outlines the key steps in constructing data preprocessing pipelines using tools like scikit-learn to ensure consistent model inputs.
+tags:
+- Data preprocessing
+- Machine learning
+- Feature engineering
+title: Designing Effective Data Preprocessing Pipelines
+---
+
+Real-world datasets rarely come perfectly formatted for modeling. A well-designed **data preprocessing pipeline** ensures that you apply the same transformations consistently across training and production environments.
+
+## Handling Missing Values
+
+Start by assessing the extent of missing data. Common strategies include dropping incomplete rows, filling numeric columns with the mean or median, and using the most frequent category for categorical features.
+
+## Encoding Categorical Variables
+
+Many machine learning algorithms require numeric inputs. Techniques like **one-hot encoding** or **ordinal encoding** convert categories into numbers. Scikit-learn's `ColumnTransformer` allows you to apply different encoders to different columns in a single pipeline.
+
+## Scaling and Normalization
+
+Scaling features to a common range prevents variables with large magnitudes from dominating a model. Standardization (mean of zero, unit variance) is typical for linear models, while min-max scaling keeps values between 0 and 1.
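+
+Both transformations are simple formulas, sketched below in plain Python so the arithmetic is visible. The `incomes` values and the helper names are hypothetical, and the code assumes the column is not constant (otherwise the denominators are zero). In a real pipeline you would more likely reach for a library scaler.
+
+```python
+from statistics import mean, pstdev
+
+
+def standardize(values):
+    """Subtract the mean and divide by the population standard deviation."""
+    mu = mean(values)
+    sigma = pstdev(values)
+    return [(v - mu) / sigma for v in values]
+
+
+def min_max_scale(values):
+    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
+    lo, hi = min(values), max(values)
+    return [(v - lo) / (hi - lo) for v in values]
+
+
+# Hypothetical feature column.
+incomes = [20.0, 40.0, 60.0, 80.0, 100.0]
+z = standardize(incomes)
+scaled = min_max_scale(incomes)
+print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
+```
+
+Note that the mean, standard deviation, minimum, and maximum should be computed on the training data only and then reused for new data, which is exactly what a pipeline automates.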
+
+## Putting It All Together
+
+Use scikit-learn's `Pipeline` to chain preprocessing steps with your model. This approach guarantees that the exact same transformations are applied when predicting on new data, reducing the risk of data leakage and improving reproducibility.