
Commit bdc0370

Merge pull request #74 from DiogoRibeiro7/feat/reserve_branche
Feat/reserve branche
2 parents 03ed4d9 + 26f8703 commit bdc0370

13 files changed (+1369, -31 lines)

_posts/-_ideas/2030-01-01-climate_change.md

Lines changed: 2 additions & 4 deletions
@@ -31,14 +31,12 @@ tags:
 title: Exploring Climate Change, Sustainability, and Data Science
 ---
 
-### 1. Leveraging Big Data for Climate Change Mitigation
-Explore how large-scale data collection and analysis are helping scientists better understand climate patterns and predict changes. Discuss the role of satellite data, sensors, and environmental monitoring.
+## TODO:
 
 ### 2. The Role of Machine Learning in Predicting Climate Change Impacts
 Examine how machine learning algorithms are being used to model and predict the future impacts of climate change. Focus on predictive analytics for extreme weather, sea-level rise, and biodiversity loss.
 
-### 3. Sustainability Analytics: How Data Science Drives Green Innovation
-Look at how companies and organizations are using data science to improve sustainability practices. Cover areas like resource optimization, waste reduction, and improving the energy efficiency of supply chains.
+
 
 ### 4. AI and Machine Learning in Renewable Energy Optimization
 Discuss how AI is used to optimize renewable energy sources, such as wind and solar power, by improving energy forecasting, managing grid systems, and balancing energy storage.

_posts/-_ideas/2030-01-01-data_model_drift.md

Lines changed: 0 additions & 3 deletions
@@ -13,9 +13,6 @@ tags: []
 
 ## Article Ideas on Data Drift and Model Drift
 
-### 1. **Understanding Data Drift: What It Is and Why It Matters in Machine Learning**
-- **Overview**: Explain the concept of data drift and how changes in input data distribution can affect model performance over time.
-- **Focus**: Define types of data drift (e.g., **covariate drift**, **label drift**, and **concept drift**), with practical examples from industries like finance and healthcare.
 
 ### 2. **Model Drift: Why Even the Best Machine Learning Models Fail Over Time**
 - **Overview**: Explore the concept of model drift and how changes in the environment or target variable can degrade model accuracy.

_posts/-_ideas/2030-01-01-ideas_statistical_tests.md

Lines changed: 0 additions & 18 deletions
@@ -73,29 +73,11 @@ Here are some interesting article ideas centered around statistical tests, desig
 - Explain how the sign test works as a non-parametric test.
 - Compare the sign test with the Wilcoxon signed-rank test and other small-sample tests.
 
-### 8. **"Levene's Test vs. Bartlett's Test: Checking for Homogeneity of Variances"**
-- Comparison of Levene's and Bartlett's tests for checking homogeneity of variances in data.
-- Discuss when to use each test (parametric vs. non-parametric, normal vs. non-normal data).
-- Application in conjunction with ANOVA and other tests that assume equal variances.
 
-### 9. **"Multiple Comparisons Problem: Bonferroni Correction and Other Solutions"**
-- Explain the multiple comparisons problem that arises in hypothesis testing.
-- Discuss the Bonferroni correction and other methods (e.g., Holm-Bonferroni, FDR).
-- Show practical applications in experiments with multiple testing (e.g., medical studies, genetics).
 
-### 10. **"Kolmogorov-Smirnov Test: Assessing Goodness-of-Fit in Non-Parametric Data"**
-- Introduction to the Kolmogorov-Smirnov (K-S) test for checking distribution fit.
-- Compare K-S test with other goodness-of-fit tests like Shapiro-Wilk.
-- Explore real-world use cases, such as testing if a dataset follows a specific distribution.
 
 
-### 12. **"Log-Rank Test in Survival Analysis: Comparing Survival Curves"**
-- Introduction to the log-rank test used in survival analysis.
-- Discuss its applications in medical studies to compare survival times between two groups.
-- Explain how to interpret survival curves and the p-values from a log-rank test.
-
-
 

_posts/2020-01-02-maximum_likelihood_estimation_statistical_modeling.md

Lines changed: 6 additions & 6 deletions
@@ -59,7 +59,7 @@ The likelihood function is at the heart of MLE. It measures how likely the obser
 
 $$ x_1, x_2, \dots, x_n $$
 
-These observations are assumed to be drawn from some probability distribution, say $p(x | \theta)$, where $\theta$ represents the unknown parameters of the model. The likelihood function is the product of the probability density (or mass) functions for all observations:
+These observations are assumed to be drawn from some probability distribution, say $$p(x | \theta)$$, where $$\theta$$ represents the unknown parameters of the model. The likelihood function is the product of the probability density (or mass) functions for all observations:
 
 $$ L(\theta) = p(x_1 \mid \theta) \times p(x_2 \mid \theta) \times \dots \times p(x_n \mid \theta) $$
 
@@ -75,11 +75,11 @@ $$ \log L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta) $$
 
 ### 2.3 Maximization
 
-The objective of MLE is to find the parameter values that maximize the log-likelihood function. This is typically done by taking the derivative of the log-likelihood with respect to the parameter $\theta$, setting it equal to zero, and solving for $\theta$:
+The objective of MLE is to find the parameter values that maximize the log-likelihood function. This is typically done by taking the derivative of the log-likelihood with respect to the parameter $$\theta$$, setting it equal to zero, and solving for $$\theta$$:
 
 $$ \frac{\partial}{\partial \theta} \log L(\theta) = 0 $$
 
-This solution gives the maximum likelihood estimate of $\theta$, which is denoted as $\hat{\theta}$.
+This solution gives the maximum likelihood estimate of $$\theta$$, which is denoted as $$\hat{\theta}$$.
 
 ## 3. Why MLE is Essential in Data Science
 
@@ -303,16 +303,16 @@ Subclasses are expected to implement these methods.
 
 #### Normal Distribution MLE (`MLENormal`):
 
-- The `log_likelihood()` method computes the log-likelihood for the normal distribution given mean ($\mu$) and variance ($\sigma^2$).
+- The `log_likelihood()` method computes the log-likelihood for the normal distribution given mean ($$\mu$$) and variance ($$\sigma^2$$).
 - The `fit()` method estimates the parameters (mean and variance) using the following formulas:
 
 $$ \hat{\mu} = \text{mean}(data) $$
 $$ \hat{\sigma^2} = \text{variance}(data) $$
 
 #### Bernoulli Distribution MLE (`MLEBernoulli`):
 
-- The `log_likelihood()` method computes the log-likelihood for the Bernoulli distribution based on the probability $p$ of success.
-- The `fit()` method estimates the probability $p$ using the formula:
+- The `log_likelihood()` method computes the log-likelihood for the Bernoulli distribution based on the probability $$p$$ of success.
+- The `fit()` method estimates the probability $$p$$ using the formula:
 
 $$ \hat{p} = \text{mean}(data) $$
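
The closed-form estimates in this hunk are easy to verify numerically. Below is a minimal, self-contained sketch; the post's `MLENormal` and `MLEBernoulli` classes are assumed to implement the same formulas, and the data here is simulated for illustration.

```python
# Hypothetical sketch of the closed-form MLEs discussed in this diff.
import numpy as np

rng = np.random.default_rng(0)

# Normal: the MLEs are the sample mean and the 1/n (biased) sample variance.
x = rng.normal(loc=2.0, scale=1.5, size=1_000)
mu_hat = np.mean(x)
sigma2_hat = np.var(x, ddof=0)  # ddof=0 gives the MLE, not the unbiased 1/(n-1) estimate

# Bernoulli: the MLE of p is the observed proportion of successes.
b = rng.binomial(n=1, p=0.3, size=1_000)
p_hat = np.mean(b)

print(mu_hat, sigma2_hat, p_hat)
```
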
Lines changed: 152 additions & 0 deletions

@@ -0,0 +1,152 @@

---
author_profile: false
categories:
- Statistics
- Data Science
- Machine Learning
classes: wide
date: '2020-01-03'
excerpt: The Kolmogorov-Smirnov test is a powerful tool for assessing goodness-of-fit in non-parametric data. Learn how it works, how it compares to the Shapiro-Wilk test, and explore real-world applications.
header:
  image: /assets/images/data_science_3.jpg
  og_image: /assets/images/data_science_3.jpg
  overlay_image: /assets/images/data_science_3.jpg
  show_overlay_excerpt: false
  teaser: /assets/images/data_science_3.jpg
  twitter_image: /assets/images/data_science_3.jpg
keywords:
- Kolmogorov-Smirnov test
- goodness-of-fit tests
- non-parametric statistics
- distribution fitting
- Shapiro-Wilk test
seo_description: This article introduces the Kolmogorov-Smirnov test for assessing goodness-of-fit in non-parametric data, comparing it with other tests like Shapiro-Wilk, and exploring real-world use cases.
seo_title: 'Kolmogorov-Smirnov Test: A Guide to Non-Parametric Goodness-of-Fit Testing'
seo_type: article
summary: This article explains the Kolmogorov-Smirnov (K-S) test for assessing the goodness-of-fit of non-parametric data. We compare the K-S test to other goodness-of-fit tests, such as Shapiro-Wilk, and provide real-world use cases, including testing whether a dataset follows a specific distribution.
tags:
- Kolmogorov-Smirnov Test
- Goodness-of-Fit Tests
- Non-Parametric Data
- Shapiro-Wilk Test
- Distribution Fitting
title: 'Kolmogorov-Smirnov Test: Assessing Goodness-of-Fit in Non-Parametric Data'
---

## Introduction to the Kolmogorov-Smirnov Test

The **Kolmogorov-Smirnov (K-S) test** is a widely used statistical method for assessing whether a sample of data follows a specific distribution. As a **non-parametric** test, the K-S test does not assume any specific underlying data distribution, making it particularly valuable for situations where we cannot confidently assume a parametric form like the normal distribution. Instead, the K-S test compares the cumulative distribution function (CDF) of the observed data with the CDF of a reference distribution, assessing how well they align.

The K-S test is especially useful for:

- **Goodness-of-fit testing**: Evaluating whether an observed dataset conforms to a known distribution, such as normal, uniform, or exponential.
- **Comparing two distributions**: Testing whether two independent samples come from the same distribution.

In this article, we will explore how the Kolmogorov-Smirnov test works, how it compares to other goodness-of-fit tests like **Shapiro-Wilk**, and some real-world applications where the K-S test is particularly useful.

## How the Kolmogorov-Smirnov Test Works

The K-S test compares the **empirical cumulative distribution function (ECDF)** of a dataset to the CDF of a specified reference distribution. The goal is to determine whether the two distributions differ significantly, which would indicate that the observed data does not follow the hypothesized distribution.

### 1.1 Steps of the K-S Test

1. **Define the hypothesis**:
   - **Null hypothesis ($H_0$):** The sample data follows the specified distribution (e.g., normal, uniform, etc.).
   - **Alternative hypothesis ($H_A$):** The sample data does not follow the specified distribution.

2. **Compute the empirical cumulative distribution function (ECDF)**: The ECDF represents the proportion of observed data points less than or equal to each value in the dataset. This is the observed distribution.

3. **Compare the ECDF to the reference distribution's CDF**: The test calculates the **maximum difference** between the ECDF and the CDF of the specified reference distribution. This difference is denoted by **D**.

4. **Calculate the test statistic (D)**: The test statistic for the K-S test is the maximum absolute difference between the ECDF and the reference CDF:

   $$
   D = \max_x |F_n(x) - F(x)|
   $$

   Where:
   - $F_n(x)$ is the ECDF of the sample data.
   - $F(x)$ is the CDF of the reference distribution.

5. **Interpret the results**: The calculated **D-statistic** is compared to a critical value from the **Kolmogorov distribution**, and a p-value is derived from this comparison. If the p-value is lower than a chosen significance level (usually 0.05), the null hypothesis is rejected, meaning that the data does not follow the specified distribution. A code sketch of this procedure follows the list.

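As a minimal illustration of these steps, here is a sketch of the one-sample test in Python using `scipy.stats.kstest`; the data and the reference distribution are invented for the example and are not from the original post.

```python
# Minimal sketch: one-sample K-S test on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=200)  # hypothetical sample

# H0: the sample follows a standard normal distribution N(0, 1).
# kstest computes D = max |F_n(x) - F(x)| and the corresponding p-value.
result = stats.kstest(sample, "norm")
print(f"D = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
# A p-value below 0.05 would lead us to reject H0 at the 5% level.
```
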
### 1.2 One-Sample and Two-Sample K-S Tests

There are two main variations of the K-S test:

- **One-sample K-S test**: Used to compare an observed sample to a specific theoretical distribution (e.g., testing if data is normally distributed).
- **Two-sample K-S test**: Used to compare two independent samples to see if they are drawn from the same distribution; see the sketch after this list.

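A minimal sketch of the two-sample variant using `scipy.stats.ks_2samp`, with both samples simulated for illustration:

```python
# Minimal sketch: two-sample K-S test on two simulated samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample_a = rng.normal(loc=0.0, scale=1.0, size=150)
sample_b = rng.normal(loc=0.5, scale=1.0, size=180)  # shifted relative to sample_a

# H0: both samples are drawn from the same (unspecified) distribution.
stat, p_value = stats.ks_2samp(sample_a, sample_b)
print(f"D = {stat:.4f}, p-value = {p_value:.4f}")
```
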
### 1.3 P-Value Interpretation

The **p-value** from the K-S test tells us the probability of observing a test statistic as extreme as the one computed, assuming the null hypothesis is true. A small p-value (typically less than 0.05) suggests that the observed data does not come from the specified distribution.

## K-S Test vs. Other Goodness-of-Fit Tests

Several other goodness-of-fit tests serve purposes similar to the Kolmogorov-Smirnov test. The key differences between them lie in their assumptions, their sensitivity to different types of deviations, and their specific use cases.

### 2.1 Shapiro-Wilk Test

The **Shapiro-Wilk test** is one of the most commonly used goodness-of-fit tests for assessing **normality**. Unlike the K-S test, the Shapiro-Wilk test is tied to a single parametric family: it tests specifically whether a sample comes from a normal distribution.

#### Comparison to K-S Test:

- **Assumptions**: The Shapiro-Wilk test is used strictly for testing normality, while the K-S test can be applied to any reference distribution, making it more versatile.
- **Sensitivity**: The Shapiro-Wilk test is more powerful (i.e., it has higher sensitivity) for detecting deviations from normality, especially in small samples. The K-S test may be less sensitive in detecting small differences between the empirical and reference distributions.
- **Use cases**: Shapiro-Wilk is preferred for small datasets and when specifically testing for normality. The K-S test is ideal for larger datasets or when testing goodness-of-fit to any distribution (normal, uniform, exponential, etc.). A side-by-side sketch follows this list.

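The following sketch runs both tests on the same simulated, deliberately non-normal sample, using `scipy.stats.shapiro` and `scipy.stats.kstest`; all data here are invented for illustration.

```python
# Comparing Shapiro-Wilk and one-sample K-S on the same simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.exponential(scale=1.0, size=80)  # deliberately non-normal

sw_stat, sw_p = stats.shapiro(data)         # tests normality only
ks_stat, ks_p = stats.kstest(data, "norm")  # vs. a fixed N(0, 1) reference
print(f"Shapiro-Wilk: W = {sw_stat:.4f}, p = {sw_p:.4g}")
print(f"K-S:          D = {ks_stat:.4f}, p = {ks_p:.4g}")
```
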
### 2.2 Anderson-Darling Test

The **Anderson-Darling test** is an extension of the K-S test that gives more weight to the tails of the distribution. It is particularly useful when the focus is on the fit in the tails, as might be the case in risk modeling or financial applications.

#### Comparison to K-S Test:

- **Tail sensitivity**: The Anderson-Darling test is more sensitive to differences in the tails of the distribution than the K-S test, which treats all parts of the distribution equally.
- **Use cases**: Anderson-Darling is favored when deviations in the tails of the distribution are critical, such as in stress testing for financial risk or extreme event modeling. A short sketch follows this list.

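For reference, `scipy.stats.anderson` reports the A-D statistic alongside critical values rather than a p-value; a minimal sketch on simulated data:

```python
# Minimal sketch: Anderson-Darling normality test in SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
data = rng.normal(size=100)  # hypothetical sample

result = stats.anderson(data, dist="norm")
print(f"A-D statistic: {result.statistic:.4f}")
# Reject H0 at a given level if the statistic exceeds the critical value.
for cv, sl in zip(result.critical_values, result.significance_level):
    print(f"  {sl:g}% level: critical value = {cv:.3f}")
```
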
### 2.3 Chi-Squared Test

The **chi-squared goodness-of-fit test** compares the frequency distribution of observed data to a theoretically expected frequency distribution. It is widely used for categorical data but can also be applied to continuous data if binned into categories.

#### Comparison to K-S Test:

- **Assumptions**: The chi-squared test requires the data to be grouped into categories, which may involve loss of information when applied to continuous data. The K-S test, by contrast, works directly with continuous data without binning.
- **Data type**: Chi-squared is often used for categorical data, while the K-S test is better suited for continuous data.
- **Sensitivity**: The chi-squared test can be unreliable with small sample sizes or when the expected frequencies in some categories are very low. A binned example follows this list.

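A binned chi-squared goodness-of-fit check might look like this sketch; the bin count and the uniform null hypothesis are arbitrary choices for illustration.

```python
# Minimal sketch: chi-squared goodness-of-fit against a uniform hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.uniform(low=0.0, high=1.0, size=500)

# Bin the continuous data into 10 equal-width categories.
observed, _ = np.histogram(data, bins=10, range=(0.0, 1.0))
expected = np.full(10, len(data) / 10)  # uniform H0: equal counts per bin

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.4f}")
```
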
## Real-World Use Cases of the Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test is applicable across various fields, including finance, biology, engineering, and data science. Below, we explore some real-world use cases where the K-S test is particularly valuable.

### 3.1 Testing for Normality in Finance

In financial markets, it is often necessary to test whether the returns on stocks, bonds, or other financial instruments follow a **normal distribution**. Many financial models, such as those used for portfolio optimization or risk management, assume that returns are normally distributed.

The one-sample K-S test can be used to assess whether historical returns data conform to a normal distribution. If the p-value is low, analysts might conclude that the returns deviate significantly from normality, which would affect the assumptions of the models they are using.

#### Example:

An investment firm wants to test whether the daily returns of a stock over the past year follow a normal distribution. By applying the K-S test, they compare the ECDF of the daily returns to the CDF of a normal distribution with the same mean and standard deviation. A significant p-value would suggest that the returns are not normally distributed, and the firm may need to revise its risk models.

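A sketch of this workflow, with simulated fat-tailed returns standing in for real price data. One caveat worth noting: when the mean and standard deviation are estimated from the same sample being tested, the standard K-S p-value is only approximate, and a Lilliefors-type correction is the more rigorous choice.

```python
# Hypothetical sketch: testing simulated daily returns for normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(21)
returns = 0.01 * rng.standard_t(df=4, size=252)  # fat-tailed stand-in for real returns

mu, sigma = returns.mean(), returns.std(ddof=1)
# Caveat: estimating mu and sigma from the same data makes this p-value
# approximate; the Lilliefors test corrects for this.
result = stats.kstest(returns, "norm", args=(mu, sigma))
print(f"D = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
```
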
### 3.2 Quality Control in Manufacturing

In manufacturing, the K-S test can be used to determine whether a batch of products conforms to a specific tolerance level for a continuous variable, such as weight or size. Ensuring that products follow the expected distribution can be critical for maintaining quality and consistency in production.

#### Example:

A company that manufactures precision-engineered components wants to ensure that the diameter of its parts follows a uniform distribution within a specific tolerance range. By applying the K-S test, the company compares the observed distribution of part diameters to a uniform distribution. If the p-value from the K-S test is below the threshold, the company may need to investigate potential issues in the manufacturing process.

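In code, testing part diameters against a uniform tolerance band could be sketched as follows; the tolerance limits and measurements are invented for the example.

```python
# Hypothetical sketch: are part diameters uniform over a tolerance band?
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
diameters = rng.uniform(low=9.95, high=10.05, size=400)  # simulated measurements

# H0: diameters ~ Uniform(9.95, 10.05); in scipy terms, loc=9.95, scale=0.10.
result = stats.kstest(diameters, "uniform", args=(9.95, 0.10))
print(f"D = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
```
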
### 3.3 Ecological Studies: Comparing Species Distributions

In ecology, researchers often compare the distribution of species in different habitats or regions to understand environmental influences on biodiversity. The two-sample K-S test is useful for comparing the distribution of species abundances between different ecosystems or time periods.

#### Example:

An ecologist is studying the distribution of bird species in two different regions to determine if the environmental conditions result in significant differences in species diversity. The two-sample K-S test can be used to compare the distributions of bird counts in the two regions. A significant result would indicate that the distributions differ, suggesting that the regions have different environmental characteristics affecting biodiversity.

## Conclusion

The **Kolmogorov-Smirnov test** is a powerful, versatile tool for assessing goodness-of-fit in non-parametric settings and for comparing two distributions. Its ability to test data against any theoretical distribution, without the need for strong parametric assumptions, makes it particularly useful in many fields, from finance to ecology.

While the K-S test has some limitations, such as lower sensitivity than tests like Shapiro-Wilk for detecting deviations from normality, its flexibility and simplicity make it a popular choice for distributional comparisons. By understanding how to use the K-S test and how to interpret its results, data scientists and researchers can draw meaningful conclusions about the underlying patterns in their data.
