diff --git a/_posts/-_ideas/2030-01-01-Article Title Ideas for Statistical Tests.md b/_posts/-_ideas/2030-01-01-Article Title Ideas for Statistical Tests.md index ef2d4ef3..6a180013 100644 --- a/_posts/-_ideas/2030-01-01-Article Title Ideas for Statistical Tests.md +++ b/_posts/-_ideas/2030-01-01-Article Title Ideas for Statistical Tests.md @@ -67,11 +67,3 @@ TODO: ### 13. **"Granger Causality Test: Assessing Temporal Causal Relationships in Time-Series Data"** - Introduction to the Granger causality test for time-series data. - Applications in economics, climate science, and finance. - -### 14. **"Shapiro-Wilk Test vs. Anderson-Darling: Checking for Normality in Small vs. Large Samples"** - - Comparing two common tests for normality: Shapiro-Wilk and Anderson-Darling. - - How sample size and distribution affect the choice of normality test. - -### 15. **"Log-Rank Test: Comparing Survival Curves in Clinical Studies"** - - Overview of the Log-Rank test for comparing survival distributions. - - Applications in clinical trials, epidemiology, and medical research. diff --git a/_posts/2019-12-28-shapirowilk_test_vs_andersondarling_checking_normality_small_large_samples.md b/_posts/2019-12-28-shapirowilk_test_vs_andersondarling_checking_normality_small_large_samples.md new file mode 100644 index 00000000..597bfbcd --- /dev/null +++ b/_posts/2019-12-28-shapirowilk_test_vs_andersondarling_checking_normality_small_large_samples.md @@ -0,0 +1,165 @@ +--- +author_profile: false +categories: +- Statistics +classes: wide +date: '2019-12-28' +excerpt: Explore the differences between the Shapiro-Wilk and Anderson-Darling tests, + two common methods for testing normality, and how sample size and distribution affect + their performance. +header: + image: /assets/images/data_science_20.jpg + og_image: /assets/images/data_science_20.jpg + overlay_image: /assets/images/data_science_20.jpg + show_overlay_excerpt: false + teaser: /assets/images/data_science_20.jpg + twitter_image: /assets/images/data_science_20.jpg +keywords: +- Shapiro-wilk test +- Anderson-darling test +- Normality test +- Small sample size +- Large sample size +- Statistical distribution +- Python +seo_description: A comparison of the Shapiro-Wilk and Anderson-Darling tests for normality, + analyzing their strengths and weaknesses based on sample size and distribution. +seo_title: 'Shapiro-Wilk vs Anderson-Darling: Normality Tests for Small and Large + Samples' +seo_type: article +summary: This article compares the Shapiro-Wilk and Anderson-Darling tests, emphasizing + how sample size and distribution characteristics influence the choice of method + when assessing normality. +tags: +- Normality testing +- Shapiro-wilk test +- Anderson-darling test +- Sample size +- Python +title: 'Shapiro-Wilk Test vs. Anderson-Darling: Checking for Normality in Small vs. + Large Samples' +--- + +## Shapiro-Wilk Test vs. Anderson-Darling: Checking for Normality in Small vs. Large Samples + +Testing for normality is a crucial step in many statistical analyses, particularly when using parametric tests that assume data is normally distributed. Two of the most widely used normality tests are the **Shapiro-Wilk test** and the **Anderson-Darling test**. Although both are used to assess whether a dataset follows a normal distribution, they perform differently depending on sample size and the underlying distribution characteristics. This article explores these differences and guides how to choose the appropriate test based on your data. + +### 1. 
Understanding the Basics of Normality Testing
+
+In statistics, many parametric tests (such as t-tests or ANOVAs) require the assumption that the data follows a normal distribution. While visual methods like histograms or Q-Q plots are useful for assessing normality, formal statistical tests like Shapiro-Wilk and Anderson-Darling provide quantitative measures.
+
+#### Why Is Normality Important?
+
+- **Parametric tests** (such as the t-test and ANOVA) are based on the assumption that the underlying data follows a normal distribution.
+- **Non-normal data** can lead to inaccurate results in hypothesis testing, confidence intervals, and other statistical inferences.
+
+The objective of normality tests is to determine whether to reject the hypothesis that a dataset is drawn from a normally distributed population.
+
+### 2. Shapiro-Wilk Test: Best for Small Samples
+
+The **Shapiro-Wilk test** is commonly regarded as the most powerful test for detecting deviations from normality, especially for **small sample sizes** (usually \( n < 50 \)). It was introduced in 1965 by Shapiro and Wilk and is based on the correlation between the data and the corresponding normal scores.
+
+#### How Does It Work?
+
+The Shapiro-Wilk test compares the ordered data points with the expected values of a normal distribution. The null hypothesis (\( H_0 \)) for the Shapiro-Wilk test states that the data is normally distributed. If the test produces a **p-value** below a predefined significance level (commonly 0.05), the null hypothesis is rejected, suggesting that the data is not normally distributed.
+
+- **Test statistic**: The test statistic \( W \) is calculated using the equation:
+
+  $$
+  W = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
+  $$
+
+  where \( a_i \) are weighting constants derived from the means, variances, and covariances of the order statistics of a standard normal sample, \( x_{(i)} \) are the ordered sample values, and \( \bar{x} \) is the sample mean.
+
+#### Strengths of Shapiro-Wilk
+
+- **High power with small samples**: The Shapiro-Wilk test is highly effective in detecting non-normality in small datasets, typically outperforming other tests when \( n \) is below 50.
+- **Sensitive to skewness and kurtosis**: It can detect deviations due to both the shape of the distribution and extreme values.
+
+#### Limitations
+
+- **Less effective for large samples**: When sample sizes increase significantly (e.g., \( n > 2000 \)), the Shapiro-Wilk test becomes overly sensitive and may flag trivial deviations as significant.
+- **Slower computation**: The test involves more complex calculations, making it computationally heavier for larger datasets.
+
+### 3. Anderson-Darling Test: Better for Large Samples
+
+The **Anderson-Darling test** is another widely used normality test, a modification of the Kolmogorov-Smirnov test that gives greater weight to the tails. It provides a more sensitive measure of the difference between the empirical distribution of the data and the expected cumulative distribution of a normal distribution. Unlike the Shapiro-Wilk test, the Anderson-Darling test performs well with **larger sample sizes**.
+
+#### How Does It Work?
+
+The Anderson-Darling test compares the observed cumulative distribution function (CDF) of the data to the expected CDF of the normal distribution. The test statistic \( A^2 \) is calculated based on the differences between these functions, giving more weight to the tails of the distribution:
+
+- **Test statistic**: The Anderson-Darling statistic is computed as:
+
+  $$
+  A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} \left[ (2i-1) \left( \ln F(x_{(i)}) + \ln(1 - F(x_{(n+1-i)})) \right) \right]
+  $$
+
+  where \( F \) is the cumulative distribution function of the hypothesized normal distribution and \( x_{(i)} \) are the ordered sample values.
+
+#### Strengths of Anderson-Darling
+
+- **More sensitive to tail behavior**: The Anderson-Darling test gives more weight to observations in the tails of the distribution, making it particularly useful for detecting deviations in the extremes.
+- **Suitable for larger samples**: It remains powerful across a wide range of sample sizes and is especially reliable for larger datasets (e.g., \( n > 50 \)).
+
+#### Limitations
+
+- **Less powerful for small samples**: The Anderson-Darling test may not detect non-normality as effectively as the Shapiro-Wilk test for small datasets.
+- **Overly sensitive in very large samples**: It may flag statistically significant but practically negligible deviations from normality.
+
+### 4. Choosing Between Shapiro-Wilk and Anderson-Darling
+
+The choice between Shapiro-Wilk and Anderson-Darling tests depends primarily on the **sample size** and the **type of deviations** you expect from normality.
+
+#### Small Samples (\( n < 50 \))
+
+For small sample sizes, the Shapiro-Wilk test is generally preferred due to its higher power and reliability. It is more sensitive to deviations in both the center and tails of the distribution in smaller datasets.
+
+- **Recommendation**: Use Shapiro-Wilk for \( n < 50 \).
+
+#### Large Samples (\( n > 200 \))
+
+As sample size increases, the Shapiro-Wilk test can become too sensitive, flagging minor deviations as statistically significant. The Anderson-Darling test, with its focus on tail behavior, often provides a more balanced view of normality for larger samples.
+
+- **Recommendation**: Use Anderson-Darling for larger samples, especially if deviations in the tails are of particular interest.
+
+#### Mid-range Samples (\( 50 \leq n \leq 200 \))
+
+For datasets that fall in this mid-range, both tests can be useful, depending on the nature of the data. If your analysis is concerned with tail behavior or extreme values, the Anderson-Darling test may be more informative. However, the Shapiro-Wilk test remains a reliable choice if computational efficiency is not a concern.
+
+### 5. Impact of Distribution Characteristics on Test Choice
+
+Different distributions, especially those with heavy tails, skewness, or kurtosis, can influence the performance of normality tests. Both the Shapiro-Wilk and Anderson-Darling tests can detect non-normality, but their focus differs slightly.
+
+- **Tail-heavy distributions**: The Anderson-Darling test is better suited for detecting deviations in the tails.
+- **Symmetry and kurtosis**: The Shapiro-Wilk test is generally better at identifying issues related to skewness and kurtosis in smaller datasets.
+
+### 6. Practical Considerations and Software Implementation
+
+Both the Shapiro-Wilk and Anderson-Darling tests are widely implemented in statistical software such as R, Python (via SciPy), and SPSS.
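+
+Before turning to the individual calls, a quick simulation makes the sample-size guidance from the previous sections concrete. The sketch below is illustrative only: it assumes NumPy and SciPy are available, and the mild contamination scheme (roughly 5% of points shifted) is an arbitrary choice, not part of either test.
+
+```python
+import numpy as np
+from scipy.stats import anderson, shapiro
+
+rng = np.random.default_rng(42)
+
+def mildly_contaminated(n):
+    """Mostly normal data with roughly 5% of points shifted, so it is only slightly non-normal."""
+    n_shifted = max(1, n // 20)
+    return np.concatenate([rng.normal(0, 1, n - n_shifted), rng.normal(3, 1, n_shifted)])
+
+for n in (30, 5000):
+    x = mildly_contaminated(n)
+    sw_stat, sw_p = shapiro(x)      # Shapiro-Wilk: W statistic and p-value
+    ad = anderson(x, dist='norm')   # Anderson-Darling: A^2 statistic plus critical values
+    print(f"n={n}: Shapiro-Wilk p={sw_p:.4f}, "
+          f"Anderson-Darling A^2={ad.statistic:.2f} vs 5% critical value {ad.critical_values[2]:.2f}")
+```
+
+The exact numbers depend on the random draw, but the pattern matches the discussion above: the small sample usually looks acceptably normal to both tests, while the large sample lets even this mild departure reach significance.
+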
+The individual tests can be run with just a few lines:
+
+#### Shapiro-Wilk in Python
+
+```python
+from scipy.stats import shapiro
+
+data = [4.5, 5.6, 7.8, 4.3, 6.1]
+
+# Shapiro-Wilk returns the W statistic and a p-value; a p-value below 0.05
+# would suggest rejecting the hypothesis of normality.
+stat, p = shapiro(data)
+print('Statistics=%.3f, p=%.3f' % (stat, p))
+```
+
+#### Anderson-Darling in Python
+
+```python
+from scipy.stats import anderson
+
+data = [4.5, 5.6, 7.8, 4.3, 6.1]
+
+# Anderson-Darling returns the A^2 statistic together with critical values;
+# the statistic is compared against the critical value at the chosen level.
+result = anderson(data)
+print('Statistic: %.3f' % result.statistic)
+for cv, sl in zip(result.critical_values, result.significance_level):
+    print('%.1f%% critical value: %.3f' % (sl, cv))
+```
+
+### 7. Conclusion: Which Test Should You Use?
+
+Ultimately, the decision between the Shapiro-Wilk and Anderson-Darling tests depends on your sample size and the nature of the deviations you want to detect. For small samples, the Shapiro-Wilk test is a powerful and reliable option, while the Anderson-Darling test offers a more flexible and tail-sensitive approach, particularly useful for larger datasets.
+
+Both tests provide valuable insights into the distribution of your data, ensuring you can make informed decisions in parametric testing and beyond.
diff --git a/_posts/2020-01-07-how_big_data_transforming_predictive_maintenance.md b/_posts/2020-01-07-how_big_data_transforming_predictive_maintenance.md
index 3ea3c5dc..9841febb 100644
--- a/_posts/2020-01-07-how_big_data_transforming_predictive_maintenance.md
+++ b/_posts/2020-01-07-how_big_data_transforming_predictive_maintenance.md
@@ -1,7 +1,7 @@
 ---
 author_profile: false
 categories:
-- Big Data
+- Data Science
 classes: wide
 date: '2020-01-07'
 excerpt: Big Data is revolutionizing predictive maintenance by offering unprecedented
@@ -56,7 +56,7 @@ title: How Big Data is Transforming Predictive Maintenance
 ---
 author_profile: false
 categories:
-- Big Data
+- Data Science
 classes: wide
 date: '2020-01-07'
 excerpt: Big Data is revolutionizing predictive maintenance by offering unprecedented
diff --git a/_posts/2021-06-01-customer_segmentation.md b/_posts/2021-06-01-customer_segmentation.md
index 6dbea409..2def825e 100644
--- a/_posts/2021-06-01-customer_segmentation.md
+++ b/_posts/2021-06-01-customer_segmentation.md
@@ -1,7 +1,7 @@
 ---
 author_profile: false
 categories:
-- Customer Analytics
+- Data Science
 classes: wide
 date: '2021-06-01'
 excerpt: RFM Segmentation (Recency, Frequency, Monetary Value) is a widely used method
@@ -15,6 +15,7 @@ header:
   teaser: /assets/images/data_science_9.jpg
   twitter_image: /assets/images/data_science_1.jpg
 keywords:
+- Customer analytics
 - Customer segmentation
 - Unsupervised learning
 - Data science
diff --git a/_posts/2022-07-26-geospatial_data_for_public_health_insights.md b/_posts/2022-07-26-geospatial_data_public_health_insights.md
similarity index 97%
rename from _posts/2022-07-26-geospatial_data_for_public_health_insights.md
rename to _posts/2022-07-26-geospatial_data_public_health_insights.md
index d41d7dcf..c4490dc9 100644
--- a/_posts/2022-07-26-geospatial_data_for_public_health_insights.md
+++ b/_posts/2022-07-26-geospatial_data_public_health_insights.md
@@ -4,9 +4,17 @@ categories:
 - Data Science
 - Public Health
 classes: wide
+date: '2022-07-26'
 excerpt: Spatial epidemiology combines geospatial data with data science techniques
   to track and analyze disease outbreaks, offering public health agencies critical
   tools for intervention and planning.
+header: + image: /assets/images/data_science_19.jpg + og_image: /assets/images/data_science_19.jpg + overlay_image: /assets/images/data_science_19.jpg + show_overlay_excerpt: false + teaser: /assets/images/data_science_19.jpg + twitter_image: /assets/images/data_science_19.jpg keywords: - Spatial epidemiology - Geospatial data @@ -18,6 +26,7 @@ seo_description: Explore how geospatial data is revolutionizing public health. L how spatial epidemiology and data science techniques track disease outbreaks and offer critical insights for health interventions. seo_title: 'Spatial Epidemiology: Leveraging Geospatial Data in Public Health' +seo_type: article summary: This article explores the importance of geospatial data in spatial epidemiology, focusing on how it is used to track and analyze disease outbreaks. It delves into the integration of spatial data with data science methods and how these insights diff --git a/_posts/2023-12-30-expected_shortfall.md b/_posts/2023-12-30-expected_shortfall.md index 606619fa..18b832a5 100644 --- a/_posts/2023-12-30-expected_shortfall.md +++ b/_posts/2023-12-30-expected_shortfall.md @@ -2,7 +2,6 @@ author_profile: false categories: - Data Science -- Financial Risk Management classes: wide date: '2023-12-30' excerpt: A comprehensive comparison of Value at Risk (VaR) and Expected Shortfall diff --git a/_posts/2024-02-01-customer_life_value.md b/_posts/2024-02-01-customer_life_value.md index 03290781..d6934d31 100644 --- a/_posts/2024-02-01-customer_life_value.md +++ b/_posts/2024-02-01-customer_life_value.md @@ -2,7 +2,6 @@ author_profile: false categories: - Machine Learning -- Data Science classes: wide date: '2024-02-01' excerpt: Discover the importance of Customer Lifetime Value (CLV) in shaping business diff --git a/_posts/2024-05-21-Probability_integral_transform.md b/_posts/2024-05-21-Probability_integral_transform.md index 04a6e7e7..f814fe31 100644 --- a/_posts/2024-05-21-Probability_integral_transform.md +++ b/_posts/2024-05-21-Probability_integral_transform.md @@ -7,6 +7,8 @@ categories: - Machine Learning classes: wide date: '2024-05-21' +excerpt: An in-depth guide to understanding and applying the Probability Integral + Transform in various fields, from finance to statistics. header: image: /assets/images/data_science_2.jpg og_image: /assets/images/data_science_3.jpg @@ -14,7 +16,24 @@ header: show_overlay_excerpt: false teaser: /assets/images/data_science_2.jpg twitter_image: /assets/images/data_science_3.jpg +keywords: +- Probability integral transform +- Cumulative distribution function +- Goodness of fit +- Copula construction +- Financial risk management +- Monte carlo simulations +- Hypothesis testing +- Credit risk modeling +- R +seo_description: A comprehensive exploration of the probability integral transform, + its theoretical foundations, and practical applications in fields such as risk management + and marketing mix modeling. +seo_title: 'Probability Integral Transform: Theory and Applications' seo_type: article +summary: This article explains the Probability Integral Transform, its role in statistical + modeling, and how it is applied in diverse fields like risk management, hypothesis + testing, and Monte Carlo simulations. tags: - Probability integral transform - Cumulative distribution function @@ -131,6 +150,8 @@ categories: - Machine Learning classes: wide date: '2024-05-21' +excerpt: An in-depth guide to understanding and applying the Probability Integral + Transform in various fields, from finance to statistics. 
header: image: /assets/images/data_science_2.jpg og_image: /assets/images/data_science_3.jpg @@ -138,7 +159,24 @@ header: show_overlay_excerpt: false teaser: /assets/images/data_science_2.jpg twitter_image: /assets/images/data_science_3.jpg +keywords: +- Probability integral transform +- Cumulative distribution function +- Goodness of fit +- Copula construction +- Financial risk management +- Monte carlo simulations +- Hypothesis testing +- Credit risk modeling +- R +seo_description: A comprehensive exploration of the probability integral transform, + its theoretical foundations, and practical applications in fields such as risk management + and marketing mix modeling. +seo_title: 'Probability Integral Transform: Theory and Applications' seo_type: article +summary: This article explains the Probability Integral Transform, its role in statistical + modeling, and how it is applied in diverse fields like risk management, hypothesis + testing, and Monte Carlo simulations. tags: - Probability integral transform - Cumulative distribution function @@ -259,6 +297,8 @@ categories: - Machine Learning classes: wide date: '2024-05-21' +excerpt: An in-depth guide to understanding and applying the Probability Integral + Transform in various fields, from finance to statistics. header: image: /assets/images/data_science_2.jpg og_image: /assets/images/data_science_3.jpg @@ -266,7 +306,24 @@ header: show_overlay_excerpt: false teaser: /assets/images/data_science_2.jpg twitter_image: /assets/images/data_science_3.jpg +keywords: +- Probability integral transform +- Cumulative distribution function +- Goodness of fit +- Copula construction +- Financial risk management +- Monte carlo simulations +- Hypothesis testing +- Credit risk modeling +- R +seo_description: A comprehensive exploration of the probability integral transform, + its theoretical foundations, and practical applications in fields such as risk management + and marketing mix modeling. +seo_title: 'Probability Integral Transform: Theory and Applications' seo_type: article +summary: This article explains the Probability Integral Transform, its role in statistical + modeling, and how it is applied in diverse fields like risk management, hypothesis + testing, and Monte Carlo simulations. tags: - Probability integral transform - Cumulative distribution function diff --git a/_posts/2024-07-11-pre_commit.md b/_posts/2024-07-11-pre_commit.md index 10aa7d4b..eb28dd46 100644 --- a/_posts/2024-07-11-pre_commit.md +++ b/_posts/2024-07-11-pre_commit.md @@ -1,7 +1,7 @@ --- author_profile: false categories: -- Software Development +- Python classes: wide date: '2024-07-11' header: diff --git a/_posts/2024-07-16-Einstein.md b/_posts/2024-07-16-Einstein.md index 33b02d7d..9ddb5412 100644 --- a/_posts/2024-07-16-Einstein.md +++ b/_posts/2024-07-16-Einstein.md @@ -1,7 +1,6 @@ --- author_profile: false categories: -- Science - Data Analysis classes: wide date: '2024-07-16' diff --git a/_posts/2024-07-31-Custom_libraries.md b/_posts/2024-07-31-Custom_libraries.md index 5962694f..96a31343 100644 --- a/_posts/2024-07-31-Custom_libraries.md +++ b/_posts/2024-07-31-Custom_libraries.md @@ -1,11 +1,10 @@ --- author_profile: false categories: -- Software Development - Python -- Industry Solutions classes: wide date: '2024-07-31' +excerpt: A guide on developing custom Python libraries to meet specific industry needs, focusing on software development and automation. 
header: image: /assets/images/data_science_4.jpg og_image: /assets/images/data_science_5.jpg @@ -13,7 +12,16 @@ header: show_overlay_excerpt: false teaser: /assets/images/data_science_4.jpg twitter_image: /assets/images/data_science_5.jpg +keywords: +- Python libraries +- Custom software development +- Automation +- Industry solutions +- python +seo_description: Learn how to create custom Python libraries tailored to your industry needs. This article covers strategies for software development and automation using Python. +seo_title: Building Custom Python Libraries for Industry-Specific Solutions seo_type: article +summary: This article explores the process of building custom Python libraries, offering insights into Python’s versatility for developing industry-specific software solutions and automation tools. tags: - Python libraries - Custom software @@ -21,6 +29,7 @@ tags: - Software development - Automation - Python +- python title: Building Custom Python Libraries for Your Industry Needs --- diff --git a/_posts/2024-08-31-PAPE.md b/_posts/2024-08-31-PAPE.md index ea018475..c7b8a732 100644 --- a/_posts/2024-08-31-PAPE.md +++ b/_posts/2024-08-31-PAPE.md @@ -2,8 +2,6 @@ author_profile: false categories: - Machine Learning -- Data Science -- Model Performance classes: wide date: '2024-08-31' excerpt: Explore adaptive performance estimation techniques in machine learning, including diff --git a/_posts/2024-09-12-importance_sampling.md b/_posts/2024-09-12-importance_sampling.md index 835a0157..e477c2b2 100644 --- a/_posts/2024-09-12-importance_sampling.md +++ b/_posts/2024-09-12-importance_sampling.md @@ -1,8 +1,7 @@ --- author_profile: false categories: -- Finance -- Risk Management +- Statistics classes: wide date: '2024-09-12' excerpt: Importance Sampling offers an efficient alternative to traditional Monte @@ -56,8 +55,7 @@ Estimating credit risk in portfolios containing loans or bonds is crucial for fi --- author_profile: false categories: -- Finance -- Risk Management +- Statistics classes: wide date: '2024-09-12' excerpt: Importance Sampling offers an efficient alternative to traditional Monte @@ -123,8 +121,7 @@ In this model, each obligor’s default is influenced by a set of **systematic f --- author_profile: false categories: -- Finance -- Risk Management +- Statistics classes: wide date: '2024-09-12' excerpt: Importance Sampling offers an efficient alternative to traditional Monte @@ -202,8 +199,7 @@ When obligors are dependent (i.e., influenced by common risk factors), IS become --- author_profile: false categories: -- Finance -- Risk Management +- Statistics classes: wide date: '2024-09-12' excerpt: Importance Sampling offers an efficient alternative to traditional Monte diff --git a/_posts/2024-09-17-ml_healthcare.md b/_posts/2024-09-17-ml_healthcare.md index ff3d5263..c68aa906 100644 --- a/_posts/2024-09-17-ml_healthcare.md +++ b/_posts/2024-09-17-ml_healthcare.md @@ -1,8 +1,6 @@ --- author_profile: false categories: -- Healthcare -- Machine Learning - Data Analytics classes: wide date: '2024-09-17' @@ -23,6 +21,7 @@ keywords: - Medical imaging - Personalized medicine - Predictive analytics +- Healthcare - Healthcare data privacy - Clinical implementation challenges - Predictive patient outcomes diff --git a/_posts/2024-10-01-automated_prompt_engineering.md b/_posts/2024-10-01-automated_prompt_engineering.md index bc1c1653..9a6d2730 100644 --- a/_posts/2024-10-01-automated_prompt_engineering.md +++ b/_posts/2024-10-01-automated_prompt_engineering.md @@ -1,7 +1,6 @@ --- 
author_profile: false categories: -- AI - Machine Learning classes: wide date: '2024-10-01' diff --git a/_posts/2024-10-12-how_data_science_reshaping_business_strategy_age_machine_learning.md b/_posts/2024-10-12-how_data_science_reshaping_business_strategy_age_machine_learning.md index a0c6eb0c..70bef2c4 100644 --- a/_posts/2024-10-12-how_data_science_reshaping_business_strategy_age_machine_learning.md +++ b/_posts/2024-10-12-how_data_science_reshaping_business_strategy_age_machine_learning.md @@ -1,9 +1,7 @@ --- author_profile: false categories: -- Data Science - Machine Learning -- Business Strategy classes: wide date: '2024-10-12' excerpt: Data-driven decision-making, powered by data science and machine learning, diff --git a/_posts/2024-10-16-predictive_analytics_healthcare_anticipating_health_issues_before_they_happen.md b/_posts/2024-10-16-predictive_analytics_healthcare_anticipating_health_issues_before_they_happen.md index a002e418..938e1213 100644 --- a/_posts/2024-10-16-predictive_analytics_healthcare_anticipating_health_issues_before_they_happen.md +++ b/_posts/2024-10-16-predictive_analytics_healthcare_anticipating_health_issues_before_they_happen.md @@ -1,7 +1,7 @@ --- author_profile: false categories: -- Predictive Analytics +- Machine Learning classes: wide date: '2024-10-16' excerpt: Predictive analytics in healthcare is transforming how providers foresee diff --git a/_sass/minimal-mistakes/_forms.scss b/_sass/minimal-mistakes/_forms.scss index 9d17d4d9..6c6fdfad 100644 --- a/_sass/minimal-mistakes/_forms.scss +++ b/_sass/minimal-mistakes/_forms.scss @@ -25,7 +25,7 @@ form { } p { - margin-bottom: (5px / 2); + margin-bottom: calc(5px / 2); } ul { diff --git a/_sass/minimal-mistakes/_mixins.scss b/_sass/minimal-mistakes/_mixins.scss index 90c34d0b..55ce8eb0 100644 --- a/_sass/minimal-mistakes/_mixins.scss +++ b/_sass/minimal-mistakes/_mixins.scss @@ -1,4 +1,5 @@ @use 'sass:math'; +@use "sass:color"; /* ========================================================================== MIXINS ========================================================================== */ @@ -62,11 +63,11 @@ $color, $threshold: $yiq-contrasted-threshold ) { - $red: red($color); - $green: green($color); - $blue: blue($color); + $red: color.channel($color, "red", $space: rgb); + $green: color.channel($color, "green", $space: rgb); + $blue: color.channel($color, "blue", $space: rgb); - $yiq: (($red*299)+($green*587)+($blue*114))/1000; + $yiq: calc((($red * 299) + ($green * 587) + ($blue * 114)) / 1000); @if $yiq-debug { @debug $yiq, $threshold; } diff --git a/_sass/minimal-mistakes/_reset.scss b/_sass/minimal-mistakes/_reset.scss index 2259fd0c..97c1733d 100644 --- a/_sass/minimal-mistakes/_reset.scss +++ b/_sass/minimal-mistakes/_reset.scss @@ -10,6 +10,9 @@ html { background-color: $background-color; font-size: 16px; + -webkit-text-size-adjust: 100%; + -ms-text-size-adjust: 100%; + @include breakpoint($medium) { font-size: 18px; } @@ -21,9 +24,6 @@ html { @include breakpoint($x-large) { font-size: 22px; } - - -webkit-text-size-adjust: 100%; - -ms-text-size-adjust: 100%; } /* Remove margin */ diff --git a/_sass/minimal-mistakes/vendor/breakpoint/_helpers.scss b/_sass/minimal-mistakes/vendor/breakpoint/_helpers.scss index 2b7d9b5f..b14a3399 100644 --- a/_sass/minimal-mistakes/vendor/breakpoint/_helpers.scss +++ b/_sass/minimal-mistakes/vendor/breakpoint/_helpers.scss @@ -26,16 +26,16 @@ $unit: unit($value); @if $unit == 'px' { - @return $value / 16px * 1em; + @return calc($value / 16px) * 1em; } @else if 
$unit == '%' { - @return $value / 100% * 1em; + @return calc($value / 100%) * 1em; } @else if $unit == 'em' { @return $value; } @else if $unit == 'pt' { - @return $value / 12pt * 1em; + @return calc($value / 12pt) * 1em; } @else { @return $value; diff --git a/_sass/minimal-mistakes/vendor/magnific-popup/_settings.scss b/_sass/minimal-mistakes/vendor/magnific-popup/_settings.scss index 8203375a..b389a23c 100644 --- a/_sass/minimal-mistakes/vendor/magnific-popup/_settings.scss +++ b/_sass/minimal-mistakes/vendor/magnific-popup/_settings.scss @@ -29,7 +29,7 @@ $mfp-include-iframe-type: true; // Enable Ifra $mfp-iframe-padding-top: 40px; // Iframe padding top $mfp-iframe-background: #000; // Background color of iframes $mfp-iframe-max-width: 900px; // Maximum width of iframes -$mfp-iframe-ratio: 9/16; // Ratio of iframe (9/16 = widescreen, 3/4 = standard, etc.) +$mfp-iframe-ratio: math.div(9, 16); // Ratio of iframe (9/16 = widescreen, 3/4 = standard, etc.) // Image-type options $mfp-include-image-type: true; // Enable Image-type popups diff --git a/_sass/minimal-mistakes/vendor/susy/susy/_su-math.scss b/_sass/minimal-mistakes/vendor/susy/susy/_su-math.scss index dacb9467..aefd9b76 100644 --- a/_sass/minimal-mistakes/vendor/susy/susy/_su-math.scss +++ b/_sass/minimal-mistakes/vendor/susy/susy/_su-math.scss @@ -90,9 +90,15 @@ $span-width: _su-sum($span, $gutters, $spread, $validate: false); @if unitless($span-width) { + // Ensure $container-spread is a valid spread, likely sanitizing or adjusting its value. $container-spread: su-valid-spread($container-spread); + + // Calculate the container's width based on columns, gutters, and container spread. + // This is a custom function that you need to have defined elsewhere. $container: _su-sum($columns, $gutters, $container-spread, $validate: false); - @return percentage($span-width / $container); + + // Finally, calculate the percentage value of the span-width relative to the container. 
+    @return percentage(calc($span-width / $container));
   }

   @return $span-width;
@@ -143,7 +149,7 @@
 }

   $container: _su-sum($columns, $gutters, $container-spread);
-  @return percentage($gutters / $container);
+  @return percentage(calc($gutters / $container));
 }
diff --git a/capitalized_keywords.py b/capitalized_keywords.py
new file mode 100644
index 00000000..b344a1b0
--- /dev/null
+++ b/capitalized_keywords.py
@@ -0,0 +1,81 @@
+import os
+import yaml
+import re
+
+# Define the folder where the markdown files are stored
+folder_path = './_posts'  # Change this to your folder path
+
+# List of stop words to exclude from capitalization
+stop_words = {'at', 'vs', 'and', 'or', 'the', 'of', 'in', 'on', 'for', 'to', 'a'}
+
+# Function to capitalize keywords based on your rules
+def capitalize_keywords(keywords):
+    def capitalize_word(word, first_word=False):
+        # Only capitalize if it's not a stop word or it's the first word
+        if word in stop_words and not first_word:
+            return word
+        else:
+            return word.capitalize()
+
+    def process_phrase(phrase):
+        words = phrase.split()
+        # Capitalize each word as per rules, first word always capitalized
+        return ' '.join(capitalize_word(word, i == 0) for i, word in enumerate(words))
+
+    return [process_phrase(phrase) for phrase in keywords]
+
+# Function to process each markdown file
+def process_markdown_file(file_path):
+    with open(file_path, 'r', encoding='utf-8') as file:
+        content = file.read()
+
+    # Use regex to extract the front matter (between '---' lines)
+    front_matter_match = re.match(r'---(.*?)---', content, re.DOTALL)
+    if not front_matter_match:
+        print(f"No front matter found in {file_path}")
+        return
+
+    front_matter = front_matter_match.group(1)
+
+    # Parse the front matter using YAML
+    try:
+        front_matter_dict = yaml.safe_load(front_matter)
+    except yaml.YAMLError as exc:
+        print(f"Error parsing YAML in {file_path}: {exc}")
+        return
+
+    # If 'keywords' exists in front matter, process it
+    if 'keywords' in front_matter_dict:
+        original_keywords = front_matter_dict['keywords']
+        updated_keywords = capitalize_keywords(original_keywords)
+        front_matter_dict['keywords'] = updated_keywords
+
+        # Rebuild the front matter, keeping the original key order
+        updated_front_matter = yaml.dump(front_matter_dict, default_flow_style=False, sort_keys=False)
+        # Replace only the first front matter block; a callable replacement avoids
+        # any backslash escaping issues in re.sub
+        updated_content = re.sub(
+            r'---(.*?)---',
+            lambda _: f'---\n{updated_front_matter}---',
+            content,
+            count=1,
+            flags=re.DOTALL,
+        )
+
+        # Save the updated content back to the file
+        with open(file_path, 'w', encoding='utf-8') as file:
+            file.write(updated_content)
+
+        print(f"Updated keywords in {file_path}")
+    else:
+        print(f"No 'keywords' found in {file_path}")
+
+# Function to process all markdown files in the folder
+def process_all_markdown_files(folder_path):
+    for filename in os.listdir(folder_path):
+        if filename.endswith(".md"):  # Check if it's a markdown file
+            file_path = os.path.join(folder_path, filename)
+            process_markdown_file(file_path)
+
+# Run the function for the specified folder
+process_all_markdown_files(folder_path)
diff --git a/run_scripts.sh b/run_scripts.sh
index 142dff03..21c1799b 100755
--- a/run_scripts.sh
+++ b/run_scripts.sh
@@ -5,3 +5,4 @@ python fix_frontmatter.py
 python search_code_snippets.py
 # python process_markdown_frontmatter.py
 python rename_files_spaces.py
+python markdown_frontmatter_cleanup.py