This repository collects my master's-level coursework in computational statistics, including both tutorial implementations and original work. The code covers statistical computing, data analysis, machine learning, and SQL-based data engineering.
The following projects represent my original work completed during my master's program:
- Weather Data Analysis: Interactive visualization of weather data comparing temperature differences between cities using R's leaflet and ggplot2 packages.
- Flight Delay Prediction: Analysis of flight data to predict arrival delays using kNN classification and regression trees.
- Insurance Cost Modeling: Comprehensive regression analysis of health insurance costs using various transformations and model selection techniques.
- Statistical Methods Implementation: Original implementation of statistical methods for data analysis.
- Time Series Analysis: Time series modeling and forecasting techniques.
- Multivariate Analysis: Implementation of multivariate statistical methods.
- Matrix Operations & Eigenstructure Analysis: Implementation of matrix multiplication visualizations with eigendecomposition, demonstrating how eigenvalues and eigenvectors explain the geometric effects of matrix transformations.
- K-means Clustering: Custom implementation of the k-means clustering algorithm with visualization of each step in the clustering process, showing centroid updates and cluster assignments.
- Numerical Methods: Implementations of numerical integration, optimization techniques, and statistical simulations.
- Naive Bayes Classification: Implementation of the Naive Bayes algorithm for classification tasks with detailed documentation.
- K-Nearest Neighbors: Custom KNN implementation with cross-validation and performance metrics.
- Regression Analysis: Multiple regression techniques with diagnostic tools and visualization.
- Monte Carlo Simulations: Various Monte Carlo methods for statistical inference and probability estimation.
- Bootstrap Methods: Implementation of bootstrap resampling for confidence interval estimation.
- Convolution and Distribution Functions: Custom implementations of probability distributions and convolution operations.
- Complex SQL Queries: Enterprise-level SQL queries for data extraction and transformation, including:
  - Campaign analytics and attribution modeling
  - Web analytics data processing
- Data Transformation: SQL scripts demonstrating ETL processes and data warehouse design principles.
- R: Advanced statistical computing, data visualization, and machine learning
  - Original implementations in Project1.R, Final.R, and Townsend_Midterm.R
- SAS: Statistical analysis and data management
  - Applied in final.sas and SAS Advanced Programs
- SQL: Complex data querying and database management
  - Enterprise-level implementations for data analytics
- Python: Data processing and machine learning implementations
- Hypothesis testing
- Regression analysis
- Time series analysis
- Clustering and classification
- Dimensionality reduction
- Probability theory applications
- Linear algebra implementations
- Numerical optimization
- Eigenvalue decomposition
- Matrix operations
- Probability distributions
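
As a small illustration of the eigenvalue decomposition and matrix operations items above, here is a minimal Python sketch. It is not taken from the repository's code: the function names `mat_vec` and `eig_sym_2x2` are hypothetical, and the closed form used here applies only to symmetric 2x2 matrices. It checks the defining property that multiplying a matrix by one of its eigenvectors only rescales that vector by the eigenvalue.

```python
import math


def mat_vec(m, v):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def eig_sym_2x2(m):
    """Return [(eigenvalue, unit eigenvector), ...] for a symmetric
    2x2 matrix, largest eigenvalue first (illustrative helper)."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    if not math.isclose(b, m[1][0]):
        raise ValueError("matrix must be symmetric")
    if b == 0.0:
        # Diagonal matrix: the coordinate axes are the eigenvectors.
        pairs = [(a, [1.0, 0.0]), (d, [0.0, 1.0])]
        return sorted(pairs, key=lambda p: p[0], reverse=True)
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # (lam - d, b) solves (A - lam * I) v = 0 whenever b != 0.
        vec = [lam - d, b]
        norm = math.hypot(vec[0], vec[1])
        pairs.append((lam, [vec[0] / norm, vec[1] / norm]))
    return pairs


A = [[2.0, 1.0], [1.0, 2.0]]
pairs = eig_sym_2x2(A)
for lam, vec in pairs:
    # A @ vec should equal lam * vec component by component.
    print(lam, vec, mat_vec(A, vec))
```

For the example matrix the eigenvalues are 3 and 1, with eigenvectors along the two diagonals: the transformation stretches one diagonal direction by a factor of 3 and leaves the other unchanged in length.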
This repository serves as both a learning resource and a demonstration of applied statistical computing skills. The code includes:
- Detailed comments explaining statistical concepts (see Matrix Multiplication Example)
- Visualizations of complex algorithms (see K-means Visualization)
- Step-by-step implementations of statistical methods (see Bootstrap Methods)
- Real-world applications of theoretical concepts
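
To give a flavor of the step-by-step style mentioned above, the following is a minimal Python sketch of bootstrap resampling for a confidence interval. It is an illustration under stated assumptions, not the repository's implementation: the function name `bootstrap_ci`, its parameters, and the sample values are all hypothetical, and it uses the simple percentile method rather than any more refined variant.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Step 1: resample with replacement and recompute the statistic.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Step 2: read the interval off the empirical quantiles.
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample values, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
lower, upper = bootstrap_ci(sample)
print(f"95% percentile interval for the mean: ({lower:.2f}, {upper:.2f})")
```

Passing a different `stat`, such as `statistics.median`, reuses the same two steps for another estimator.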
The techniques demonstrated in this repository have direct applications in:
- Data Science and Analytics: Classification, Regression
- Business Intelligence: SQL Data Transformation, Campaign Analysis
- Predictive Modeling: Time Series, Multivariate Analysis
- Statistical Research: Numerical Methods, Probability Distributions
- Machine Learning Engineering: K-means, Naive Bayes
- Data-Driven Decision Making: Business Analytics, Attribution Modeling
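
For readers unfamiliar with the k-means item listed above, the algorithm alternates between assigning points to their nearest centroid and moving each centroid to the mean of its assigned points. The Python sketch below shows those two steps under simplifying assumptions; it is not the repository's implementation, and the function name `kmeans`, its parameters, and the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm on points given as tuples.

    Returns (centroids, labels). Illustrative sketch only.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels


# Two well-separated toy groups, made up for this example.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Recording `centroids` and `labels` at each pass through the loop is one way to produce the kind of step-by-step visualization the k-means project describes.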
Note: This repository contains both guided coursework implementations and original work completed as part of a master's program in computational statistics. The code is intended to showcase technical proficiency and understanding of statistical computing principles for potential employers and collaborators.