This repository contains coursework for DSC190, showcasing five distinct data analysis projects using R. Each project demonstrates a range of data analysis techniques, statistical methods, and visualization skills.
- Analysing the Programming Language Preferences of Data Scientists in 2020
  - Explores survey data from Kaggle to analyze programming language usage and educational backgrounds.
  - Techniques: binary encoding, chi-square tests, clustering, random forest classification.
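
As a rough illustration of the binary encoding and chi-square steps named above, here is a minimal sketch in base R. The data are synthetic, and the column values, the `;` separator, and the helper `has_lang` are assumptions made for this example; they are not taken from the Kaggle survey or the project script.

```r
# Synthetic stand-in for survey responses (NOT the Kaggle data).
set.seed(1)
n <- 500
education <- sample(c("Bachelor", "Master", "Doctoral"), n, replace = TRUE)
languages <- sample(c("Python", "R", "Python;R", "SQL;Python"), n, replace = TRUE)

# Binary encoding: 1 if the respondent listed the language, 0 otherwise.
# has_lang is a hypothetical helper written for this sketch.
has_lang <- function(x, lang) {
  as.integer(vapply(strsplit(x, ";", fixed = TRUE),
                    function(parts) lang %in% parts, logical(1)))
}
uses_r <- has_lang(languages, "R")
uses_python <- has_lang(languages, "Python")

# Cross-tabulate education level against the R-usage indicator,
# then test the two variables for independence.
tab <- table(education, uses_r)
result <- chisq.test(tab)

print(tab)
print(result$p.value)
```

Because the synthetic responses are drawn independently of education level, a large p-value is the expected outcome here; real survey data may of course behave differently.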
- Detecting Replication Patterns in HCMV through Palindrome Analysis
  - Investigates palindrome patterns in genomic data using simulation and statistical tests.
  - Techniques: Monte Carlo simulation, Q-Q plots, KS tests, cluster analysis.
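
The Monte Carlo and KS-test ideas above could be sketched as follows. This is an illustrative example only: `genome_length`, `n_palindromes`, and the window width are placeholder values, the "observed" locations are simulated, and the max-window-count statistic is one plausible choice for detecting clusters, not necessarily the one used in the project.

```r
set.seed(42)
genome_length <- 200000  # placeholder, not the real HCMV genome length
n_palindromes <- 300     # placeholder count
n_sims <- 1000

# Stand-in for observed palindrome locations (simulated, so the null is true).
observed <- sort(sample.int(genome_length, n_palindromes))

# Test statistic: the largest number of palindromes in any fixed-width window.
max_window_count <- function(locations, width = 4000) {
  breaks <- seq(0, genome_length, by = width)
  max(table(cut(locations, breaks = breaks)))
}

# Monte Carlo null distribution under uniform random scatter.
null_stats <- replicate(
  n_sims,
  max_window_count(sample.int(genome_length, n_palindromes))
)
obs_stat <- max_window_count(observed)

# Monte Carlo p-value with the usual +1 correction.
mc_p_value <- (1 + sum(null_stats >= obs_stat)) / (n_sims + 1)

# KS test of the locations against Uniform(0, genome_length).
ks <- ks.test(observed, "punif", min = 0, max = genome_length)

print(mc_p_value)
print(ks$p.value)
```

An unusually dense window in real data would show up as a small `mc_p_value`, which is the kind of signal a search for replication-origin clusters looks for.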
- Maternal Smoking and Birth Weight
  - Examines the impact of maternal smoking on birth weight using statistical comparisons.
  - Techniques: data cleaning, permutation tests, kurtosis analysis, group comparisons.
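
A permutation test for a difference in group means, as listed above, might look like the sketch below. The data frame `babies`, its columns, and the group sizes and means are synthetic assumptions for illustration and are not the project's data.

```r
set.seed(7)
# Synthetic birth weights in ounces (NOT the project's data).
babies <- data.frame(
  smoker = rep(c(0, 1), times = c(120, 80)),
  weight = c(rnorm(120, mean = 123, sd = 17), rnorm(80, mean = 114, sd = 18))
)

# Difference in mean weight: non-smokers minus smokers.
diff_in_means <- function(weight, smoker) {
  mean(weight[smoker == 0]) - mean(weight[smoker == 1])
}

observed_diff <- diff_in_means(babies$weight, babies$smoker)

# Shuffle the smoking labels to build the null distribution.
n_perm <- 2000
perm_diffs <- replicate(
  n_perm,
  diff_in_means(babies$weight, sample(babies$smoker))
)

# Two-sided permutation p-value with the +1 correction.
p_value <- (1 + sum(abs(perm_diffs) >= abs(observed_diff))) / (n_perm + 1)

print(observed_diff)
print(p_value)
```

Shuffling the labels keeps the weights fixed and breaks any link between group and outcome, so the test needs no normality assumption.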
- Statistical Analysis and Calibration of a Gamma Transmission Gauge
  - Analyzes calibration data for a gamma transmission gauge with regression and prediction methods.
  - Techniques: linear/log-linear regression, prediction intervals, reverse prediction, robustness testing.
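
To make the log-linear regression and reverse prediction steps concrete, here is a minimal sketch. It assumes the gauge reading (`gain`) decays exponentially with `density`; the variable names, coefficients, and noise level are invented for this example, and `reverse_predict` is a hypothetical helper rather than the project's function.

```r
set.seed(3)
# Synthetic calibration data (coefficients invented for illustration).
density <- rep(seq(0.1, 0.7, by = 0.1), each = 5)
gain <- exp(6 - 4.6 * density + rnorm(length(density), sd = 0.05))
calib <- data.frame(density, gain)

# Log-linear model: log(gain) is linear in density.
fit <- lm(log(gain) ~ density, data = calib)

# Forward prediction: 95% prediction interval for gain at a new density,
# computed on the log scale and back-transformed.
new_point <- data.frame(density = 0.5)
pred <- exp(predict(fit, newdata = new_point,
                    interval = "prediction", level = 0.95))

# Reverse prediction: invert the fitted line to estimate density
# from an observed gain.
reverse_predict <- function(fit, gain_obs) {
  coefs <- coef(fit)
  unname((log(gain_obs) - coefs["(Intercept)"]) / coefs["density"])
}
est_density <- reverse_predict(fit, gain_obs = 40)

print(pred)
print(est_density)
```

The point estimate from `reverse_predict` carries no uncertainty band; an interval for the estimated density would need an extra step, such as inverting the prediction bands.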
- Video Gaming Patterns and Academic Performance
  - Studies the relationship between video gaming habits and academic performance.
  - Techniques: bootstrapping, confidence intervals, demographic analysis, hypothesis testing.
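
The bootstrapping and confidence-interval techniques above can be sketched as a percentile bootstrap for a mean. The vector `hours` is synthetic and stands in for a survey variable such as weekly hours played; it is an assumption for this example, not the project's data.

```r
set.seed(11)
# Synthetic weekly hours played (NOT the survey data).
hours <- round(rexp(90, rate = 1 / 3), 1)

# Resample with replacement and recompute the mean each time.
n_boot <- 5000
boot_means <- replicate(n_boot, mean(sample(hours, replace = TRUE)))

# Percentile bootstrap 95% confidence interval for the mean.
ci <- quantile(boot_means, probs = c(0.025, 0.975))

print(mean(hours))
print(ci)
```

The percentile interval is the simplest bootstrap interval; for a skewed variable like this one, alternatives such as the BCa interval may give better coverage.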

Key skills demonstrated across the projects:
- Data Cleaning: Handling missing values, recoding, filtering, and outlier removal.
- Exploratory Data Analysis: Visualizations such as histograms, bar plots, and heatmaps.
- Statistical Testing: Chi-square tests, KS tests, permutation tests, and bootstrap methods.
- Modeling & Prediction: Regression analysis, clustering, and random forest classification.
- Simulation: Monte Carlo methods and resampling techniques.
- Reproducible Reporting: Scripts and tables generated with R and knitr.

Tools and data:
- R: Base R and packages such as `dplyr`, `caret`, `randomForest`, `corrplot`, and `knitr`.
- Data Sources: Survey data, genomic data, and other delimited text files.

Feel free to explore the individual project folders: each one contains the R script and a corresponding PDF report summarizing the analysis.