
jaw039/R-Data-Analysis-PDFs


Data-Analysis-PDF

This repository contains coursework for DSC190: five data analysis projects written in R. Each project applies a different mix of statistical methods, data analysis techniques, and visualizations.

Projects Overview

  1. Analysing the Programming Language Preferences of Data Scientists in 2020

    • Explores survey data from Kaggle to analyze programming language usage and educational backgrounds.
    • Techniques: binary encoding, chi-square tests, clustering, random forest classification.
  2. Detecting Replication Patterns in HCMV through Palindrome Analysis

    • Investigates palindrome patterns in genomic data using simulation and statistical tests.
    • Techniques: Monte Carlo simulation, Q-Q plots, Kolmogorov-Smirnov (KS) tests, cluster analysis.
  3. Maternal Smoking and Birth Weight

    • Examines the impact of maternal smoking on birth weight using statistical comparisons.
    • Techniques: data cleaning, permutation tests, kurtosis analysis, group comparisons.
  4. Statistical Analysis and Calibration of a Gamma Transmission Gauge

    • Analyzes calibration data for a gamma transmission gauge with regression and prediction methods.
    • Techniques: linear/log-linear regression, prediction intervals, reverse prediction, robustness testing.
  5. Video Gaming Patterns and Academic Performance

    • Studies the relationship between video gaming habits and academic performance.
    • Techniques: bootstrapping, confidence intervals, demographic analysis, hypothesis testing.
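
As an illustration of the permutation-test technique listed under the Maternal Smoking and Birth Weight project, here is a minimal base R sketch. It is not taken from the project scripts: the data are simulated, and the group sizes, means, and names such as `diff_in_means` are assumptions made only for this example.

```r
# Hypothetical data: simulated birth weights (in ounces) for two groups.
# These numbers are made up for illustration, not the course dataset.
set.seed(42)
nonsmoker <- rnorm(100, mean = 123, sd = 17)
smoker    <- rnorm(80,  mean = 114, sd = 18)

weights <- c(nonsmoker, smoker)
labels  <- rep(c("nonsmoker", "smoker"), times = c(100, 80))

# Test statistic: difference in group means.
diff_in_means <- function(w, g) {
  mean(w[g == "nonsmoker"]) - mean(w[g == "smoker"])
}
observed <- diff_in_means(weights, labels)

# Build the null distribution by shuffling the group labels.
n_perm <- 2000
perm_diffs <- replicate(n_perm, diff_in_means(weights, sample(labels)))

# Two-sided p-value, with the +1 correction so it is never exactly zero.
p_value <- (sum(abs(perm_diffs) >= abs(observed)) + 1) / (n_perm + 1)

cat("Observed difference:", round(observed, 2), "\n")
cat("Permutation p-value:", signif(p_value, 3), "\n")
```

Shuffling the labels breaks any link between group and weight, so the shuffled differences approximate what would be seen if smoking status had no effect.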

Skills Demonstrated

  • Data Cleaning: Handling missing values, recoding, filtering, and outlier removal.
  • Exploratory Data Analysis: Visualizations like histograms, barplots, and heatmaps.
  • Statistical Testing: Chi-square tests, KS tests, permutation tests, and bootstrap methods.
  • Modeling & Prediction: Regression analysis, clustering, and random forest classification.
  • Simulation: Monte Carlo methods and resampling techniques.
  • Reproducible Reporting: Scripts and tables generated with R and knitr.
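
The bootstrap and resampling methods listed above can be sketched in a few lines of base R. This is an illustrative example under stated assumptions, not code from the reports: the `hours` vector is simulated, and the sample size and number of resamples are arbitrary choices.

```r
# Hypothetical data: simulated weekly hours of video game play.
set.seed(1)
hours <- rexp(90, rate = 1 / 1.2)

# Resample with replacement and record the mean of each resample.
n_boot <- 5000
boot_means <- replicate(n_boot, mean(sample(hours, replace = TRUE)))

# Percentile bootstrap 95% confidence interval for the mean.
ci <- quantile(boot_means, probs = c(0.025, 0.975))

cat("Sample mean:", round(mean(hours), 3), "\n")
cat("95% bootstrap CI:", round(ci[1], 3), "to", round(ci[2], 3), "\n")
```

The percentile interval is the simplest bootstrap interval; other variants (for example, bias-corrected intervals) may be preferable for strongly skewed statistics.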

Tools & Libraries

  • R: Base R and packages like dplyr, caret, randomForest, corrplot, and knitr.
  • Data Sources: Includes survey data, genomic data, and other text/delimited files.

Feel free to explore the individual project folders for detailed scripts and reports. Each project folder contains the R script and a corresponding PDF report summarizing the analysis.
