Assignment from the course "Principle of Data Science": performing data cleaning and EDA.

zonaylc/student_mat_EDA

student_mat_EDA

Assignment from the course "Statistical Principle of Data Science": performing data cleaning and EDA.

Tasks

  1. Data Cleaning
  • To understand what the data looks like, extract the first five rows.
  • Check for typos and missing data. If there are any, correct the typos, then list and delete the rows that contain missing data.
  • Save the cleaned data as a .csv file.
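The cleaning steps above can be sketched with Python's standard `csv` module. This is a minimal illustration, not the assignment's reference solution. The `;` delimiter, the set of missing-value markers, the helper names, and the sample rows are all assumptions made for the example. Typo correction is left out because it depends on what the real file contains.

```python
import csv
import io

# Assumed markers for a "missing" cell; adjust to what the data actually uses.
MISSING_MARKERS = {"", "NA", "NaN"}


def read_rows(text, delimiter=";"):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def split_incomplete(rows):
    """Return (complete_rows, incomplete_rows).

    A row is incomplete if any cell is a missing marker, or if the row is
    malformed (too few or too many fields, which DictReader reports as
    non-string values).
    """
    complete, incomplete = [], []
    for row in rows:
        has_missing = any(
            not isinstance(value, str) or value.strip() in MISSING_MARKERS
            for value in row.values()
        )
        (incomplete if has_missing else complete).append(row)
    return complete, incomplete


def to_csv_text(rows, delimiter=";"):
    """Serialize rows to CSV text, ready to be written to a .csv file."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(rows[0].keys()),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample rows for illustration only (not taken from the dataset).
sample = (
    "school;address;G1;G2;G3\n"
    "GP;U;5;6;6\n"
    "GP;;7;8;10\n"
    "MS;R;NA;9;9\n"
    "MS;U;12;13;14\n"
)
rows = read_rows(sample)
print(rows[:5])                      # first five rows (the sample has four)
clean, dropped = split_incomplete(rows)
print(dropped)                       # rows that will be deleted
cleaned_text = to_csv_text(clean)    # write this string to the output file
```

To use it on a real file, read the file's text with `open(path, newline="").read()` and write `cleaned_text` back out the same way.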
  2. Exploratory Data Analysis
  In this section, describe your findings for each question based on the plots below. All plots need appropriate labels, titles and coordinates. You can also annotate the plots where needed. ggplot2 is recommended if you are using R.
  • Generate violin plots for all three periods of grades (in one plot).
  • Plot a bar chart for the 'address', 'traveltime' and 'G3' variables.
  • Create a new variable 'Gmean', which is the mean of 'G1', 'G2' and 'G3'. Plot the densities of 'Gmean' separately for each value of the variable 'school' (display the density curves in the same figure).
  • Plot scatter plots of 'G1' and 'G2', 'G1' and 'G3', and 'G2' and 'G3'.
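The 'Gmean' step can be prepared before any plotting. The sketch below computes 'Gmean' per row and groups the values by 'school', assuming rows are dicts with string-valued grade columns (as a CSV reader would produce). The helper names and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def add_gmean(rows):
    """Return copies of the rows with 'Gmean' = mean of G1, G2 and G3."""
    result = []
    for row in rows:
        grades = [float(row[name]) for name in ("G1", "G2", "G3")]
        result.append({**row, "Gmean": mean(grades)})
    return result


def gmean_by_school(rows):
    """Group the 'Gmean' values by the 'school' column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["school"]].append(row["Gmean"])
    return dict(groups)


# Made-up rows for illustration only (not taken from the dataset).
rows = [
    {"school": "GP", "G1": "5", "G2": "6", "G3": "7"},
    {"school": "GP", "G1": "10", "G2": "10", "G3": "10"},
    {"school": "MS", "G1": "12", "G2": "14", "G3": "16"},
]
with_gmean = add_gmean(rows)
groups = gmean_by_school(with_gmean)
print(groups)
```

Each list in `groups` is the input for one density curve, so drawing one curve per school on shared axes gives the figure the task asks for. In R, the equivalent is mapping 'school' to the colour aesthetic of ggplot2's `geom_density`.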
  3. More Analysis
  Suppose you are invited to provide suggestions that can improve students' performance using the dataset 'student-mat.csv'. Which variables do you think are important in affecting students' final grades? Explain why you chose these variables and provide EDA where necessary. Then provide some advice to improve students' performance.
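One possible first screening for the last task is to rank numeric variables by the strength of their linear correlation with 'G3'. The sketch below is only an illustration under stated assumptions: the helper names and sample rows are hypothetical, Pearson correlation is one choice among many, and a strong correlation does not by itself show that a variable causes better or worse grades.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(rows, candidates, target="G3"):
    """Sort candidate columns by |correlation with target|, strongest first."""
    ys = [float(row[target]) for row in rows]
    scores = {
        name: pearson([float(row[name]) for row in rows], ys)
        for name in candidates
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up rows for illustration only (not taken from the dataset).
rows = [
    {"traveltime": "1", "G1": "14", "G3": "15"},
    {"traveltime": "2", "G1": "10", "G3": "11"},
    {"traveltime": "3", "G1": "8", "G3": "9"},
    {"traveltime": "4", "G1": "6", "G3": "5"},
]
ranking = rank_by_correlation(rows, ["traveltime", "G1"])
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Categorical variables such as 'address' or 'school' need a different comparison, for example side-by-side distributions of 'G3' per category.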
