run_analysis README (chunzhu/run_analysis)

Things you need to do before running the code

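1. Install the dplyr package.
2. Make sure run_analysis.R is in your current working directory, or set your working directory to the folder that contains run_analysis.R.
3. Make sure you are connected to the internet (the script downloads the data set if the data folder is missing).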
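
The checks above can be sketched in R before running the script. This is only an illustration: `check_prerequisites` is a hypothetical helper and is not part of run_analysis.R.

```r
# Hypothetical helper (not part of run_analysis.R): reports whether the
# dplyr package is installed and whether the script can be found.
check_prerequisites <- function(script = "run_analysis.R") {
  c(dplyr_installed = requireNamespace("dplyr", quietly = TRUE),
    script_found    = file.exists(script))
}

status <- check_prerequisites()
if (!status[["dplyr_installed"]]) {
  message("dplyr is missing: run install.packages(\"dplyr\") first")
}
if (!status[["script_found"]]) {
  message("run_analysis.R not found: use setwd() to move to its folder")
}
```

Once both checks pass, `source("run_analysis.R")` runs the script from the current working directory.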

============ how my run_analysis.R works

1. Load the dplyr package.
2. Check whether the data folder exists. If it does not:
   a. download the archive from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
   b. unzip it
3. Load the training data (subject_train, X_train, y_train) and combine it into one data frame.
4. Load the test data (subject_test, X_test, y_test) and combine it into one data frame.
5. Merge the training and test sets to create one complete data set.
6. Load the feature descriptions and apply them as column labels in the complete data set, and label the subject column and the activity column.
7. Extract all variables whose labels contain mean(), std(), "Training.Label.ID" or "Subject.ID".
8. Clean up some variables that do not belong to this experiment (repeated words are assumed to be the result of an error).
9. Load the list of descriptive activity names and use it to describe the activity numbers in the extracted data set.
10. Search for abbreviations and substitute the full words to make the variable names meaningful.
11. Group the data by person and activity using factor levels.
12. Average every variable within the groups generated in the previous step.
13. Write the group-based averages to a file.
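
The steps above can be sketched on a small made-up data set. This is an illustration under stated assumptions, not the code in run_analysis.R:

- Every value, the three feature names, the two activity names and the output file name are made up.
- The folder name "UCI HAR Dataset" is an assumption about how the archive unpacks.
- "Training.Label.ID" is assumed to hold the activity number.
- The abbreviation rules are hypothetical.
- The clean-up of repeated words is left out because its rules are not described here.
- The script loads dplyr, but the sketch uses base R `aggregate()` so it runs without extra packages. dplyr's `group_by()` and `summarise()` would be the natural equivalent.

```r
# Download and unzip only when the data folder is missing (defined, not called).
# ASSUMPTION: the archive unpacks into a folder named "UCI HAR Dataset".
fetch_data <- function(data_dir = "UCI HAR Dataset") {
  url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
  if (!dir.exists(data_dir)) {
    zip_path <- tempfile(fileext = ".zip")
    download.file(url, zip_path, mode = "wb")  # needs an internet connection
    unzip(zip_path)
  }
  invisible(data_dir)
}

# Toy stand-ins for the feature list and activity list (made-up values).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Training and test sets: subject, activity number, then the measurements.
train <- data.frame(Subject.ID = c(1, 1, 2), Training.Label.ID = c(1, 1, 2),
                    V1 = c(0.1, 0.3, 0.5), V2 = c(1, 1, 1), V3 = c(9, 9, 9))
test  <- data.frame(Subject.ID = c(3, 3), Training.Label.ID = c(2, 2),
                    V1 = c(0.2, 0.4), V2 = c(2, 4), V3 = c(9, 9))

# Merge the two sets and label the measurement columns with the feature names.
complete <- rbind(train, test)
names(complete)[3:5] <- features

# Keep the ID columns plus every mean() and std() variable.
keep <- grepl("mean\\(\\)|std\\(\\)|Training\\.Label\\.ID|Subject\\.ID",
              names(complete))
extracted <- complete[, keep]

# Describe the activity numbers with their names.
extracted$Activity <- factor(extracted$Training.Label.ID,
                             levels = activity_labels$id,
                             labels = activity_labels$name)

# Expand abbreviations in the measurement names (hypothetical rules).
measure_cols <- setdiff(names(extracted),
                        c("Subject.ID", "Training.Label.ID", "Activity"))
pretty <- sub("^t", "Time.", measure_cols)
pretty <- sub("Acc", "Accelerometer", pretty)
names(extracted)[match(measure_cols, names(extracted))] <- pretty

# Average every measurement within each subject/activity group.
tidy <- aggregate(extracted[pretty],
                  by = list(Subject.ID = extracted$Subject.ID,
                            Activity = extracted$Activity),
                  FUN = mean)

# Write the averages to a file (hypothetical file name, in a temp folder).
out_file <- file.path(tempdir(), "tidy_averages.txt")
write.table(tidy, out_file, row.names = FALSE)
```

The result has one row per subject and activity pair, and `read.table(out_file, header = TRUE)` reads the file back in.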
