forked from rdpeng/RepData_PeerAssessment1
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPA1_template.Rmd
101 lines (87 loc) · 3.11 KB
/
PA1_template.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
---
title: "Reproducible Research: Peer Assessment 1"
output:
html_document:
keep_md: true
---
## Loading and preprocessing the data
```{r, message=FALSE}
library(dplyr)
library(scales)
library(ggplot2)
act_dat <- read.csv("./activity.csv")
names(act_dat) <- c('Steps', 'Date', 'Interval')
```
## What is mean total number of steps taken per day?
```{r}
sum_steps_d <- aggregate(act_dat['Steps'],
list(Date = act_dat$Date),
sum, na.rm=TRUE)
sum_steps_d$Date <- as.Date(sum_steps_d$Date)
g <- ggplot(sum_steps_d, aes(Date, Steps))
g <- g + geom_bar(stat="identity")
g <- g + scale_x_date()
g <- g + labs(y='Total Number of Steps')
print(g)
mean_step <- mean(sum_steps_d$Steps)
median_step <- median(sum_steps_d$Steps)
```
The mean and median total number of step per day is `r mean_step` and `r median_step` respectively.
## What is the average daily activity pattern?
```{r}
avg_steps_i <- aggregate(act_dat['Steps'],
list(Interval = act_dat$Interval),
mean, na.rm=TRUE)
g <- ggplot(avg_steps_i, aes(Interval, Steps))
g <- g + geom_line()
g <- g + labs(x='Interval (5 minutes)', y='Average Number of Steps')
print(g)
max_steps <- avg_steps_i$Interval[which.max(avg_steps_i$Steps)]
```
The maximum average number of steps occured at the `r max_steps` minute interval.
## Imputing missing values
```{r}
na_cnt <- sum(is.na(act_dat$Steps))
```
The total number of missing values in the dataset is `r na_cnt`.
```{r}
act_dat_c <- act_dat
for (i in avg_steps_i$Interval){
act_dat_c$Steps[act_dat_c$Interval == i & is.na(act_dat_c$Steps)] <-
as.integer(round(avg_steps_i$Steps[avg_steps_i$Interval == i], 0))
}
```
Substitute missing values with the value averaged across days.
```{r}
sum_steps_d <- aggregate(act_dat_c['Steps'],
list(Date = act_dat_c$Date),
sum, na.rm=TRUE)
sum_steps_d$Date <- as.Date(sum_steps_d$Date)
g <- ggplot(sum_steps_d, aes(Date, Steps))
g <- g + geom_bar(stat="identity")
g <- g + scale_x_date()
g <- g + labs(y="Total Number of Steps")
print(g)
```
```{r}
mean_step_c <- mean(sum_steps_d$Steps)
median_step_c <- median(sum_steps_d$Steps)
dif_mean_step <- mean_step_c - mean_step
dif_median_step <- median_step_c - median_step
```
The mean and median total number of steps per day is `r mean_step_c` and `r median_step_c` respectively. Yes, there is a difference. The difference in mean and median total number of steps per day is `r dif_mean_step` and `r dif_median_step` respectively.
## Are there differences in activity patterns between weekdays and weekends?
```{r}
avg_weekend <- mutate(act_dat_c,
Date = as.Date(Date),
Weekend =
ifelse(weekdays(Date) %in% c('Saturday', 'Sunday'),
'Weekend', 'Weekday')) %>%
group_by(Interval, Weekend) %>%
summarize(Steps = mean(Steps))
g <- ggplot(avg_weekend, aes(Interval, Steps))
g <- g + facet_grid(Weekend ~ ., scales='free_y')
g <- g + geom_line()
g <- g + labs(x='Interval (5 minutes)', y='Average Number of Steps')
print(g)
```