-
Notifications
You must be signed in to change notification settings - Fork 37.1k
/
Copy pathPA1_template.Rmd
110 lines (82 loc) · 2.96 KB
/
PA1_template.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
---
title: "Reproducible Research: Peer Assessment 1"
output:
html_document:
keep_md: true
---
## Loading and preprocessing the data
In this data section I read the data from the shared folder.
```{r, echo=FALSE}
library(tidyverse) #Need for the Pipes I will use in next step.
library(knitr)
options(digits=0) #Shows full numbers
options(scipen=999) #Removes scientific notation
```
```{r, echo=TRUE}
activity <- read.csv("~/Reproducible Research/week2/activity.csv")
```
### Histogram of number of steps
Here, I do an histogram of the number of steps taken each day.
```{r, echo=TRUE}
data<- activity %>%
group_by(date) %>%
summarize(Steps_per_day=sum(steps))
hist(data$Steps_per_day)
```
# What is the mean and median number of steps taken each day?
As instructed, I use NA.RM to ignore the missings.
```{r ,echo=TRUE}
mean<-mean(data$Steps_per_day,na.rm=TRUE)
median<-median(data$Steps_per_day,na.rm=TRUE)
```
The mean number of steps is `r mean` and the median is `r median`.
# What is the average daily activity pattern?
To study this, I will make a time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis). For this, I aggregate the data per interval.
```{r , echo=TRUE}
data2<- activity %>%
group_by(interval) %>%
summarize(Steps_per_interval=mean(steps,na.rm=TRUE))
plot(data2$interval,data2$Steps_per_interval,type="l")
max<-max(data2$Steps_per_interval)
max2<-subset(data2,(data2$Steps_per_interval)==max)
```
Interval `r max2$interval` is the one with the maximum or higher number of steps. The total number of steps is `r max2$Steps_per_interval`.
#Imputing missing values
```{r determine missing, echo=TRUE}
activity$missing<-ifelse(is.na(activity$steps)==TRUE,1,0)
data3<-activity %>%
summarize(Missing=sum(missing))
total_missing<-data3$Missing
rm(data3)
```
The total number of observation with a missing value is `r total_missing`.
```{r , echo=TRUE}
#Here I create a new dataset to avoid overwriting the old one
activity2<-activity
mean_steps<-mean(activity2$steps,na.rm=TRUE)
activity2$steps_imp<-ifelse(is.na(activity2$steps)==FALSE,activity2$steps,mean_steps)
```
Now, I create the new dataset per day, but using the imputed data.
```{r, echo=TRUE}
data_imp<- activity2 %>%
group_by(date) %>%
summarize(Steps_per_day_imp=sum(steps_imp))
hist(data_imp$Steps_per_day_imp)
```
## Are there differences in activity patterns between weekdays and weekends?
```{r, echo=TRUE}
activity3<-activity2
activity3$date_new <- as.Date(activity3$date)
activity3$day<-weekdays(activity3$date_new)
activity3$weekend<-ifelse(activity3$day=="Saturday","Weekend",
ifelse(activity3$day=="Sunday","Weekend","Weekday"))
```
```{r, echo=TRUE}
library(lattice)
weekday_series<- activity3 %>%
group_by(weekend,interval) %>%
summarize(Steps_per_int_imp=mean(steps_imp))
xyplot(Steps_per_int_imp ~ interval | weekend,
data = weekday_series,
type = "l")
```