title

author

date

output

Reproducible Research: Peer Assessment 1

Thomas Frank

25-09-2023

html_document

keep_md
true

Loading and preprocessing the data

rm(list=ls()) # clean workspace
unzip(zipfile="activity.zip")
data <- read.csv("activity.csv")
head(data)

##   steps       date interval
## 1    NA 2012-10-01        0
## 2    NA 2012-10-01        5
## 3    NA 2012-10-01       10
## 4    NA 2012-10-01       15
## 5    NA 2012-10-01       20
## 6    NA 2012-10-01       25

str(data)

## 'data.frame':	17568 obs. of  3 variables:
##  $ steps   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ date    : chr  "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
##  $ interval: int  0 5 10 15 20 25 30 35 40 45 ...

total.steps <-data%>%
        dplyr::filter(!is.na(steps))%>%
        dplyr::group_by(date)%>%
        dplyr::summarise(stepsum =sum(steps))

What is mean total number of steps taken per day?

ggplot(total.steps, aes(x=stepsum)) + 
  geom_histogram(binwidth=1000) + 
        xlab("total number of steps taken each day") + 
        geom_vline(data=total.steps, xintercept=mean(total.steps[["stepsum"]])) + 
        geom_vline(data=total.steps, xintercept=median(total.steps[["stepsum"]]), color="red", linetype="dashed")

The mean number of steps per day is 10766 (black vertical line). The median number of steps per day is 10765 (red dashed vertical line).

What is the average daily activity pattern?

average.d <-data%>%
        dplyr::group_by(interval)%>%
        dplyr::summarise(meansteps =mean(steps, na.rm = TRUE))

ggplot(data=average.d, aes(x=interval, y=meansteps)) +
    geom_line() +
    xlab("5-minute interval") +
    ylab("average number of steps taken")+
    scale_x_continuous(limits=c(0,2400), breaks=c(seq(0,2400,200)))

The 5-minute interval "835" contains the maximum number of steps (206.2) on average across all the days in the dataset.

Imputing missing values

The number of missing values is shown in the following table:

missing <- is.na(data[["steps"]])
kable(table(missing), col.names=c("Missing values", "Frequency (n)"), caption = "Number of missing values in the dataset:")

Table: Number of missing values in the dataset:

Missing values	Frequency (n)
FALSE	15264
TRUE	2304

As substitution rule, all of the missing values are filled in with mean value for that 5-minute interval.

repl.na <- function(steps, interval) {
    replaced <- NA
    if (!is.na(steps))
        replaced <- c(steps)
    else
        replaced <- (average.d[average.d[["interval"]]==interval, "meansteps"])
    return(replaced)
}
repl.d <- data
repl.d[["steps"]] <- mapply(repl.na, repl.d[["steps"]], repl.d[["interval"]])
repl.d<- unnest(repl.d, steps)

total.steps2 <-repl.d%>%
        dplyr::group_by(date)%>%
        dplyr::summarise(stepsum =sum(steps))

ggplot(total.steps2, aes(x=stepsum)) + 
  geom_histogram(binwidth=1000) + 
        xlab("total number of steps taken each day") + 
        geom_vline(data=total.steps2, xintercept=mean(total.steps2[["stepsum"]])) + 
        geom_vline(data=total.steps2, xintercept=median(total.steps2[["stepsum"]]), color="red", linetype="dashed")

The mean number of steps per day is 10766 (black vertical line). The median number of steps per day is 10766 (red dashed vertical line). These values do not differ from the estimates from the first part of the assignment. Thus, the impact of imputing missing data on the estimates of the total daily number of steps is low.

Are there differences in activity patterns between weekdays and weekends?

repl.d2 <- repl.d%>%
        dplyr::mutate(date = ymd(date),
                      date2 = weekdays(date),
                      is.wke = ifelse(date2%in%c("Saturday","Sunday"), "weekend", "weekday"))

average.d2 <-repl.d2%>%
        dplyr::group_by(interval,is.wke)%>%
        dplyr::summarise(meansteps =mean(steps, na.rm = TRUE))

## `summarise()` has grouped output by 'interval'. You can override using the
## `.groups` argument.

ggplot(data=average.d2, aes(x=interval, y=meansteps)) +
    geom_line() +
    xlab("5-minute interval") +
    ylab("average number of steps taken")+
    scale_x_continuous(limits=c(0,2400), breaks=c(seq(0,2400,200)))+
    facet_wrap(~is.wke, ncol = 1, nrow=2)

There is a difference in the activity pattern in that the average number of steps on the weekday is higher in the morning and lower in the afternoon/evening than a weekend day. Weekend days in the graph shown above include Saturday and Sunday.

R environment used for this report {#rsettings}

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: openSUSE Tumbleweed
## 
## Matrix products: default
## BLAS:   /software/content/R/4.2.0_tumbleweed/lib/libRblas.so
## LAPACK: /software/content/R/4.2.0_tumbleweed/lib/libRlapack.so
## 
## Random number generation:
##  RNG:     Inversion 
##  Normal:  Mersenne-Twister 
##  Sample:  Rejection 
##  
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] base      datasets  graphics  grDevices methods   stats     utils    
## 
## other attached packages:
## [1] dplyr_1.1.3     ggplot2_3.4.2   knitr_1.43      lubridate_1.9.2
## [5] tidyr_1.3.0    
## 
## loaded via a namespace (and not attached):
##  [1] bslib_0.5.0       cachem_1.0.8      cli_3.6.1         colorspace_2.1-0 
##  [5] compiler_4.2.0    digest_0.6.33     evaluate_0.21     fansi_1.0.4      
##  [9] farver_2.1.1      fastmap_1.1.1     generics_0.1.3    glue_1.6.2       
## [13] grid_4.2.0        gtable_0.3.3      highr_0.10        htmltools_0.5.6  
## [17] jquerylib_0.1.4   jsonlite_1.8.7    labeling_0.4.2    lifecycle_1.0.3  
## [21] magrittr_2.0.3    munsell_0.5.0     pillar_1.9.0      pkgconfig_2.0.3  
## [25] purrr_1.0.2       R6_2.5.1          rlang_1.1.1       rmarkdown_2.23   
## [29] rstudioapi_0.15.0 sass_0.4.7        scales_1.2.1      tibble_3.2.1     
## [33] tidyselect_1.2.0  timechange_0.2.0  tools_4.2.0       utf8_1.2.3       
## [37] vctrs_0.6.3       withr_2.5.0       xfun_0.40         yaml_2.3.7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PA1_template.md

PA1_template.md

Loading and preprocessing the data

What is mean total number of steps taken per day?

What is the average daily activity pattern?

Imputing missing values

Are there differences in activity patterns between weekdays and weekends?

R environment used for this report {#rsettings}

Files

PA1_template.md

Latest commit

History

PA1_template.md

File metadata and controls

Loading and preprocessing the data

What is mean total number of steps taken per day?

What is the average daily activity pattern?

Imputing missing values

Are there differences in activity patterns between weekdays and weekends?

R environment used for this report {#rsettings}