RProgrammingForResearch/homework.Rmd at master · CSU-R/RProgrammingForResearch · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
# Appendix B: Homework

This section provides the homework assignments for the course.

## Homework #1

**Due date: Sept. 13 by 5:00 pm**

For this assignment, you will submit the assignment to me **by email** by the
due date. You should include one file in your submission:

1. A Word document with seven paragraphs. Each paragraph should be
headed with the name of one swirl lesson and the body of the paragraph should
describe that lesson and what you learned from it.

For this homework assignment, you'll be working through a few
[swirl](http://swirlstats.com/) lessons that are relevant to the material we've
covered so far. Swirl is a platform that helps you learn R **in** R---you can
complete the lessons right in your R console.

Depending on your familiarity with R, you can either work through seven lessons
of your choice in the `R Programming: The basics of programming in R` and
`Getting and Cleaning Data` courses (suggested lessons are listed further below)
(**Option #1**), or you can work through seven lessons of your choice taken from
any number of swirl's available courses  (**Option #2**) .

For each lesson completed, please write a few sentences that cover: 1. A summary
of the topic(s) covered in that lesson, and 2. The most interesting thing that
you learned from that lesson. Turn in a hardcopy of this (with your first and
last name at the top) during class on the due date.

To begin, you'll first need to install the swirl package:

```{r eval=FALSE}
install.packages("swirl")
```

If you've never run `swirl()` before, you will be prompted to install a course.
You can do that with the `install_course` function. For example, to install the
`R Programming` course, you would run:

```{r eval = FALSE}
library(swirl)
install_course("R Programming")
```

Once you've installed a course, every time you enter the swirl environment with
`swirl()`, `R Progamming` should show up as a course option to select. You can
enter `R Programming` to start lessons in that course by typing the number in
fromt of it when you run `swirl()`.

Once you have at least one course installed, you call the `swirl()` function to
enter the interactive platform in RStudio. The console will take you through a
few prompts: you'll give swirl a name to call you, and take a look at some
commands that are useful in the swirl environment. Those commands are listed
further below.

```{r eval = FALSE}
library(swirl)
swirl()
```

```{block, type = 'rmdnote'}
After calling `swirl()`, you may be prompted to clear your workspace variables by running `rm=(list=ls())`. Running this code will clear any variables you already have saved in your global environment. While swirl recommends that you do this, it's not necessary.
```

Some of these lessons complement online courses through Coursera, so sometimes
you will be asked after you complete a lesson if you want to report your results
to Coursera. You should select "No" for that option each time.

### Option 1

For **Option 1** of this homework, you will need to work through seven of the 15
available lessons in the `R Programming` course. Here are some suggestions for
particularly uesful lessons that you could choose (the lesson number within the
course is in parentheses):

**R Programming** course:

- Basic Building Blocks (1)
- Sequences of Numbers (3)
- Vectors (4)
- Missing Values (5)
- Subsetting Vectors (6)
- Logic (8)
- Looking at Data (12)
- Dates and Times (14)

**Getting and Cleaning Data** course:

- Manipulating Data with dplyr (1)
- Grouping and Chaining with dplyr (2)
- Dates and Times with lubridate (4)

Each lesson should take at most 10--15 minutes, but some are much shorter. You
can complete the lessons in any order you want, but you may find it easiest to
start with the lowest-numbered lessons and work your way up, in the order we've
listed the lessons here.

You'll be able to get started on some of these lessons after your first day in
class ("Basic Building Blocks"", for example), but others cover topics that
we'll get to in weeks 2 and 3. Whether or not we've covered a swirl topic in
class, you should be able to successfully work through the lesson. At the end of
each lesson, you may be asked if you would like to receive credit for completing
this course on Coursera.org. Always choose "no" for this option.

Again, you'll need to compose and turn in a few sentences for each lesson. Make
sure to include a summary of what each lesson was about, and the most
interesting thing about that lesson.

### Option 2

If you're already somewhat familiar with R, you might want to choose your seven
lessons from other swirl courses instead of or in addition to those available in
the `R Programming` and `Getting and Cleaning Data` courses.

Check out the [list of available Swirl
Courses](http://swirlstats.com/scn/title.html) to see which ones you would like
to install and check out available lessons for. For example, to choose a lesson
in the `Exploratory Data Analysis` course, you would run:

```{r eval = FALSE}
library(swirl)
install_course("Exploratory Data Analysis")
swirl()
```

After entering the `Exploratory Data Analysis` course, you could choose from any
one of its available lessons.

In your written summary for each lesson (again, a few sentences that cover a
summary of the lesson and the most interesting thing you learned), make sure to
specify which course each lesson you completed was from.

### Special swirl commands

In the swirl environment, knowing about the following commands will be helpful:

- Within each lesson, the prompt `...` indicates that you should hit Enter to
move on to the next section.
- `skip()`: skip the current question.
- `play()`: temporarily exit swirl. It can be useful during a swirl lesson to
play around in the R console to try things out.
- `nxt()`: regain swirl's attention after `play()`ing around in the console.
- `main()`: return to swirl's main menu.
- `bye()`: exit swirl. Swirl will save your progress if you exit in the middle
of a lesson. You can also hit the Esc. key to exit. (To re-enter swirl, run
`swirl()`. In a new R session you will have to first load the swirl library:
`library(swirl)`.)

#### For fun

While they aren't required for class, you should consider trying out some other
swirl lessons later in the course. You can look through the [course
directory](http://swirlstats.com/scn/title.html) to see what other courses and
lessons are available. For the first part of our course, you might find the
"Exploratory Data Analysis" course helpful. If you would like to learn more
about using R for statistical analysis, you might find the "Regression Models"
course helpful.

## Homework #2

**Due date: Sept. 27 by 5:00 p.m.**

For this assignment, you will submit the assignment to me **by email** by the
due date. You should include three files in your submission:

1. The final rendered Word document (rendered from an RMarkdown file).
2. The original RMarkdown file used to create that final document.
3. The dataset you used for the assignment.

For this assignment, start by picking a dataset either from your own research or
something interesting available online (if you're struggling to find something,
check out [Five Thirty Eight's GitHub data
repository](https://github.com/fivethirtyeight/data)). You will then use this
dataset to practice what you've learned with R so far. This will also be a
chance, if you're using a dataset from your own research for me to get an idea
of how you might be planning to use R in future research.

**Very important note**: Some research datasets have privacy constraints. This
includes any datasets collected from human subjects, but can also include other
datasets. For some projects, the principal investigator may prefer to keep the
data private until publication of results. Before using a dataset for this
project, please confirm with your research advisor that there are no constraints
on the data. I do not plan to make the results of this assignment public, but I
also do not want to us to be emailing back and forth a dataset with any
constraints.

Using your dataset, create an RMarkdown document with the content listed below.
**Be sure to set the `echo` option to `TRUE` so all of your code will also
print out to the Word document.** Include in the RMarkdown document: (1) your
full name; (2) the due date of the assignment; and (3) a title that includes
"Homework 2". These should all be set in RMarkdown, **not** changed in the
Word output.
Each section I've listed below should be included in the RMarkdown document as
its own section, with a section header created using Markdown code.

- **Section 1: Description of the data**: Describe the dataset you are using,
both in terms of the **content** (what is this data measuring? how was it
collected? what kinds of research questions are you hoping to use it to answer?)
and in terms of its **format** (what type of file is it saved in? what if it is
in a flat file, is it fixed width or delimited? if it is delimited, what is the
delimiter? if it is binary, what is the program that would normally be used to
open it?). (20 points)
- **Section 2: Reading the data into R**: Include code that reads the data into
R and assigns it to a dataframe object that you can use later in the document.
Explain in the text which R function you used to read in the data (e.g.,
`read_csv`) and which package it came from (if it was not a base R function). If
there were any special options you needed to use (e.g., `skip` to skip some rows
without data), list those and explain why you used them. Next, include some code
to clean the data (e.g., rename columns, convert any dates into a "Date"
format). You can filter to certain rows if you would like, but do **not** filter
out missing values, as we'll want to learn more about those later. (20 points)
- **Section 3: Characteristics of the data**: Describe the dataframe you just
read in. How many rows does it have? How many columns? (Use inline code in the
RMarkdown document to put this information into a sentence that reads "This
dataframe has ... rows and ... columns.".) What are the names of the columns?
What does each row measure (i.e., what is the unit of observation)? Include a
table (using Markdown directly or `kable`) explaining what each column measures.
This table should have three columns: (1) the column name in the R dataframe;
(2) a very brief description of what each column measures; and (3) the units (if
any) of the measurement in the column. (20 points)
- **Section 4: Summary statistics**: Pick three columns of the dataframe. Use
the `summarize` function to get the following summaries of these columns: (1)
minimum value; (2) maximum value; (3) mean value; (4) number of missing values.
If there are missing values, make sure you use the appropriate options in
summarizing these values to exclude those when calculating the minimum, maximum,
and mean. Assign the result of this `summarize` call to a new R object, and
print it out, so these summaries show up in your final, rendered Word document.
(20 points)
- **Section 5: Visualizing the data**: Create two plots of your dataframe. One
should use a "statistical" geom (e.g., histogram, bar chart, boxplot) and one a
"non-statistical" geom (e.g., scatterplot, line plot for time series). Explain
why these plots help you learn more about this data and about the interesting
research questions you're hoping to explore with the data. Be sure to customize
the final size of each plot in the Word document using the `fig.width` and
`fig.height` commands. For each plot, also be sure to customize the x- and
y-axis labels. Finally, explain how each plot is following at least two of the
principles of "good graphics" covered in week 4 of the course (Chapter 4 of the
book)---if necessary, use `ggplot` functions and options to make the plots
comply with some of these principles. (20 points)

<!-- For Homework 2, recreate the R Markdown document that you can download from [here](https://github.com/geanders/RProgrammingForResearch/raw/master/Homework/Homework_2.docx).  -->

<!-- Here are some initial instructions and tips:  -->

<!-- - Your goal is to create an R Markdown document that you can compile to create a Word document that looks just like the example document we've linked above.  -->
<!-- - You will turn in (by email) both the compiled Word document and the .Rmd original file. -->
<!-- - Add your name as "Author" and the due date of the assignment as "Date". You should add these within the R Markdown document, rather than changing them in the final, compiled Word document. -->
<!-- - If you want to get started before you know how to use R Markdown, you can go ahead and write all of the necessary code to replicate the output and figures in the document in an R script.  -->
<!-- - The code chunks here have been hidden with the option `echo = FALSE`, but you should include your code in your final document.  -->
<!-- - Set the chunk options `warning = FALSE` and `message = FALSE` to prevent warnings and messages from being printed out. You will get some messages and warnings in the code from things like missing values and from loading packages, but you want to hide all of those messages in your final document. -->
<!-- - For things like templates, colors, level of transparency, and point size, you will receive full credit if you create figures that are visually similar to the ones shown in the example document. In other words, if the example document shows some transparency in points, you will get full credit if you also include some transparency in the points in your plot, but you do not have to include the exact same value of `alpha`.  -->
<!-- - Pay close attention to the typeface used in text of the original document. If something is shown in boldface or "typewriter" fontface in the original, make sure you've written your Rmarkdown code to have the same type. -->
<!-- - In R, there are often many different ways to achieve something. As long as your code *works*, it's fine if you haven't coded it exactly like we have in our version. However, your output should look identical to ours (or, in the case of color, transparency, point size, and themes, visually similar). -->
<!-- - You will not lose points if you cannot recreate the table in the document (although you should try to!). -->
<!-- - The last section, under the heading "Extra challenge-- not graded", is not graded. However, if you'd like an extra challenge, you're welcome to try it out and include it in your final submission! -->

<!-- If you need them, here are some further tips:  -->

<!-- - You will learn Rmarkdown in Week 5 of the course. However, if you want to get started on this exercise before you learn how to use Rmarkdown, you can start by working on the regular R code to read in the data and create the figures shown in the document. -->
<!-- - Functions from the tidyverse (especially from `dplyr`, `readr`, and `ggplot` packages) will make your life much easier for this exercise. You can now install and load the `tidyverse` package to load them all at once.  -->
<!-- - To rename column names with "special" characters in them, wrap the whole old column name in backticks. For example, to change a column name that has a dollar sign in it, you would use something like "``rename(new_col_name = `old_col_name$` ``)".  -->
<!-- - To change the size of a figure in a report, use the "fig.width" and "fig.height" chunk options.  -->
<!-- - You will want to use `scale_fill_brewer` in several of the figures. -->
<!-- - Don't forget that, within functions like `scale_x_continuous`, you can use the argument `breaks` to set where the axis has breaks, and then `labels` to set what will actually be shown at each break.  -->
<!-- - The string "\\n" can be included in legends and labels to include a carriage return. -->
<!-- - Coordinates can be flipped in a graph with the `coord_flip` geom. So, if you can figure out a way to make a graph with the coordinates flipped, use that code and just add `coord_flip` at the end. -->

## Homework #3

**Due date: Oct. 18**

This homework includes several parts. Make sure you complete each part for full credit
(100 points).

1. Complete your homework in an RMarkdown document and knit it to either Word or PDF
to submit. Change your global options so that no warnings, messages, or errors are
printed out, but that so all code and results are printed. (10 points)
2. Select one of the figures you created for the last homework. You will be improving
that figure in this part of the homework.
    + Go through the six guidelines for good graphics we discussed in Chapter 6. Re-create
    the same plot you created in the last homework and, in a paragraph or two, list which
    of these elements you currently have in the plot and how these elements help make
    the plot more effective. (10 points)
    + Select three of these guidelines for which you either aren't using them in the
    current graph or could add to what you currently have in the plot. Create a new
    version of the plot based on your choice. Include a paragraph explaining how these
    changes make the plot more effective and how you added them. (20 points)
3. In the `titanic` package (available on CRAN), there's a `titanic_train` dataset
we'll use for this question.
    + Load that dataset and create a dataframe object limited to the columns
    `Survived` and `Age`. Change the labeling for the `Survived` column so that it
    says "Survived" instead of "1" and "Died" instead of "0".
    Print out the oldest five people who survived and who died (so, you'll be printing
    out 10 rows total). (10 points)
    + Create a histogram of the ages of people in this dataset, faceted by whether or
    not they survived. Arrange these so that they are vertically aligned rather than
    side by side (Hint: check the `nrow` option when faceting). Change the label on
    the y-axis to be something that more precisely describes what's being shown
    (rather than "count") and add a title to the plot. (10 points)
    + Determine the mean age for each group (survived and died) for everyone who
    has a non-missing age, as well as the total number of people in the group
    and the number of people with age information missing for the group. Print out a
    two-row dataframe with this summary information (it should have columns
    for (1) which group (survived / died); (2) the mean age of people in the group with
    non-missing age data; (3) the total number of people in the group; and (4) the number
    of people in the group with missing age data). (15 points)
    + Run a statistical test of whether there is a difference in
    the mean ages between the two groups (Hint: you might want to google something
    like "test difference
    two means"). You can run this test either using a simple statistical test or by
    fitting a linear regression---you may take either approach. (15 points)
4. There's a radio show on National Public Radio called "Weekend Edition", and
each week they have a "Sunday Puzzle". The puzzle for the week ending August 5,
2018 was this:  "I said think of a familiar two-word phrase in eight letters
with four letters in each word. The first word starts with M. And I said move
the first letter of the second word to the end, and you get a regular
eight-letter word, which, amazingly, other than the M, doesn't share any sounds
with the original two-word phrase. What phrase is it?". The winner had this to
say about how he solved the problem: "I went to the National Puzzlers' League
website and pulled up a list of eight-letter words that start with the letter M
and just started scanning through it...'. You can solve this puzzle using R
(some of the `stringr` functions, like `str_count` and `str_sub`, will be particularly
helpful). Write R code to solve this puzzle. (10 points)
    + Hint 1: Work backwords. Start with eight-letter words, and apply a backwards
    transform to the one described to transform them into two four-letter words.
    + Hint 2: This link might prove very useful: https://github.com/dwyl/english-words.

<!-- <!-- For Homework 3, recreate the R Markdown document that you can download from [here](https://github.com/geanders/RProgrammingForResearch/raw/master/Homework/Homework_3.docx).  -->

<!-- <!-- Here are some initial tips:  -->

<!-- <!-- - Your goal is to create an R Markdown document that you can compile to create a Word document that looks just like the target document we've linked above. The only difference is that you will use `echo = TRUE` to show your code within the rendered Word document. All formating within the text should be similar or identical to the target document.   -->
<!-- <!-- - You will turn in (by email) both the compiled Word document and the .Rmd original file. -->
<!-- <!-- - Add your name as "Author" and the due date of the assignment as "Date". You should add these within the R Markdown document, rather than changing them in the final, compiled Word document. -->
<!-- <!-- - Set the chunk options `warning = FALSE` and `message = FALSE` to prevent warnings and messages from being printed out. You will get some messages and warnings in the code from things like missing values and from loading packages, but you want to hide all of those messages in your final document. -->
<!-- <!-- - For things like templates, colors, level of transparency, and point size, you will receive full credit if you create figures that are visually similar to the ones shown in the example document. In other words, if the example document shows some transparency in points, you will get full credit if you also include some transparency in the points in your plot, but you do not have to include the exact same value of `alpha`.  -->
<!-- <!-- - In R, there are often many different ways to achieve something. As long as your code *works*, it's fine if you haven't coded it exactly like we have in our version. However, your output should look identical to ours (or, in the case of color, transparency, point size, and themes, visually similar). -->
<!-- <!-- - There is one formated table in the target document. Be sure that you render this as a formated table, not as raw R output. -->
<!-- <!-- - You will be graded on whether the size of each figure is similar to that in the example file. There is a tip in the "Further tips" section below about how to change figure size in the output. -->

<!-- <!-- If you need them, here are some further tips:  -->

<!-- <!-- - Functions from the tidyverse (especially from `dplyr`, `readr`, and `ggplot` packages) will make your life much easier for this exercise.  -->
<!-- <!-- - To reference column names with "special" characters in them, like dollar signs or spaces, wrap the whole old column name in backticks. For example, to change a column name that has a dollar sign in it, you would use something like "rename(new_col_name = \`old_col_name$\`)".  -->
<!-- <!-- - To change the size of a figure in a report, use the "fig.width" and "fig.height" chunk options.  -->

## Homework #4

**Due date: Nov. 1**

For this homework, you will use a data set collected by the *Washington Post* on homicides in 50 large U.S. cities. The data is available through their GitHub repository: https://github.com/washingtonpost/data-homicides. You can read their accompanying article at https://www.washingtonpost.com/graphics/2018/investigations/where-murders-go-unsolved/.

Submit your homework by email. You should include the Rmarkdown file (.Rmd) and
the output Word document.

<!-- ### Setting up a GitHub repository for this project -->

<!-- You will submit this project by sending me the link to a GitHub repository. For the other questions in the -->
<!-- homework, you'll need to write an RMarkdown document and render it to a pdf file. -->

<!-- Take the following steps to set up your GitHub repository for the homework (we will work on this as an -->
<!-- in-course exercise): -->

<!-- 1. Set up a local (i.e., on your computer) R project with subdirectories for the data, writing (this -->
<!-- is where you'll put your Rmarkdown file and the output file), and figures. -->
<!-- 2. Download the Washington Post data and save it in an appropriate place in this R project directory. -->
<!-- 3. Initialize git for the R project and make your initial commit. -->
<!-- 4. Create an RMarkdown file for your homework answers in the appropriate subdirectory and save it. Commit this -->
<!-- change with an appropriate commit message. (At this stage, you can look at your "History" in the git -->
<!-- commit window to see the changes you've made so far.) -->
<!-- 5. Login to your GitHub account and create a new, blank repo with the same name as your R project. -->
<!-- (If you have not already set up an SSH key to make it easier to push back and forth between your computer -->
<!-- and GitHub, do so now.) -->
<!-- 6. On your computer, specify your GitHub repository as the remote for your project and do an initial -->
<!-- push to send all the files to the GitHub repo. Look at your repository on GitHub to make sure it worked. -->
<!-- 7. Add a README.md file to the top level of your R project (on your local computer) using Markdown syntax -->
<!-- and then commit the change locally and push to your GitHub repository. -->

<!-- For the README.md file, you can create a text file in RStudio and save it as "README.md". You can then write -->
<!-- this file using Markdown syntax (like RMarkdown, but without any code chunks). -->

<!-- As you work on your homework, make sure you commit regularly (with helpful commit messages). You should -->
<!-- have **at least 15 commit messages** in your history for the repo by the time you turn in the homework. -->

1. Read in the data as an R object named `homicides`and create a new column called `city_name` that combines the city and state like this "Baltimore, MD". (15 points)
3. Create a dataframe called `unsolved` with one row per city that gives the total number of homicides
for the city and the number of unsolved homicides (those for which the disposition is
"Closed without arrest" or "Open/No arrest"). (15 points)
4. For the city of Baltimore, MD, use the `prop.test` function to estimate the proportion of
homicides that are unsolved, as well as the 95% confidence interval for this proportion (we're assuming that the data from the years covered by this dataset is a representative sample of Baltimore in a larger set of years). Print the
output of the `prop.test` directly in your RMarkdown, and then save the output of `prop.test` as an
R object and apply the `tidy` function from the `broom` package to this object and pull the estimated
proportion and confidence intervals from the resulting tidy dataframe. (20 points)
5. Now use what you learned from running `prop.test` for one city to run `prop.test` for all the cities.
Your goal is to create the dataframe you need to create the figure shown below, where the points show the estimated proportions of
unsolved homicides in each city and the horizontal lines show the estimated 95% confidence intervals.
Do this all within a "tidy" pipeline, starting from the `unsolved` dataframe that you created for step 3.
Use `map2` from `purrr` to apply `prop.test` within each city and then `map` from `purrr` to apply
`tidy` to this output. Use the `unnest` function from the `tidyr` package on the resulting list-column
(from mapping `tidy` to the `prop.test` output list-column), with the option `.drop = TRUE`, to get your
estimates back into a regular tidy data frame before plotting. (25 points)
6. Create the plot shown below. Hint: Check out the `geom_errorbarh` geom with the `height = 0` option
to get the horizontal lines for the confidence intervals. To get full points, be very careful to make sure that all labeling, color choices, figure dimensions, etc., in your figure match the figure shown here. (25 points)
<!-- 7. All of the code for this should be in an RMarkdown document. Render this to a pdf and then push to your -->
<!-- GitHub repository. Go on GitHub and make sure that everything made in online. -->

```{r echo = FALSE, out.width = "500pt", fig.align = "center"}
knitr::include_graphics("figures/hw_4_plot.png")
```


## Homework #5

**Due date: Nov. 22**

For this homework, you will continue using the dataset used for Homework #4. You
will submit this project by sending me the link to a GitHub repository. For the
write-up of this homework, you'll need to write an RMarkdown document and render
it to a pdf file within the R Project for the GitHub repository.

### Setting up a GitHub repository for this project

Take the following steps to set up your GitHub repository for the homework (we will work on this as an
in-course exercise):

1. Set up a local (i.e., on your computer) R project with subdirectories for the data, writing (this
is where you'll put your Rmarkdown file and the output file), and figures.
2. Download the Washington Post data and save it in an appropriate place in this R project directory.
3. Initialize git for the R project and make your initial commit.
4. Create an RMarkdown file for your homework answers in the appropriate subdirectory and save it. Commit this
change with an appropriate commit message. (At this stage, you can look at your "History" in the git
commit window to see the changes you've made so far.)
5. Login to your GitHub account and create a new, blank repo with the same name as your R project.
(If you have not already set up an SSH key to make it easier to push back and forth between your computer
and GitHub, do so now.)
6. On your computer, specify your GitHub repository as the remote for your project and do an initial
push to send all the files to the GitHub repo. Look at your repository on GitHub to make sure it worked.
7. Add a README.md file to the top level of your R project (on your local computer) using Markdown syntax
and then commit the change locally and push to your GitHub repository.

For the README.md file, you can create a text file in RStudio and save it as "README.md". You can then write
this file using Markdown syntax (like RMarkdown, but without any code chunks).

As you work on your homework, make sure you commit regularly (with helpful commit messages). You should
have **at least 15 commit messages** in your history for the repo by the time you turn in the homework.

Select **one** of the following two figures to create for the homework:

1. **Choice 1:** Pick one city in the data. Create a map showing the locations of the homicides in that city, using
the `sf` framework discussed in class. Use `tigris` to download boundaries for some sub-city geography
(e.g., tracts, block groups, county subdivisions) to show as a layer underneath the points showing
homicides. Use different facets for solved versus unsolved homicides and different colors to show
the three race groups with the highest number of homicides for that city (you may find the `fct_lump`
function from `forcats` useful for this).
2. **Choice 2:** Recreate the graph shown below. It shows monthly homicides in Baltimore, with a reference added for the date
of the arrest of Freddie Gray and color used to show colder months (November through April) versus
warmer months (May through October). There is a smooth line added to help show seasonal and long-term
trends in this data.

```{r echo = FALSE, out.width = "800pt"}
knitr::include_graphics("figures/hw_5_plot.png")
```

<!-- All instructions for this homework can be downloaded [here](https://github.com/geanders/RProgrammingForResearch/raw/master/Homework/Homework4and5.pdf). The example "fars_analysis.pdf" document you will try to recreate is [here](https://github.com/geanders/RProgrammingForResearch/raw/master/Homework/fars_analysis.pdf).
<!-- You have the option to turn in parts of this homework (up through creating a clean dataset) by Oct. 28. If you do so, I will email you the code I used to clean the data, so you can check your own code and be sure you have a reasonable version of the clean data as you do the final parts of the assignment.
<!-- <!-- ## Homework #5 -->

<!-- <!-- **Due date: Nov. 8** -->

<!-- <!-- All instructions for this homework can be downloaded [here](https://github.com/geanders/RProgrammingForResearch/raw/master/Homework/Homework4and5.pdf). The example "fars_analysis.pdf" document you will try to recreate is [here](https://github.com/geanders/RProgrammingForResearch/raw/master/Homework/fars_analysis.pdf). -->

<!-- <!-- You will submit this homework by posting a repo with your project directory on GitHub. We will work on setting that up during an in-course exercise.  -->

## Homework #6 (Updated for 2018)

**Due date: Wednesday, Dec. 11**

1. Read the article *Good Enough Practices in Scientific Computing* by Wilson et al. (available [here](https://arxiv.org/abs/1609.00037)). In a half page, describe which of these "pretty good practices" you are incorporating for your group project. Also list one or two practices that you are not following but that would have made sense and how you could change your practices to follow them.

<!-- 2. Read the article [*Science Isn't Broken*](http://fivethirtyeight.com/features/science-isnt-broken/) on FiveThirtyEight. This article includes an interactive graphic. In a half page, give your opinion on whether this interactive graphic helps convey the main message of the article. Also, describe in general details how you might be able to create a graphic like this in R. -->

2. Find an article in *The R Journal* that describes an R package that you could use in your own research or otherwise find interesting. Describe why the package was created and what you think it's most interesting features are. In an R Markdown document, run one or two of the R examples included in the article.