
Commit 2786e69

committed
source commit: 1f8258d
0 parents  commit 2786e69

23 files changed

+3035
-0
lines changed

CODE_OF_CONDUCT.md

+13
@@ -0,0 +1,13 @@
1+
---
2+
title: "Contributor Code of Conduct"
3+
---
4+
5+
As contributors and maintainers of this project,
6+
we pledge to follow [The Carpentries Code of Conduct][coc].
7+
8+
Instances of abusive, harassing, or otherwise unacceptable behavior
9+
may be reported by following our [reporting guidelines][coc-reporting].
10+
11+
12+
[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html
13+
[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

LICENSE.md

+79
@@ -0,0 +1,79 @@
1+
---
2+
title: "Licenses"
3+
---
4+
5+
## Instructional Material
6+
7+
All Carpentries (Software Carpentry, Data Carpentry, and Library Carpentry)
8+
instructional material is made available under the [Creative Commons
9+
Attribution license][cc-by-human]. The following is a human-readable summary of
10+
(and not a substitute for) the [full legal text of the CC BY 4.0
11+
license][cc-by-legal].
12+
13+
You are free:
14+
15+
- to **Share**---copy and redistribute the material in any medium or format
16+
- to **Adapt**---remix, transform, and build upon the material
17+
18+
for any purpose, even commercially.
19+
20+
The licensor cannot revoke these freedoms as long as you follow the license
21+
terms.
22+
23+
Under the following terms:
24+
25+
- **Attribution**---You must give appropriate credit (mentioning that your work
26+
is derived from work that is Copyright (c) The Carpentries and, where
27+
practical, linking to <https://carpentries.org/>), provide a [link to the
28+
license][cc-by-human], and indicate if changes were made. You may do so in
29+
any reasonable manner, but not in any way that suggests the licensor endorses
30+
you or your use.
31+
32+
- **No additional restrictions**---You may not apply legal terms or
33+
technological measures that legally restrict others from doing anything the
34+
license permits.
35+
36+
Notices:
37+
38+
* You do not have to comply with the license for elements of the material in
39+
the public domain or where your use is permitted by an applicable exception
40+
or limitation.
41+
* No warranties are given. The license may not give you all of the permissions
42+
necessary for your intended use. For example, other rights such as publicity,
43+
privacy, or moral rights may limit how you use the material.
44+
45+
## Software
46+
47+
Except where otherwise noted, the example programs and other software provided
48+
by The Carpentries are made available under the [OSI][osi]-approved [MIT
49+
license][mit-license].
50+
51+
Permission is hereby granted, free of charge, to any person obtaining a copy of
52+
this software and associated documentation files (the "Software"), to deal in
53+
the Software without restriction, including without limitation the rights to
54+
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
55+
of the Software, and to permit persons to whom the Software is furnished to do
56+
so, subject to the following conditions:
57+
58+
The above copyright notice and this permission notice shall be included in all
59+
copies or substantial portions of the Software.
60+
61+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
62+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
63+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
64+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
65+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
66+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
67+
SOFTWARE.
68+
69+
## Trademark
70+
71+
"The Carpentries", "Software Carpentry", "Data Carpentry", and "Library
72+
Carpentry" and their respective logos are registered trademarks of [Community
73+
Initiatives][ci].
74+
75+
[cc-by-human]: https://creativecommons.org/licenses/by/4.0/
76+
[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode
77+
[mit-license]: https://opensource.org/licenses/mit-license.html
78+
[ci]: https://communityin.org/
79+
[osi]: https://opensource.org

basic-targets.md

+299
@@ -0,0 +1,299 @@
1+
---
2+
title: 'First targets Workflow'
3+
teaching: 10
4+
exercises: 2
5+
---
6+
7+
:::::::::::::::::::::::::::::::::::::: questions
8+
9+
- What are best practices for organizing analyses?
10+
- What is a `_targets.R` file for?
11+
- What is the content of the `_targets.R` file?
12+
- How do you run a workflow?
13+
14+
::::::::::::::::::::::::::::::::::::::::::::::::
15+
16+
::::::::::::::::::::::::::::::::::::: objectives
17+
18+
- Create a project in RStudio
19+
- Explain the purpose of the `_targets.R` file
20+
- Write a basic `_targets.R` file
21+
- Use a `_targets.R` file to run a workflow
22+
23+
::::::::::::::::::::::::::::::::::::::::::::::::
24+
25+
::::::::::::::::::::::::::::::::::::: {.instructor}
26+
27+
Episode summary: First chance to get hands dirty by writing a very simple workflow
28+
29+
:::::::::::::::::::::::::::::::::::::
30+
31+
32+
33+
## Create a project
34+
35+
### About projects
36+
37+
`targets` uses the "project" concept for organizing analyses: all of the files needed for a given project are put in a single folder, the project folder.
38+
The project folder has additional subfolders for organization, such as folders for data, code, and results.
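
The layout described above can be sketched in a few lines of base R. This is only an illustration: the subfolder names (`data`, `R`, `results`) are assumptions rather than names required by `targets`, and the project is created in a temporary directory so that nothing on your computer is overwritten.

```r
# Hypothetical project folder, placed in a temporary location for illustration
project_dir <- file.path(tempdir(), "targets-demo")

# Subfolders for data, code, and results (these names are assumptions)
subfolders <- c("data", "R", "results")

# Create each subfolder; recursive = TRUE also creates the project folder itself
for (folder in subfolders) {
  dir.create(
    file.path(project_dir, folder),
    recursive = TRUE,
    showWarnings = FALSE
  )
}

# List what is now inside the project folder
list.files(project_dir)
```

Keeping the same subfolder names across projects is what makes it quick to find your way around when you return to one.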
39+
40+
Using projects makes it straightforward to re-orient yourself if you return to an analysis after time spent elsewhere.
41+
This wouldn't be a problem if we only ever worked on one thing at a time until completion, but that is almost never the case.
42+
It is hard to remember what you were doing when you come back to a project after working on something else (a phenomenon called "context switching").
43+
By using a standardized organization system, you will reduce confusion and lost time... in other words, you are increasing reproducibility!
44+
45+
This workshop will use RStudio, since it also works well with the project organization concept.
46+
47+
### Create a project in RStudio
48+
49+
Let's start a new project using RStudio.
50+
51+
Click "File", then select "New Project".
52+
53+
This will open the New Project Wizard, a set of menus to help you set up the project.
54+
55+
![The New Project Wizard](fig/basic-rstudio-wizard.png){alt="Screenshot of RStudio New Project Wizard menu"}
56+
57+
In the Wizard, click the first option, "New Directory", since we are making a brand-new project from scratch.
58+
Click "New Project" in the next menu.
59+
In "Directory name", enter a name that helps you remember the purpose of the project, such as "targets-demo" (follow best practices for naming files and folders).
60+
Under "Create project as a subdirectory of...", click the "Browse" button to select the directory where the project will be stored.
61+
We recommend putting it on your Desktop so you can easily find it.
62+
63+
You can leave "Create a git repository" and "Use renv with this project" unchecked, but these are both excellent tools to improve reproducibility, and you should consider learning them and using them in the future, if you don't already.
64+
They can be enabled at any later time, so you don't need to worry about trying to use them immediately.
65+
66+
Once you work through these steps, your RStudio session should look like this:
67+
68+
![Your newly created project](fig/basic-rstudio-project.png){alt="Screenshot of RStudio with a newly created project called 'targets-demo' open containing a single file, 'targets-demo.Rproj'"}
69+
70+
Our project now contains a single file, created by RStudio: `targets-demo.Rproj`. You should not edit this file by hand. Its purpose is to tell RStudio that this is a project folder and to store some RStudio settings (if you use version-control software, it is OK to commit this file). Also, you can open the project by double clicking on the `.Rproj` file in your file explorer (try it by quitting RStudio then navigating in your file browser to your Desktop, opening the "targets-demo" folder, and double clicking `targets-demo.Rproj`).
71+
72+
OK, now that our project is set up, we are ready to start using `targets`!
73+
74+
## Create a `_targets.R` file
75+
76+
Every `targets` project must include a special file called `_targets.R` in the main project folder (the "project root").
77+
The `_targets.R` file includes the specification of the workflow: directions for R to run your analysis, kind of like a recipe.
78+
By using the `_targets.R` file, you won't have to remember to run specific scripts in a certain order.
79+
Instead, R will do it for you (more reproducibility points)!
80+
81+
### Anatomy of a `_targets.R` file
82+
83+
We will now start to write a `_targets.R` file. Fortunately, `targets` comes with a function to help us do this.
84+
85+
In the R console, first load the `targets` package with `library(targets)`, then run the command `tar_script()`.
86+
87+
88+
```r
89+
library(targets)
90+
tar_script()
91+
```
92+
93+
Nothing will happen in the console, but in the file viewer, you should see a new file, `_targets.R`, appear. Open it using the File menu or by clicking on it.
94+
95+
We can see this default `_targets.R` file includes three main parts:
96+
97+
- Loading packages with `library()`
98+
- Defining a custom function with `function()`
99+
- Defining a list with `list()`.
100+
101+
The last part, the list, is the most important part of the `_targets.R` file.
102+
It defines the steps in the workflow.
103+
The `_targets.R` file must always end with this list.
104+
105+
Furthermore, each item in the list is a call of the `tar_target()` function.
106+
The first argument of `tar_target()` is the name of the target to build, and the second argument is the command used to build it.
107+
Note that the name of the target is **unquoted**, that is, it is written without any surrounding quotation marks.
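
To make this anatomy concrete, here is a minimal sketch of a list of targets. It is not the default template that `tar_script()` writes: the target names (`small_numbers`, `doubled_numbers`), the helper `double_values()`, and the variable `workflow` are all hypothetical and chosen only for illustration.

```r
library(targets)

# Hypothetical custom function: doubles every value it is given
double_values <- function(x) {
  x * 2
}

# Each item in the list is one call to tar_target():
# the first argument is the (unquoted) target name,
# the second is the command used to build that target
workflow <- list(
  tar_target(small_numbers, c(1, 2, 3)),
  tar_target(doubled_numbers, double_values(small_numbers))
)

# The list is the last expression, as it would be at the end of a `_targets.R` file
workflow
```

Notice that the second target refers to the first one by name, which is how the command for one step can use the result of an earlier step.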
108+
109+
## Set up `_targets.R` file to run example analysis
110+
111+
### Background: non-`targets` version
112+
113+
We will use this template to start building our analysis of bill shape in penguins.
114+
First though, to get familiar with the functions and packages we'll use, let's run the code like you would in a "normal" R script without using `targets`.
115+
116+
Recall that we are using the `palmerpenguins` R package to obtain the data.
117+
This package actually includes two variations of the dataset: one is an external CSV file with the raw data, and another is the cleaned data loaded into R.
118+
In real life you will probably have externally stored raw data, so **let's use the raw penguin data** as the starting point for our analysis too.
119+
120+
The `path_to_file()` function in `palmerpenguins` provides the path to the raw data CSV file (it is inside the `palmerpenguins` R package source code that you downloaded to your computer when you installed the package).
121+
122+
123+
```r
124+
library(palmerpenguins)
125+
126+
# Get path to CSV file
127+
penguins_csv_file <- path_to_file("penguins_raw.csv")
128+
129+
penguins_csv_file
130+
```
131+
132+
```{.output}
133+
[1] "/home/runner/.local/share/renv/cache/v5/R-4.3/x86_64-pc-linux-gnu/palmerpenguins/0.1.1/6c6861efbc13c1d543749e9c7be4a592/palmerpenguins/extdata/penguins_raw.csv"
134+
```
135+
136+
We will use the `tidyverse` set of packages for loading and manipulating the data. We don't have time to cover all the details about using `tidyverse` now, but if you want to learn more about it, please see the ["Manipulating, analyzing and exporting data with tidyverse" lesson](https://datacarpentry.org/R-ecology-lesson/03-dplyr.html).
137+
138+
Let's load the data with `read_csv()`.
139+
140+
141+
```r
142+
library(tidyverse)
143+
144+
# Read CSV file into R
145+
penguins_data_raw <- read_csv(penguins_csv_file)
146+
147+
penguins_data_raw
148+
```
149+
150+
151+
```{.output}
152+
Rows: 344 Columns: 17
153+
── Column specification ────────────────────────────────────────────────────────
154+
Delimiter: ","
155+
chr (9): studyName, Species, Region, Island, Stage, Individual ID, Clutch C...
156+
dbl (7): Sample Number, Culmen Length (mm), Culmen Depth (mm), Flipper Leng...
157+
date (1): Date Egg
158+
159+
ℹ Use `spec()` to retrieve the full column specification for this data.
160+
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
161+
```
162+
163+
```{.output}
164+
# A tibble: 344 × 17
165+
studyName `Sample Number` Species Region Island Stage `Individual ID`
166+
<chr> <dbl> <chr> <chr> <chr> <chr> <chr>
167+
1 PAL0708 1 Adelie Penguin… Anvers Torge… Adul… N1A1
168+
2 PAL0708 2 Adelie Penguin… Anvers Torge… Adul… N1A2
169+
3 PAL0708 3 Adelie Penguin… Anvers Torge… Adul… N2A1
170+
4 PAL0708 4 Adelie Penguin… Anvers Torge… Adul… N2A2
171+
5 PAL0708 5 Adelie Penguin… Anvers Torge… Adul… N3A1
172+
6 PAL0708 6 Adelie Penguin… Anvers Torge… Adul… N3A2
173+
7 PAL0708 7 Adelie Penguin… Anvers Torge… Adul… N4A1
174+
8 PAL0708 8 Adelie Penguin… Anvers Torge… Adul… N4A2
175+
9 PAL0708 9 Adelie Penguin… Anvers Torge… Adul… N5A1
176+
10 PAL0708 10 Adelie Penguin… Anvers Torge… Adul… N5A2
177+
# ℹ 334 more rows
178+
# ℹ 10 more variables: `Clutch Completion` <chr>, `Date Egg` <date>,
179+
# `Culmen Length (mm)` <dbl>, `Culmen Depth (mm)` <dbl>,
180+
# `Flipper Length (mm)` <dbl>, `Body Mass (g)` <dbl>, Sex <chr>,
181+
# `Delta 15 N (o/oo)` <dbl>, `Delta 13 C (o/oo)` <dbl>, Comments <chr>
182+
```
183+
184+
We see the raw data has some awkward column names with spaces (these are hard to type out and can easily lead to mistakes in the code), and far more columns than we need.
185+
For the purposes of this analysis, we only need species name, bill length, and bill depth.
186+
In the raw data, the rather technical term "culmen" is used to refer to the bill.
187+
188+
![Illustration of bill (culmen) length and depth. Artwork by @allison_horst.](https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png)
189+
190+
Let's clean up the data to make it easier to use for downstream analyses.
191+
We will also remove any rows with missing data, because this could cause errors for some functions later.
192+
193+
194+
```r
195+
# Clean up raw data
196+
penguins_data <- penguins_data_raw |>
197+
  # Rename columns for easier typing and
198+
  # subset to only the columns needed for analysis
199+
  select(
200+
    species = Species,
201+
    bill_length_mm = `Culmen Length (mm)`,
202+
    bill_depth_mm = `Culmen Depth (mm)`
203+
  ) |>
204+
  # Delete rows with missing data
205+
  remove_missing(na.rm = TRUE)
206+
207+
penguins_data
208+
```
209+
210+
```{.output}
211+
# A tibble: 342 × 3
212+
species bill_length_mm bill_depth_mm
213+
<chr> <dbl> <dbl>
214+
1 Adelie Penguin (Pygoscelis adeliae) 39.1 18.7
215+
2 Adelie Penguin (Pygoscelis adeliae) 39.5 17.4
216+
3 Adelie Penguin (Pygoscelis adeliae) 40.3 18
217+
4 Adelie Penguin (Pygoscelis adeliae) 36.7 19.3
218+
5 Adelie Penguin (Pygoscelis adeliae) 39.3 20.6
219+
6 Adelie Penguin (Pygoscelis adeliae) 38.9 17.8
220+
7 Adelie Penguin (Pygoscelis adeliae) 39.2 19.6
221+
8 Adelie Penguin (Pygoscelis adeliae) 34.1 18.1
222+
9 Adelie Penguin (Pygoscelis adeliae) 42 20.2
223+
10 Adelie Penguin (Pygoscelis adeliae) 37.8 17.1
224+
# ℹ 332 more rows
225+
```
226+
227+
That's better!
228+
229+
### `targets` version
230+
231+
What does this look like using `targets`?
232+
233+
The biggest difference is that we need to **put each step of the workflow into the list at the end**.
234+
235+
We also define a custom function for the data cleaning step.
236+
That is because the list of targets at the end **should look like a high-level summary of your analysis**.
237+
You want to avoid lengthy chunks of code when defining the targets; instead, put that code in the custom functions.
238+
The other steps (setting the file path and loading the data) are each just one function call so there's not much point in putting those into their own custom functions.
239+
240+
Finally, each step in the workflow is defined with the `tar_target()` function.
241+
242+
243+
```r
244+
library(targets)
245+
library(palmerpenguins)
246+
library(tidyverse)
247+
248+
clean_penguin_data <- function(penguins_data_raw) {
249+
  penguins_data_raw |>
250+
    select(
251+
      species = Species,
252+
      bill_length_mm = `Culmen Length (mm)`,
253+
      bill_depth_mm = `Culmen Depth (mm)`
254+
    ) |>
255+
    remove_missing(na.rm = TRUE)
256+
}
257+
258+
list(
259+
  tar_target(penguins_csv_file, path_to_file("penguins_raw.csv")),
260+
  tar_target(penguins_data_raw, read_csv(
261+
    penguins_csv_file, show_col_types = FALSE)),
262+
  tar_target(penguins_data, clean_penguin_data(penguins_data_raw))
263+
)
264+
```
265+
266+
I have set `show_col_types = FALSE` in `read_csv()` because we know from the earlier code that the column types were set correctly by default (character for species and numeric for bill length and depth), so we don't need to see the message it would otherwise print.
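
One benefit of putting the cleaning step in its own function is that it can be tried out on its own, outside of the workflow. The sketch below does this with a tiny hand-made data frame. The values in `toy_raw` are invented purely for illustration and are not taken from the penguins dataset; only the column names match the raw data. It loads `dplyr` and `ggplot2` directly (both are part of `tidyverse`) to keep the example small.

```r
library(dplyr)    # provides select() and tibble()
library(ggplot2)  # provides remove_missing()

# Same cleaning function as in the `_targets.R` file
clean_penguin_data <- function(penguins_data_raw) {
  penguins_data_raw |>
    select(
      species = Species,
      bill_length_mm = `Culmen Length (mm)`,
      bill_depth_mm = `Culmen Depth (mm)`
    ) |>
    remove_missing(na.rm = TRUE)
}

# Made-up input with the same column names as the raw data,
# one missing bill length, and one extra column we don't need
toy_raw <- tibble(
  Species = c("A", "B", "C"),
  `Culmen Length (mm)` = c(40, NA, 45),
  `Culmen Depth (mm)` = c(18, 17, 15),
  Comments = c(NA, "made up", NA)
)

# The row with the missing bill length and the extra column are dropped
toy_clean <- clean_penguin_data(toy_raw)
toy_clean
```

The missing value in the unused `Comments` column does not cause any rows to be dropped, because that column is removed by `select()` before `remove_missing()` runs.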
267+
268+
## Run the workflow
269+
270+
Now that we have a workflow, we can run it with the `tar_make()` function.
271+
Try running it, and you should see something like this:
272+
273+
274+
```r
275+
tar_make()
276+
```
277+
278+
279+
```{.output}
280+
▶ start target penguins_csv_file
281+
● built target penguins_csv_file [0.002 seconds]
282+
▶ start target penguins_data_raw
283+
● built target penguins_data_raw [0.148 seconds]
284+
▶ start target penguins_data
285+
● built target penguins_data [0.011 seconds]
286+
▶ end pipeline [0.244 seconds]
287+
```
288+
289+
Congratulations, you've run your first workflow with `targets`!
290+
291+
::::::::::::::::::::::::::::::::::::: keypoints
292+
293+
- Projects help keep our analyses organized so we can easily re-run them later
294+
- Use the RStudio Project Wizard to create projects
295+
- The `_targets.R` file is a special file that must be included in all `targets` projects, and defines the workflow
296+
- Use `tar_script()` to create a default `_targets.R` file
297+
- Use `tar_make()` to run the workflow
298+
299+
::::::::::::::::::::::::::::::::::::::::::::::::
