 The expression `read.csv(...)` is a [function call](../learners/reference.md#function-call) that asks R to run the function `read.csv`.

 `read.csv` has two [arguments](../learners/reference.md#argument): the name of the file we want to read, and whether the first line of the file contains names for the columns of data.
-The filename needs to be a character string (or [string](../learners/reference.md#string)for short), so we put it in quotes.
+The filename needs to be a character string (or [string](../learners/reference.md#string) for short), so we put it in quotes.
 Assigning the second argument, `header`, to be `FALSE` indicates that the data file does not have column headers.
 We'll talk more about the value `FALSE`, and its converse `TRUE`, in lesson 04.
 In case of our `inflammation-01.csv` example, R auto-generates column names in the sequence `V1` (for "variable 1"), `V2`, and so on, until `V40`.
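To make the `header = FALSE` behaviour described in this hunk concrete, here is a minimal sketch. It assumes a tiny three-column file written to a temporary path; the file contents and the variable `tmp` are made up for illustration and only stand in for `inflammation-01.csv`.

```r
# Write a small header-less CSV to a temporary file (illustrative data only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("0,1,2", "3,4,5"), tmp)

# header = FALSE tells read.csv that the first line is data, not column names.
dat <- read.csv(file = tmp, header = FALSE)

# R auto-generates the column names V1, V2, V3 for the three columns.
print(names(dat))
print(dim(dat))

# Clean up the temporary file.
unlink(tmp)
```

With the lesson's 40-column file the same call would yield names running from `V1` to `V40`.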
@@ -334,7 +334,7 @@ dim(dat)

 This tells us that our data frame, `dat`, has `r nrow(dat)` rows and `r ncol(dat)` columns.

-If we want to get a single value from the data frame, we can provide an [index](../learners/reference.md#index)in square brackets.
+If we want to get a single value from the data frame, we can provide an [index](../learners/reference.md#index) in square brackets.
 The first number specifies the row and the second the column:

 ```{r selecting data frame elements}
@@ -466,7 +466,7 @@ sd(dat[, 7])

 ## Forcing Conversion

-Note that R may return an error when you attempt to perform similar calculations on subsetted *rows*of data frames.
+Note that R may return an error when you attempt to perform similar calculations on subsetted *rows* of data frames.
 This is because some functions in R automatically convert the object type to a numeric vector, while others do not (e.g. `max(dat[1, ])` works as expected, while `mean(dat[1, ])` returns `NA` and a warning).
 You get the expected output by including an explicit call to `as.numeric()`, e.g. `mean(as.numeric(dat[1, ]))`.
 By contrast, calculations on subsetted *columns* always work as expected, since columns of data frames are already defined as vectors.
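The row-versus-column behaviour described in this hunk can be sketched as follows. The data frame here is a small, made-up stand-in for the lesson's `dat`, and the intermediate variable names are assumptions for illustration.

```r
# A small all-numeric data frame (illustrative values, not the lesson data).
dat <- data.frame(V1 = c(0, 1), V2 = c(2, 3), V3 = c(4, 8))

# max() accepts a one-row data frame directly.
row_max <- max(dat[1, ])

# mean() does not: it returns NA with a warning, which is suppressed here.
row_mean_raw <- suppressWarnings(mean(dat[1, ]))

# Converting the row to a numeric vector first gives the expected mean.
row_mean <- mean(as.numeric(dat[1, ]))

# A single column is already a vector, so no conversion is needed.
col_mean <- mean(dat[, 3])

print(c(row_max = row_max, row_mean_raw = row_mean_raw,
        row_mean = row_mean, col_mean = col_mean))
```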
episodes/04-cond.Rmd: 1 addition & 1 deletion
@@ -280,7 +280,7 @@ When `use_boxplot` is set to `FALSE`, `plot_dist` will instead plot a histogram
 As before, if the length of the vector is shorter than `threshold`, `plot_dist` will create a stripchart.
 A histogram is made with the `hist` command in R.

-```{r conditional-challenge-hist, fig.alt=c("A grey unlabeled boxplot chart showing the distrubution values between 2 and 9 with a mean at 6.", "A grey unlabeled histogram showing bimodal distribution between 2 and 9 with peaks at 2 and 6.", "A mostly blank strip chart showing five points at 3, 4, 6, 7, and 9"), echo=-1}
+```{r conditional-challenge-hist, fig.alt=c("A grey unlabeled boxplot chart showing the distribution of values between 2 and 9 with a mean at 6.", "A grey unlabeled histogram showing bimodal distribution between 2 and 9 with peaks at 2 and 6.", "A mostly blank strip chart showing five points at 3, 4, 6, 7, and 9"), echo=-1}
episodes/06-best-practices-R.Rmd: 2 additions & 2 deletions
@@ -52,7 +52,7 @@ library(reshape)
 library(vegan)
 ```

-Another way you can be explicit about the requirements of your code and improve it's reproducibility is to limit the "hard-coding" of the input and output files for your script.
+Another way you can be explicit about the requirements of your code and improve its reproducibility is to limit the "hard-coding" of the input and output files for your script.
 If your code will read in data from a file, define a variable early in your code that stores the path to that file.
 For example

@@ -111,7 +111,7 @@ It's easy to annotate and mark your code using `#` or `#-`to set off sections of
 For example, it's often helpful when writing code to separate the function definitions.
 If you create only one or a few custom functions in your script, put them toward the top of your code.
 If you have written many functions, put them all in their own .
-R file and then`source` those files. `source` will define all of these functions so that your code can make use of them as needed.
+R file and then `source` those files. `source` will define all of these functions so that your code can make use of them as needed.
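As an illustration of the `source` advice in this hunk, the sketch below saves one helper function to its own `.R` file and then loads it. The function `fahrenheit_to_celsius` and the temporary file `helper_file` are hypothetical names chosen for this example; in a real project the file would sit alongside the analysis script.

```r
# Save a hypothetical helper function to its own .R file.
helper_file <- tempfile(fileext = ".R")
writeLines(
  "fahrenheit_to_celsius <- function(temp_f) (temp_f - 32) * 5 / 9",
  helper_file
)

# source() runs the file, which defines the function in the current session.
source(helper_file)

# The function is now available to the rest of the script.
boiling_c <- fahrenheit_to_celsius(212)
print(boiling_c)

# Clean up the temporary file.
unlink(helper_file)
```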
episodes/11-supp-read-write-csv.Rmd: 2 additions & 2 deletions
@@ -28,12 +28,12 @@ library(svglite)

 The most common way that scientists store data is in Excel spreadsheets.
 While there are R packages designed to access data from Excel spreadsheets (e.g., gdata, RODBC, XLConnect, xlsx, RExcel), users often find it easier to save their spreadsheets in [comma-separated values](reference.html#comma-separated-values-csv) files (CSV) and then use R's built in functionality to read and manipulate the data.
-In this short lesson, we'll learn how to read data from a .csv and write to a new .csv, and explore the [arguments](../learners/reference.md#argument) that allow you read and write the data correctly for your needs.
+In this short lesson, we'll learn how to read data from a .csv and write to a new .csv, and explore the [arguments](../learners/reference.md#argument) that allow you to read and write the data correctly for your needs.

 ### Read a .csv and Explore the Arguments

 Let's start by opening a .csv file containing information on the speeds at which cars of different colors were clocked in 45 mph zones in the four-corners states (`car-speeds.csv`).
-We will use the built in `read.csv(...)`[function call](../learners/reference.md#function-call), which reads the data in as a data frame, and assign the data frame to a variable (using `<-`) so that it is stored in R's memory.
+We will use the built in `read.csv(...)`[function call](../learners/reference.md#function-call), which reads the data in as a data frame, and assigns the data frame to a variable (using `<-`) so that it is stored in R's memory.
 Then we will explore some of the basic arguments that can be supplied to the function.
 First, open the RStudio project containing the scripts and data you were working on in episode 'Analyzing Patient Data'.
episodes/12-supp-factors.Rmd: 3 additions & 3 deletions
@@ -33,8 +33,8 @@ Factors can be ordered or unordered and are an important class for statistical a
 Factors are stored as integers, and have labels associated with these unique integers.
 While factors look (and often behave) like character vectors, they are actually integers under the hood, and you need to be careful when treating them like strings.

-Once created, factors can only contain a pre-defined set values, known as *levels*.
-By default, R always sorts*levels*in alphabetical order.
+Once created, factors can only contain a pre-defined set of values, known as *levels*.
+By default, R always sorts *levels* in alphabetical order.
 For instance, if you have a factor with 2 levels:

 ::::::::::::::::::::::::::::::::::::::::: callout
@@ -57,7 +57,7 @@ levels(sex)
 nlevels(sex)
 ```

-Sometimes, the order of the factors does not matter, other times you might want to specify the order because it is meaningful (e.g., "low", "medium", "high") or it is required by particular type of analysis.
+Sometimes, the order of the factors does not matter, other times you might want to specify the order because it is meaningful (e.g., "low", "medium", "high") or it is required by a particular type of analysis.
 Additionally, specifying the order of the levels allows us to compare levels:
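The points about default level order, integer storage, and comparing levels can be sketched as follows. The vector `stress` and its values are assumptions made up for this example, borrowing the "low", "medium", "high" labels mentioned in the hunk.

```r
# Levels default to alphabetical order: "high", "low", "medium".
stress <- factor(c("low", "high", "medium", "low"))
print(levels(stress))

# Under the hood the values are integer codes that index into the levels.
print(as.integer(stress))

# Supplying the levels explicitly, with ordered = TRUE, makes comparisons meaningful.
stress_ordered <- factor(stress,
                         levels = c("low", "medium", "high"),
                         ordered = TRUE)

# Is the first value ("low") below the second value ("high")?
print(stress_ordered[1] < stress_ordered[2])
```

Attempting the same `<` comparison on the unordered factor would return `NA` with a warning, which is why the explicit ordering matters.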
episodes/13-supp-data-structures.Rmd: 7 additions & 7 deletions
@@ -363,7 +363,7 @@ mdat[2, 3]

 In R lists act as containers.
 Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types.
-Lists are sometimes called generic vectors, because the elements of a list can by of any type of R object, even lists containing further lists.
+Lists are sometimes called generic vectors, because the elements of a list can be of any type of R object, even lists containing further lists.
 This property makes them fundamentally different from atomic vectors.

 A list is a special type of vector.
@@ -461,18 +461,18 @@ If the elements of a list are named, they can be referenced by the `$` notation
 A data frame is a very important data type in R.
 It's pretty much the *de facto* data structure for most tabular data and what we use for statistics.

-A data frame is a *special type of list* where every element of the list has same length (i.e. data frame is a "rectangular" list).
+A data frame is a *special type of list* where every element of the list has the same length (i.e. data frame is a "rectangular" list).

 Data frames can have additional attributes such as `rownames()`, which can be useful for annotating data, like `subject_id` or `sample_id`.
 But most of the time they are not used.

 Some additional information on data frames:

 - Usually created by `read.csv()` and `read.table()`, i.e. when importing the data into R.
-- Assuming all columns in a data frame are of same type, data frame can be converted to a matrix with data.matrix() (preferred) or as.matrix(). Otherwise type coercion will be enforced and the results may not always be what you expect.
+- Assuming all columns in a data frame are of the same type, data frame can be converted to a matrix with data.matrix() (preferred) or as.matrix(). Otherwise type coercion will be enforced and the results may not always be what you expect.
 - Can also create a new data frame with `data.frame()` function.
 - Find the number of rows and columns with `nrow(dat)` and `ncol(dat)`, respectively.
-- Rownames are often automatically generated and look like 1, 2, ..., n. Consistency in numbering of rownames may not be honored when rows are reshuffled or subset.
+- Row names are often automatically generated and look like 1, 2, ..., n. Consistency in numbering of rownames may not be honored when rows are reshuffled or subset.

 ### Creating Data Frames by Hand

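For the matrix-conversion bullet in the hunk above, a minimal sketch of what "type coercion will be enforced" can look like in practice. The data frames `dat_numeric` and `dat_mixed` are made-up examples, not data from the lesson.

```r
# All columns numeric: conversion to a matrix keeps the values numeric.
dat_numeric <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
m_numeric <- as.matrix(dat_numeric)

# Mixed column types: as.matrix() coerces every cell to character.
dat_mixed <- data.frame(id = 1:2, label = c("a", "b"))
m_coerced <- as.matrix(dat_mixed)

# data.matrix() instead converts the character column to integer codes,
# so the result is numeric but the labels themselves are lost.
m_codes <- data.matrix(dat_mixed)

print(typeof(m_numeric))
print(typeof(m_coerced))
print(typeof(m_codes))
```

Either result for the mixed data frame may be surprising, which is the caveat the bullet is pointing at.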
@@ -518,7 +518,7 @@ dat[["y"]]
 dat$y
 ```

-The following table summarizes the one-dimensional and two-dimensional data structures in R in relation to diversity of data types they can contain.
+The following table summarizes the one-dimensional and two-dimensional data structures in R in relation to the diversity of data types they can contain.

 | Dimensions | Homogenous | Heterogeneous |
 | ---------- | ------------- | ------------- |
@@ -528,7 +528,7 @@ The following table summarizes the one-dimensional and two-dimensional data stru
 ::::::::::::::::::::::::::::::::::::::::: callout

 Lists can contain elements that are themselves muti-dimensional (e.g. a lists can contain data frames or another type of objects).
-Lists can also contain elements of any length, therefore list do not necessarily have to be "rectangular".
+Lists can also contain elements of any length, therefore lists do not necessarily have to be "rectangular".
 However in order for the list to qualify as a data frame, the length of each element has to be the same.
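The "rectangular" requirement in this callout can be sketched as follows. The lists `ragged` and `rect` are hypothetical examples; the sketch uses `tryCatch()` only to record whether the conversion succeeds.

```r
# A list whose elements have different lengths is perfectly valid.
ragged <- list(a = 1:3, b = c("x", "y"))
print(lengths(ragged))

# It cannot become a data frame: as.data.frame() signals an error here.
ragged_ok <- tryCatch({
  as.data.frame(ragged)
  TRUE
}, error = function(e) FALSE)
print(ragged_ok)

# When every element has the same length, the conversion succeeds.
rect <- list(a = 1:3, b = c("x", "y", "z"))
rect_df <- as.data.frame(rect)
print(dim(rect_df))

# The resulting data frame is still a list underneath.
print(is.list(rect_df))
```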