Auto-generated via `{sandpaper}`
Source : cd7dbe7
Branch : main
Author : Naupaka Zimmerman <[email protected]>
Time : 2025-01-07 18:05:01 +0000
Message : Merge pull request #908 from caseyyoungflesh/04-fix_feline-data_v2.csv
Add data in place of `feline-data_v2.csv`, closes #717
04-data-structures-part1.md
Lines changed: 38 additions & 52 deletions
@@ -237,30 +237,38 @@ No matter how
 complicated our analyses become, all data in R is interpreted as one of these
 basic data types. This strictness has some really important consequences.
 
-A user has added details of another cat. This information is in the file
-`data/feline-data_v2.csv`.
+A user has provided details of another cat. We can add an additional row to our cats table using `rbind()`.
 
 
 ```r
-file.show("data/feline-data_v2.csv")
+additional_cat <- data.frame(coat = "tabby", weight = "2.3 or 2.4", likes_catnip = 1)
+additional_cat
 ```
 
+```output
+   coat     weight likes_catnip
+1 tabby 2.3 or 2.4            1
+```
 
 ```r
-coat,weight,likes_catnip
-calico,2.1,1
-black,5.0,0
-tabby,3.2,1
-tabby,2.3 or 2.4,1
+cats2 <- rbind(cats, additional_cat)
+cats2
+```
+
+```output
+    coat     weight likes_catnip
+1 calico        2.1            1
+2  black          5            0
+3  tabby        3.2            1
+4  tabby 2.3 or 2.4            1
 ```
 
-Load the new cats data like before, and check what type of data we find in the
-`weight` column:
+Let's check what type of data we find in the
+`weight` column of our new `cats2` object:
 
 
 ```r
-cats <- read.csv(file="data/feline-data_v2.csv")
-typeof(cats$weight)
+typeof(cats2$weight)
 ```
 
 ```output
@@ -272,18 +280,18 @@ we did on them before, we run into trouble:
 
 
 ```r
-cats$weight + 2
+cats2$weight + 2
 ```
 
 ```error
-Error in cats$weight + 2: non-numeric argument to binary operator
+Error in cats2$weight + 2: non-numeric argument to binary operator
 ```
 
 What happened?
-The `cats` data we are working with is something called a *data frame*. Data frames
+The `cats` (and `cats2`) data we are working with is something called a *data frame*. Data frames
 are one of the most common and versatile types of *data structures* we will work with in R.
 A given column in a data frame cannot be composed of different data types.
-In this case, R does not read everything in the data frame column `weight` as a *double*, therefore the entire
+In this case, R cannot store everything in the data frame column `weight` as a *double* anymore once we add the row for the additional cat (because its weight is `2.3 or 2.4`), therefore the entire
 column data type changes to something that is suitable for everything in the column.
 
 When R reads a csv file, it reads it in as a *data frame*. Thus, when we loaded the `cats`
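The coercion described in this hunk can be reproduced without any CSV file. The following is a minimal sketch, assuming R 4.0 or later (where `data.frame()` keeps strings as character vectors). The three original rows are typed in by hand from the values shown in the lesson rather than read from `data/feline-data.csv`.

```r
# Rebuild the lesson's original three-cat table by hand (assumption: same
# values as the lesson's CSV, entered directly instead of read from disk).
cats <- data.frame(
  coat = c("calico", "black", "tabby"),
  weight = c(2.1, 5.0, 3.2),
  likes_catnip = c(1, 0, 1)
)

# The new row stores its weight as text, because "2.3 or 2.4" is not a number.
additional_cat <- data.frame(coat = "tabby", weight = "2.3 or 2.4", likes_catnip = 1)

# Before the row is added, the weight column holds numbers.
type_before <- typeof(cats$weight)
type_before   # "double"

# rbind() must find one type that fits every value in the column,
# so the whole weight column becomes text.
cats2 <- rbind(cats, additional_cat)
type_after <- typeof(cats2$weight)
type_after    # "character"
```

Note that `rbind()` returns a new data frame, so `cats` itself keeps its numeric `weight` column.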
@@ -292,42 +300,22 @@ is written by the `str()` function:
 
 
 ```r
-str(cats)
+str(cats2)
 ```
 
 ```output
 'data.frame': 4 obs. of 3 variables:
  $ coat        : chr "calico" "black" "tabby" "tabby"
  $ weight      : chr "2.1" "5" "3.2" "2.3 or 2.4"
- $ likes_string: int 1 0 1 1
+ $ likes_catnip: num 1 0 1 1
 ```
 
 *Data frames* are composed of rows and columns, where each column has the
 same number of rows. Different columns in a data frame can be made up of different
 data types (this is what makes them so versatile), but everything in a given
 column needs to be the same type (e.g., vector, factor, or list).
 
-Let's explore more about different data structures and how they behave.
-For now, let's remove that extra line from our cats data and reload it,
-while we investigate this behavior further:
-
-feline-data.csv:
-
-```
-coat,weight,likes_catnip
-calico,2.1,1
-black,5.0,0
-tabby,3.2,1
-```
-
-And back in RStudio:
-
-
-```r
-cats <- read.csv(file="data/feline-data.csv")
-```
-
-
+Let's explore more about different data structures and how they behave. For now, we will focus on our original data frame `cats` (and we can forget about `cats2` for the rest of this episode).
 
 ### Vectors and Type Coercion
 
@@ -555,8 +543,7 @@ Create a new script in RStudio and copy and paste the following code. Then
 move on to the tasks below, which help you to fill in the gaps (\_\_\_\_\_\_).
 
 ```
-# Read data
-cats <- read.csv("data/feline-data_v2.csv")
+# Using the object `cats2`:
 
 # 1. Print the data
 _____
@@ -568,15 +555,15 @@ _____(cats)
 # The correct data type is: ____________.
 
 # 4. Correct the 4th weight data point with the mean of the two given values
-cats$weight[4] <- 2.35
+cats2$weight[4] <- 2.35
 # print the data again to see the effect
 cats
 
 # 5. Convert the weight to the right data type
-cats$weight <- ______________(cats$weight)
+cats2$weight <- ______________(cats2$weight)
 
 # Calculate the mean to test yourself
-mean(cats$weight)
+mean(cats2$weight)
 
 # If you see the correct mean value (and not NA), you did the exercise
 # correctly!
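Steps 4 and 5 of the template are ordered deliberately. As a rough sketch (the standalone vector `weight` is an illustrative stand-in for `cats2$weight`, and `as.numeric()` is only one conversion function that could fill the gap in step 5), converting before correcting the fourth entry leaves an `NA` behind:

```r
# Weights as they appear in `cats2` after the extra row is added: all text.
# (Illustrative stand-in vector, not the lesson's data frame.)
weight <- c("2.1", "5", "3.2", "2.3 or 2.4")

# Converting first: "2.3 or 2.4" is not a number, so it becomes NA.
# R also emits a coercion warning here, which is silenced for the demo.
converted_first <- suppressWarnings(as.numeric(weight))
mean(converted_first)            # NA

# Correcting first: assigning 2.35 into a character vector stores the
# text "2.35", which then converts cleanly.
weight[4] <- 2.35
corrected_then_converted <- as.numeric(weight)
mean(corrected_then_converted)   # 3.1625
```

The second mean is a number rather than `NA`, which is the check the template's closing comment asks for.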
@@ -586,7 +573,7 @@ mean(cats$weight)
 
 #### 1\. Print the data
 
-Execute the first statement (`read.csv(...)`). Then print the data to the
+Print the data to the
 console
 
 ::::::::::::::: solution
@@ -601,8 +588,8 @@ Show the content of any variable by typing its name.
 Two correct solutions:
 
 ```
-cats
-print(cats)
+cats2
+print(cats2)
 ```
 
 :::::::::::::::::::::::::
@@ -611,7 +598,7 @@ print(cats)
 
 The data type of your data is as important as the data itself. Use a
 function we saw earlier to print out the data types of all columns of the
-`cats` table.
+`cats2` `data.frame`.
 
 ::::::::::::::: solution
 
@@ -628,15 +615,14 @@ here.
 > ### Solution to Challenge 1.2
 >
 > ```
-> str(cats)
+> str(cats2)
 > ```
 
 #### 3\. Which data type do we need?
 
 The shown data type is not the right one for this data (weight of
 a cat). Which data type do we need?
 
-- Why did the `read.csv()` function not choose the correct data type?
 - Fill in the gap in the comment with the correct data type for cat weight!
 
 ::::::::::::::: solution
@@ -715,8 +701,8 @@ auto-complete function: Type "`as.`" and then press the TAB key.
 > There are two functions that are synonymous for historic reasons: