---
title: "Descriptive Statistics in R"
date: "30/09/2021"
output: html_document
---
#### Name : Sara Kulkarni
#### Reg. no : 19BCE1567
Create a data frame `newsurvey` that contains the `survey` data from the MASS package with the NA values removed. Use it to answer the following queries:
```{r}
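# Clear the workspace so the analysis starts from a clean environment
rm(list = ls())
# MASS provides the survey dataset
library(MASS)
# dplyr provides group_by(), filter(), summarize() and the %>% pipe
library(dplyr)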
```
```{r}
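# Load the survey data and drop every row that contains an NA value
data(survey)
newsurvey <- na.omit(survey)
# Inspect the first rows and the structure of the cleaned data
head(newsurvey)
str(newsurvey)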
```
1. How many left-handers and right-handers are there?
```{r}
newsurvey %>% group_by(W.Hnd)%>%
summarize(count=n())
```
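The same counts can be obtained with base R's `table()`, which tabulates the levels of a factor without dplyr. A minimal sketch, assuming only the MASS package is available; it recreates `newsurvey` so the chunk runs on its own.

```{r}
library(MASS)
# Recreate the cleaned data so this chunk is self-contained
newsurvey <- na.omit(survey)
# table() counts how many rows fall in each level of the writing-hand factor
hand_counts <- table(newsurvey$W.Hnd)
hand_counts
```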
2. Find the relative frequency distribution of left- and right-handers and display it with a precision of two decimal places.
```{r}
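# Count each writing hand in the cleaned data
whand_frequency <- table(newsurvey$W.Hnd)
size <- nrow(newsurvey)
# Divide by the number of rows to get relative frequencies
whand_rel_frequency <- whand_frequency / size
# round() gives two decimal places; options(digits = 2) would instead set
# two significant digits for all later output in the session
round(whand_rel_frequency, 2)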
```
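The denominator matters here: `table()` silently drops NA values while `nrow()` counts every row, so dividing a table of the raw `survey` data by `nrow(survey)` gives relative frequencies that need not sum to 1. A small sketch contrasting that with `prop.table()`, which divides by the table's own total:

```{r}
library(MASS)
# Mixed denominators: counts exclude NA, nrow() includes those rows
raw_rel <- table(survey$W.Hnd) / nrow(survey)
sum(raw_rel)    # below 1 whenever W.Hnd has missing values
# prop.table() uses the total of the tabulated counts as the denominator
clean_rel <- prop.table(table(survey$W.Hnd))
sum(clean_rel)  # always 1, up to floating-point error
```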
3. Display the numbers of male and female left-handers in column format.
```{r}
newsurvey%>%
filter(W.Hnd =='Left')%>%
group_by(Sex)%>%
summarize(count=n())
```
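A two-way contingency table shows the same left-hander counts alongside the right-handers in one view. A minimal base-R sketch, assuming only MASS; the dimension names `Sex` and `Hand` are chosen here for readability.

```{r}
library(MASS)
newsurvey <- na.omit(survey)
# Cross-tabulate sex against writing hand; each cell is a count of students
sex_by_hand <- table(Sex = newsurvey$Sex, Hand = newsurvey$W.Hnd)
sex_by_hand
# Keep only the "Left" column, still displayed as a column
sex_by_hand[, "Left", drop = FALSE]
```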
4. What percentage of male right-handers never smoke?
```{r}
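# Male right-handers form the reference group (the denominator)
male_right_handers <- newsurvey %>%
  filter(W.Hnd == 'Right', Sex == 'Male')
# Those among them who never smoke (the numerator)
never_smokers <- male_right_handers %>%
  filter(Smoke == 'Never')
(nrow(never_smokers) / nrow(male_right_handers)) * 100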
```
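A conditional percentage can also be written as the mean of a logical vector, because `mean()` of TRUE/FALSE values is the proportion of TRUE. A minimal base-R sketch, assuming only MASS:

```{r}
library(MASS)
newsurvey <- na.omit(survey)
# Restrict to the conditioning group first: male right-handers
male_right <- subset(newsurvey, Sex == "Male" & W.Hnd == "Right")
# Share of that group whose Smoke value is "Never", as a percentage
pct_never <- mean(male_right$Smoke == "Never") * 100
pct_never
```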
5. Find the range of heights of the students who participated in the survey.
```{r}
range(newsurvey$Height)
```
```{r}
# Descriptive names avoid reusing the base function names max and min
max_height <- max(newsurvey$Height)
min_height <- min(newsurvey$Height)
Height_range <- max_height - min_height
Height_range
```
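Since `range()` returns the two-element vector `c(min, max)`, the width of the range is simply `diff()` of it. A short sketch on toy values (not survey data):

```{r}
# Toy heights for illustration only
heights <- c(160, 172.5, 150, 185)
# range() gives c(150, 185); diff() subtracts the first from the second
diff(range(heights))
```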
6. Break the height range into non-overlapping sub-intervals by defining a sequence of equally spaced break points 10 apart, after rounding the range to the nearest integer.
```{r}
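# Break points from 150 to 210 in steps of 10; ending at 210 rather than 200
# keeps a height of exactly 200 inside the left-closed interval [200, 210)
break_Height <- seq(150, 210, by = 10)
break_Height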
```
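The break points above are typed in by hand. As a sketch of deriving them from the data instead, the hypothetical helper `make_breaks()` below (not part of any package) rounds the minimum down to a multiple of the width and adds one width past the rounded-down maximum, so the largest value still lands inside a left-closed interval:

```{r}
# Hypothetical helper: equally spaced breaks that cover x for cut(right = FALSE)
make_breaks <- function(x, width = 10) {
  lower <- floor(min(x) / width) * width          # round the minimum down
  upper <- floor(max(x) / width) * width + width  # one step past the maximum
  seq(lower, upper, by = width)
}
# Toy endpoints for illustration only
make_breaks(c(152.4, 200))
```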
7. Find the distribution of heights over these sub-intervals using cut() with the right boundary of each interval open. Display it in column form.
```{r}
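# right = FALSE makes each interval left-closed and right-open: [a, b)
Height_interval <- cut(newsurvey$Height, break_Height, right = FALSE)
Height_range_frequency <- table(Height_interval)
# cbind() turns the table into a one-column matrix for column display
cbind(Height_range_frequency)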
```
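The `right` argument decides which side of each interval is closed, which matters for values that sit exactly on a break point. A small sketch using a single toy value of 160:

```{r}
breaks <- seq(150, 210, by = 10)
# right = FALSE builds [a, b) intervals, so 160 goes to [160,170)
left_closed <- cut(160, breaks, right = FALSE)
# right = TRUE (the default) builds (a, b] intervals, so 160 goes to (150,160]
right_closed <- cut(160, breaks, right = TRUE)
as.character(left_closed)
as.character(right_closed)
```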
8. Which height interval contains the most students who participated in the survey?
```{r}
max(Height_range_frequency)
which.max(Height_range_frequency)
```
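`which.max()` returns the position of the first maximum, named by its level, so `names()` extracts just the interval label; if two intervals tie, only the first is reported. A sketch on toy binned values standing in for the height intervals:

```{r}
# Toy binned data for illustration only
bins <- factor(c("[160,170)", "[170,180)", "[170,180)", "[180,190)"),
               levels = c("[160,170)", "[170,180)", "[180,190)"))
freq <- table(bins)
# Index of the (first) most frequent interval, named by its label
top <- which.max(freq)
names(top)    # label of the most frequent interval
freq[[top]]   # its count
```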
9. Compute the mean, variance and standard deviation of the heights of the students who participated in the survey.
```{r}
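mean_height <- mean(newsurvey$Height)
mean_height
variance_height <- var(newsurvey$Height)  # sample variance (denominator n - 1)
variance_height
sd_height <- sd(newsurvey$Height)         # square root of the sample variance
sd_height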
```
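`var()` computes the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, and `sd()` is its square root. A sketch checking this by hand on toy values (not survey data):

```{r}
# Toy values for illustration only (mean 5, squared deviations sum to 32)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
# Sample variance with the n - 1 denominator, as used by var()
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_sd <- sqrt(manual_var)
c(manual_var, var(x), manual_sd, sd(x))
```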
10. Which clapping category (`Clap`) has the maximum writing-hand span?
```{r}
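# Summarise each Clap group by its largest writing-hand span; a sum would
# mostly reflect how many students are in each group
max_span <- newsurvey %>%
  group_by(Clap) %>%
  summarise(max_writing_span = max(Wr.Hnd)) %>%
  arrange(desc(max_writing_span))
max_span
```
```{r}
# After sorting, the first row is the category with the largest span
max_span$Clap[1]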
```
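Because the Clap groups differ in size, a per-group maximum or mean is easier to compare across categories than a per-group sum. A minimal base-R sketch using `tapply()`, assuming only MASS:

```{r}
library(MASS)
newsurvey <- na.omit(survey)
# tapply() applies a summary to Wr.Hnd separately within each Clap level
span_max  <- tapply(newsurvey$Wr.Hnd, newsurvey$Clap, max)
span_mean <- tapply(newsurvey$Wr.Hnd, newsurvey$Clap, mean)
# Neither summary grows just because a group contains more students
cbind(max = span_max, mean = span_mean)
# Category with the highest average writing-hand span
names(which.max(span_mean))
```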
11. Compute the covariance and correlation between height and writing-hand span.
```{r}
cov(newsurvey$Height, newsurvey$Wr.Hnd)
```
```{r}
cor(newsurvey$Height, newsurvey$Wr.Hnd)
```
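The two results are linked: Pearson's correlation is the covariance rescaled by both standard deviations, $r = \frac{\operatorname{cov}(x, y)}{s_x s_y}$. A sketch verifying this on toy vectors (not survey data):

```{r}
# Toy vectors for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
# Divide the covariance by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
c(r_manual, cor(x, y))
```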
12. Display the 30th, 60th and 80th percentiles of the height data.
```{r}
percentile_height=quantile(newsurvey$Height, c(.30,.60,.80))
percentile_height
```
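By default `quantile()` uses `type = 7`, which places the p-th quantile at position $h = (n-1)p + 1$ of the sorted data and interpolates linearly between the neighbouring values. A sketch reproducing one percentile by hand on toy values (not survey data):

```{r}
# Toy values for illustration only
x <- c(150, 160, 165, 170, 180)
p <- 0.30
xs <- sort(x)
# Position in the sorted data used by the default (type 7) definition
h <- (length(xs) - 1) * p + 1
# Linear interpolation between the values at floor(h) and ceiling(h)
manual_q <- xs[floor(h)] + (h - floor(h)) * (xs[ceiling(h)] - xs[floor(h)])
c(manual_q, unname(quantile(x, p)))
```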
Frame any three questions on descriptive statistics that analyse the categorical and quantitative variables in a dataset of your choice.
1. Find the range of ages, break it into sub-intervals using equally spaced break points 10 years apart, and display the distribution of ages over these sub-intervals.
```{r}
str(newsurvey)
age=newsurvey$Age
range(age)
```
```{r}
break2=seq(15,75,by=10)
break2
```
```{r}
age_cut=cut(age,break2)
table(age_cut)
```
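The same binned counts can be cross-checked with `hist(..., plot = FALSE)`, which only computes the binning. Its intervals are right-closed and, by default, the first one also includes its lower end, so it matches `cut()` with `include.lowest = TRUE`. A sketch assuming only MASS, and assuming every age lies within the 15 to 75 break range (otherwise `hist()` stops with an error):

```{r}
library(MASS)
newsurvey <- na.omit(survey)
age_breaks <- seq(15, 75, by = 10)
# Bin counts from hist(), without drawing anything
hist_counts <- hist(newsurvey$Age, breaks = age_breaks, plot = FALSE)$counts
# Equivalent right-closed binning with cut(), first interval closed on both ends
cut_counts <- as.vector(table(cut(newsurvey$Age, age_breaks, include.lowest = TRUE)))
rbind(hist = hist_counts, cut = cut_counts)
```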
2. Find the sex category with the most participants who never smoke.
```{r}
newsurvey%>%
group_by(Sex)%>%
filter(Smoke=="Never")%>%
summarize(count=n())%>%
arrange(desc(count))%>%
head(1)
```
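A base-R version of the same query indexes the `Sex` column with a logical vector and tabulates the result. A minimal sketch, assuming only MASS:

```{r}
library(MASS)
newsurvey <- na.omit(survey)
# TRUE for participants who never smoke
never <- newsurvey$Smoke == "Never"
# Count those participants by sex
never_by_sex <- table(newsurvey$Sex[never])
never_by_sex
# Label of the sex with the largest count (the first one if tied)
names(which.max(never_by_sex))
```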
3. Find the percentage of participants who exercise frequently and never smoke.
```{r}
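# Participants who both exercise frequently and never smoke
freq_never <- newsurvey %>%
  filter(Exer == "Freq", Smoke == "Never")
# Their share of all participants, as a percentage
nrow(freq_never) / nrow(newsurvey) * 100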
```
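Applying `prop.table()` to a two-way table gives joint proportions over all participants, so every Exer/Smoke combination can be read off at once. A minimal sketch, assuming only MASS; the dimension names are chosen here for readability.

```{r}
library(MASS)
newsurvey <- na.omit(survey)
# Joint proportions: each cell divided by the total number of participants
joint <- prop.table(table(Exer = newsurvey$Exer, Smoke = newsurvey$Smoke))
round(100 * joint, 1)
# The Freq/Never cell is the percentage who exercise frequently and never smoke
100 * joint["Freq", "Never"]
```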