forked from bvkrauth/is4e
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path01-Introduction.Rmd
307 lines (235 loc) · 13.7 KB
/
01-Introduction.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
# (PART\*) Fundamentals {-}
# Introduction {#introduction}
This is an introductory course in statistics for economics. It is similar to
courses taught all over the world to first and second year university students
in business, economics, and other social sciences.
::: {.goals data-latex=""}
**Course goals**
By the end of this course, you will:
Develop computer skills:
1. Clean, analyze, and graph data in Excel.
2. Clean, analyze, and graph data in R.
3. Follow recommended practices for data management and reproducible analysis.
Become familiar with basic statistical concepts:
4. Calculate and interpret probabilities and expected values.
5. Explain the relationship between population and sample.
6. Describe the properties of a statistic including its
probability distribution, expected value, and variance.
7. Describe the properties of an estimator including its
bias and mean squared error.
8. State and apply the law of large numbers and central limit theorem.
Apply these skills in combination to analyze real-world economic data:
9. Construct and interpret common charts including histograms, scatter plots,
and time-series plots.
10. Construct and interpret frequency tables and cross-tabulations.
11. Construct and interpret common univariate and bivariate statistics,
including mean, variance, standard deviation, covariance and correlation.
12. Construct and interpret hypothesis tests and confidence intervals.
We will be switching back and forth between theory, data analysis and
applications. All three skill sets are valuable.
:::
Hopefully you are in this course because you are fascinated by statistics
and can't wait to learn more about it. But most of you are taking it because
it's a required course. So I'd like to motivate everyone to take this course
as an opportunity to learn some very useful skills.
## Course context {#course-goals-and-context}
The context of this course is a world that is awash in data:
- Retailers maintain databases of transactions.
- Manufacturers track product quality and costs.
- Marketers collect data on customers and potential customers.
- Governments record everyone's interactions with schools,
tax authorities, social welfare, health care and criminal justice.
- Employers maintain detailed personnel records.
These databases can be linked and analyzed in various ways, and many
of the world's most successful companies rely heavily on the innovative
gathering and usage of data:
- Google's core product (the search engine) is built on the
innovative analysis of massive amounts of data.
- Both Google and the major social media companies are based on providing
valuable "free" services in order to gather data on consumers that can then
be directly or indirectly sold to other businesses.
- Amazon and other retailers use what is called A/B testing to fine-tune product
descriptions and set prices so as to maximize profits.
Some of this data analysis is done by statisticians or computer scientists,
but much of it is done by economists. For example, Amazon is the
second-largest employer of PhD economists in the US
(after the Federal Reserve System). The average economist does not
have a computer scientist's expertise on computers, or a business person's
expertise on markets, but knows enough about both to be quite useful to
an organization that has data and wants money.
This course will not qualify you for those jobs - they typically require a
graduate degree - but it is a first step in that direction. So you should take
it as a lucrative investment opportunity.
::: {.economics data-latex=""}
**Be the Mona Lisa**
Students should always think about the future and remember that labour markets
are driven by supply and demand. In the labour market, your skills and effort
are the product, and you are the seller. Like all sellers, you want your
product to be expensive. This requires that you have skills that are both:
- Useful (high demand)
- Uncommon (low supply)
In other words, you need to be like the Mona Lisa. If your skills are useful but
common (like water), or rare but useless (like my one-of-a-kind drawing of the
Mona Lisa) your labour will sell at a low price.
| | High demand | Low demand |
|-------------|:------------------------------------------:|:---------------------------------------------:|
| Low supply | <img src="bin/MonaLisa.png" width="100" /> | <img src="bin/BadMonaLisa.png" width="100" /> |
| High supply | <img src="bin/Water.png" width="100" /> | <img src="bin/Garbage.png" width="200" /> |
The combined ability to analyze data in a sophisticated way, and to explain the
results in written or oral presentation, is an extremely useful and uncommon
skill. Most of you do not have the technical skills of your colleagues in
Computer Science, but if you can combine a reasonable level of computer skills
with writing, knowledge of the underlying statistical principles, and the
ability to recognize the economic context of a situation, you will do
quite well.
:::
## Expectations {#organization-and-expectations}
This textbook and course are constructed under the assumptions that:
1. You have taken introductory microeconomics and introductory macroeconomics.
- Ideas from those courses will be used in applications and examples.
2. You can do high-school level math including algebra and basic set theory and
have taken or are currently taking an introductory calculus course.
- You will not be asked to solve integrals.
- You may be asked to take a simple derivative or two.
- We will use concepts like functions, sequences and limits that are
emphasized in an introductory calculus course.
- The [Math Review](#math-review) appendix reviews the relevant concepts
and provides practice problems.
3. You have access to a desktop or laptop computer, internet access,
and basic computer skills:
- Open, close, save, rename, move, and copy files.
- Create and use folders (also called directories) and subfolders.
- Install and run applications (also called programs).
This is not a textbook on introductory economics, high school math, or computer
science. If you are a little behind in those skills, you can catch up but you
will need to be proactive in asking for help.
::: {.sfu data-latex=""}
**ECON 233**
ECON 233 is the first course in the two-course econometrics sequence that is
required for all economics majors. If you've never seen the word before,
"econometrics" just means statistics and data analysis for economics.
All economics majors have the option of taking ECON 233 or BUS 232, so you may
be wondering what the difference is. Either course is suitable preparation for
ECON 333, but there are some key differences:
- *Tools:* ECON 233 uses both Excel and R, while BUS 232 uses Excel.
- You are likely to use R in ECON 333 and other upper-division ECON courses,
so it is nice to get used to it now.
- *Applications:* ECON 233 emphasizes economics applications, while BUS 232
emphasizes business applications.
ECON 233 is part of the Social Data and Analytics (SDA) minor; if you are an
economics student and are interested in that minor you are recommended to take
ECON 233.
:::
If you continue in an economics major, you will take at least one additional
course in econometrics. In those courses, you will learn more advanced
techniques such as linear regression, you will use statistical packages such as
R more extensively, and you will go deeper into the theory.
::: {.sfu data-latex=""}
**Related SFU courses**
The second required course in the econometrics sequence for SFU economics
majors is
[ECON 333:Statistical Analysis of Economic Data](https://www.sfu.ca/students/calendar/2022/fall/courses/econ/333.html).
In ECON 333, you will learn a more advanced set of techniques including linear
regression analysis and you will gain more experience with R.
We also offer several related electives. If you find you enjoy and/or do well in
this course, I would encourage you to take some of these course:
- [ECON 334: Data Visualization and Economic Analysis](https://www.sfu.ca/students/calendar/2021/spring/courses/econ/334.html)
is an elective focusing on exploratory data analysis and visualization.
- [ECON 335: Introduction to Causal Inference and Policy Evaluation](https://www.sfu.ca/students/calendar/2021/spring/courses/econ/335.html)
is an elective focusing on the problem of inferring cause-and-effect from
economic data, and using data to forecast the effects of economic policies.
- [ECON 433: Financial and Time Series Econometrics](https://www.sfu.ca/students/calendar/2021/spring/courses/econ/433.html)
is an advanced elective focusing on techniques for analyzing the kind of time
series data that is used in macroeconomics and financial markets.
- [ECON 435: Econometric Methods](https://www.sfu.ca/students/calendar/2021/spring/courses/econ/435.html)
is an advanced course in statistics and econometrics that is part of our
honours sequence. It gives you the opportunity and tools to write a serious
empirical research paper. Non-honours students are eligible to take it if
they have a 3.0 CGPA and the course prerequisites.
I would also encourage you to take courses outside of the economics department,
and to consider a
[Statistics minor](https://www.sfu.ca/students/calendar/2021/spring/programs/statistics/minor.html)
or the interdisciplinary
[Social Data Analytics (SDA) minor](https://www.sfu.ca/students/calendar/2021/spring/programs/social-data-analytics/minor.html).
:::
## Computer resources {#computer-resources}
To do the computer work you will need a computer with internet access and
the following software applications installed:
1. Microsoft Excel
2. R
3. RStudio
You will also want to install an R "package" known as the Tidyverse.
Excel is a commercial application, while R and RStudio are both open-source
(free). They are available for both Windows and macOS. The examples in the
textbook use Windows.
We will use Excel [right away](#basic-data-cleaning-with-excel). We will
not need R until [much later](#an-introduction-to-r), but it is best
to get your system set up now.
:::sfu
**Canvas**
Course materials including the syllabus, lectures, and assignments are
available at the course Canvas site https://canvas.sfu.ca/courses/73884.
:::
### Installing Microsoft Excel {#installing-microsoft-excel}
Microsoft Excel is a well-known spreadsheet program that is available for both
Windows and macOS. Alternatives to Excel include
[Google Sheets](https://www.google.ca/sheets/about/) and Apple's
[Numbers](https://www.apple.com/ca/numbers/). We will use Excel, but the basic
skills are transferable to any spreadsheet program.
::: {.sfu data-latex=""}
**Using Excel at SFU**
SFU has a licensing agreement with Microsoft that allows its students
free installation of the entire Microsoft Office suite, including Excel.
Installation instructions are available at:
https://www.sfu.ca/itservices/technical/software/office365.html.
Excel is also available in all campus computer labs.
:::
Once you have installed Excel, you should confirm that it is working by starting
the program. You should see something that looks like this:

### Installing R and RStudio {#installing-r-and-rstudio}
We will also use a more specialized statistical program called R, and a related
application called RStudio. R is a programming language used for statistical
analysis, and the application that interprets and executes R code. RStudio is
an "Integrated Development Environment" (IDE) for R. An IDE is a collection of
tools tools for building and running programs, packaged up in a single
user-friendly interface.
Alternatives to R include the (quite expensive) statistical packages
[SAS](https://www.sas.com/) and [Stata](https://www.stata.com/), as well as the
(free) general-purpose programming language [Python](https://www.python.org/).
The best known IDE is Microsoft's
[Visual Studio](https://visualstudio.microsoft.com/).
::: {.sfu data-latex=""}
**Using R/RStudio at SFU**
R and RStudio are available in all campus computer labs.
You can also install them on your own computer by following the instructions
below.
:::
Both R and RStudio are open-source, and are available free of charge for both
Windows and macOS. Installation instructions are available at:
https://rstudio.com/products/rstudio/download/#download.
Be sure to install R first, then RStudio.
After installing R and RStudio, you should confirm that they are working by
opening RStudio. You should see something like this:

### Installing the Tidyverse {#installing-the-tidyverse}
One of the most useful features of R is that it allows users to write and
distribute ***packages*** that extend its capabilities. There are thousands
of packages available; we will use a particularly popular and useful package
called the ***Tidyverse***.
To install the Tidyverse package:
1. Open RStudio if it isn't already open.
2. Click in the `Console` window (you will see it towards the bottom of the
screen).
3. Enter `install.packages("tidyverse")` (i.e., type it and hit the `<enter>`
key).
Once the installation is concluded and the `>` prompt reappears you can test to
make sure the installation worked.
4. Enter `library(tidyverse)` in the console window.
- If you get an error message like
```Error in library(tidyverse) : there is no package called ‘tidyverse’```
you may have forgotten to install the package; go back to step 2 above.
- If you don't get an error message (it's OK if you get some message about
"Conflicts"), the installation worked.
If you run into trouble here, don't worry. We will not need the Tidyverse for
a few weeks, so there is plenty of time to get help.