Stata style guide

Questions
- How should we name variables?
- What code style do we use?
Objectives
- Use verbose, helpful variable names.
- Make your code accessible to others.
Write `save "data/worker.dta"`, not `save "data\worker.dta"`. The former works on all three major platforms, the latter only on Windows.
Write `save "data/worker.dta"` and `do "regression.do"`, not `save "data/worker"` or `do "regression"`. Even though Stata appends some extensions by default, it is better to be explicit to help future readers of your code.
Write `save "data/worker.dta"` and `do "regression.do"`, not `save data/worker.dta` or `do regression`. Both are correct, but the first is more readable, as most editors readily highlight strings as separate from programming statements.
Write `save "../data/worker.dta"`, not `save "/Users/koren/Tresorit/research/data/worker.dta"`. Nobody else will have the same absolute path as you have on your system. Adopt a convention of where you are running scripts from and make paths relative to that location.
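
As a minimal sketch of such a convention, assume a hypothetical project layout with scripts in `code/` and data in `data/`, and assume every script is run with `code/` as the working directory. The output file name `worker_ln_wage.dta` is made up for illustration.

```stata
* Assumed (hypothetical) layout: project/code/*.do and project/data/*.dta
* Convention: the working directory is project/code whenever a script runs.
pwd                                        // shows where Stata is running from

use "../data/worker.dta", clear            // relative path, portable across machines
generate ln_wage = ln(wage)
save "../data/worker_ln_wage.dta", replace // hypothetical output file
```

Anyone who clones the project and starts Stata in the same folder can run this unchanged.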
Use `generate ln_wage = ln(wage)` and `summarize ln_wage, detail`, not `g ln_wage = ln(wage)` or `su ln_wage, d`. Both will work, because Stata allows command abbreviation, but the former is more readable.
Use `summarize ln_wage, detail`, not `summarize ln_w, detail`. Both will work, because Stata allows variable name abbreviation, but the latter is very error prone. In fact, you can turn off variable name abbreviation with `set varabbrev off, permanently`.
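
The effect of this setting can be checked with a short sketch. It uses the `auto` example dataset that ships with Stata (an assumption made for illustration; the lesson's own data has `ln_wage` instead) and changes the setting for the current session only.

```stata
sysuse auto, clear                      // example dataset shipped with Stata

set varabbrev off                       // this session only (no permanently option)

* With abbreviation off, a shortened variable name is an error.
* capture noisily shows the error message but lets the script continue.
capture noisily summarize pri, detail
display "return code for abbreviated name: " _rc

summarize price, detail                 // the full variable name always works

set varabbrev on                        // restore Stata's default
```

With abbreviation turned off, a mistyped or shortened name fails immediately instead of silently matching some other variable.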
Use `egen mean_male_wage = mean(wage) if gender == "male"`, not `egen w1 = mean(wage) if gender == "male"`. Your variables should be self-documenting. Reserve variable labels for even more verbose explanations, including units: `label variable mean_male_wage "Average wage of male workers (2011 HUF)"`.
Use `egen mean_male_wage = mean(wage) if gender == "male"`, not `egen meanmalewage = mean(wage) if gender == "male"` or `egen meanMaleWage = mean(wage) if gender == "male"`. The former is more readable. Transformations like mean or log should be part of the variable name.
Use `revenue`, not `revenue_USD` or `revenue_2017`. Record this information in variable labels, though. Your code and your data will change, and you don't want a detail like the currency or the year to force edits throughout your code.
If you have a `foreach` loop with a few lines of code, it is fine to use a one-character name for the loop index: `foreach X of varlist wage mean_male_wage {`. But if you have longer code and `X` would pop up multiple times, give it a more verbose name.
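
A minimal sketch of a longer loop with a descriptive index name follows. The index name `varname` and the use of Stata's bundled `auto` dataset are assumptions made for illustration.

```stata
sysuse auto, clear                      // example dataset shipped with Stata

* The loop body refers to the index several times, so it gets a verbose name.
foreach varname of varlist price mpg weight {
    summarize `varname', detail
    scalar `varname'_median = r(p50)
    generate ln_`varname' = ln(`varname')
    label variable ln_`varname' "Log of `varname'"
}
```

Reading `varname` makes it clear at every use that the macro holds a variable name, which a bare `X` does not.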
If you are hard pressed against the 32-character limit on variable name length, use an abbreviation that will be obvious to everyone reading the code. Use `generate num_customer`, not `generate number_of_customers_of_the_firm` or `generate n_cust`.
Use `generate ln_wage = ln(wage)` and `count if gender == "male"`, not `generate ln_wage=ln(wage)` or `count if gender=="male"`. The former is more readable.
Use `assert inlist(gender, "male", "female")`, not `assert inlist(gender,"male","female")`. The former is more readable.
Use

    foreach X of varlist wage mean_male_wage {
        summarize `X', detail
        scalar `X'_median = r(p50)
    }

not

    foreach X of varlist wage mean_male_wage {
    summarize `X', detail
    scalar `X'_median = r(p50)
    }

Indenting the body makes it clear which commands belong to the loop.
Long scripts are much harder for others to read and understand. If your script grows long, break it up into smaller components by creating several .do files and calling them in turn.
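
One way to organize this is a short main script that only calls the components in order. The sketch below is an assumption about how such a script could look; `main.do`, `clean_data.do` and `create_variables.do` are hypothetical file names.

```stata
* main.do (hypothetical): runs each step of the analysis in order.
* All file names except regression.do are made up for illustration.
clear all

do "clean_data.do"          // read raw data, fix types, save a cleaned .dta file
do "create_variables.do"    // generate ln_wage, mean_male_wage, etc.
do "regression.do"          // estimate models on the prepared data
```

Each component stays short enough to read in one sitting, and the main script documents the order in which they must run.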