Releases: mcaceresb/stata-gtools
gtools-1.11.8
Release update. Minor bug fixes and features for main functions. Several updates and major bug fixes to beta functions.
Features
`gstats tab` adds `formatOutput()` to get a matrix with the output formatted.
Bug fixes
- Allow `var`, `sd`, `semean`, `cv` with `aw` and weights that add up to 1 (previously if weights added up to 1 the function exited, but with `aw` the bias adjustment is based on the number of observations, not the sum).
- Allow replace with `fasterxtile` and `gegen xtile`
- `gegen` now accepts quotes in expressions
- `fasterxtile` now takes init
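The first fix above can be illustrated with a minimal Python sketch. It assumes the usual analytic-weight convention in which only relative weights matter and the bias adjustment is `n / (n - 1)` with `n` the number of observations; this is a reading of the note above, not gtools' internal code, and the name `aw_variance` is hypothetical.

```python
def aw_variance(x, w):
    """Weighted variance with an observation-count bias adjustment (assumed convention)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    # Weighted mean squared deviation; the scale of the weights cancels out.
    msd = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    # Adjust by n / (n - 1). An adjustment based on the weight sum,
    # total / (total - 1), would divide by zero when the weights add up to 1.
    return msd * n / (n - 1)


x = [1.0, 2.0, 3.0, 4.0]
w = [0.25, 0.25, 0.25, 0.25]  # weights add up to 1
print(aw_variance(x, w))
```

With equal weights the result matches the ordinary sample variance, whether the weights sum to 1 or to the number of observations.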
Beta Features
WARNING: Beta Features have not been extensively tested and are not recommended.
- `greg, savecons` saves `rss`, `tss`, `r2`, `consest` without having to specify `alphas`
- `greg` can now save `rss`, `tss`, `r2`, `consest` with `by()`
- `greg` saves `rss`, `tss`, `r2` in mata (not with `by()`)
- `greg` saves `consest` in mata with absorb (not with `by()`)
Beta Bug Fixes
WARNING: Beta Features have not been extensively tested and are not recommended.
- `greg` `njabsorb` fills correctly with `by` (pointer was not incrementing in fun)
- `greg` with `alphas()` saves `consest` with IV and only one absvar.
- `greg` with `glm` and absorb no longer gives error (can't dereference NULL)
- `greg` computes `rss`, `tss` internally with by (previous version would not work because the error term was not saved by group, it was overwritten; now the full residual vector is saved).
- `gregress` no longer forces diagonal vcov matrix (!)
- `gregress` gets header when displaying results
- `gregress` adds `alphas()` to save individual fixed effects; `predict()` gives right prediction across functions.
gtools-1.10.1
Release update. New commands and functions, several enhancements, and various bug fixes. Remember to run `gtools, upgrade` to keep up to date between major updates.
Features
- New function `gstats transform` (weights, by allowed): Applies a transformation to a variable; that is, `y_i = f(x_i)` with `y` the target and `x` the source. For example, `gstats transform (demean) y = x, by(group)` gives

      n_j  = sum_i 1{group_i = j}
      s_j  = sum_i 1{group_i = j} * x_i
      y_ij = x_i - s_j / n_j

  Available:

      normalize, standardize: f(x) = (x - mean(x)) / sd(x)
      demean:    f(x) = (x - mean(x))
      demedian:  f(x) = (x - median(x))
      cumsum:    f(x_i) = sum_{l = 1}^i x_l
      shift:     f(x_i) = x_{i - lag} or x_{i + lead}
      rank:      Similar to egen rank; see docs.
      moving:    Moving statistics; see docs.
      range:     Similar to rangestat; see docs.

- `gstats range`: Alias for `gstats transform (range)`; see below.
- `gstats moving`: Alias for `gstats transform (moving)`; see below.
- `gstats hdfe` (alias `gstats residualize`): Residualize variable by absorbing high-dimensional fixed effects.
  - Currently in beta! Use with care; see docs for details.
  - Methods `cg` (Conjugate Gradient), `squarem` (SQUAREM), `it` (Irons and Tuck), `map` (Method of Alternating Projections).
  - Parallel execution of select functions can be enabled at compile time via `GTOOLSOMP`
- `gstats transform`:
  - `gstats transform (demean) ...`
  - `gstats transform (demedian) ...`
  - `gstats transform (normalize) ...`
  - `gstats transform (cumsum [+/- [varname]]) ...`: Sums in current order by default. The user can request the cumulative sum in ascending or descending order; the order can also be determined by another variable.
  - `gstats transform (rank) ...`: Option `ties()` specifies how to break ties (field, track, unique, stableunique).
  - `gstats transform (shift [+/-]#) ...`: Leads (default; e.g. `shift 1` or `shift +3`) and lags (e.g. `shift -2`).
  - `gstats transform (moving stat lower upper) ...`: Moving statistic `stat` from current observation + `lower` until current observation + `upper`; see docs for details.
  - `gstats transform (range stat lower upper varname) ...`: Moving statistic `stat` for values of varname in range `varname[_n] - lower` to `varname[_n] + upper`. Can also specify a statistic, e.g. `range sd -1.0sd 1.0sd varname` to get all values within a standard deviation of `varname[_n]`. See docs for details.
  - `gstats transform, auto[()]` allows automagically naming targets based on the source variable's name and the statistic requested. Default is `#source#_#stat#`.
- `greshape`
  - Adds option `dropmiss` to drop missing rows (case-wise) when reshaping long (via `long` or `gather`).
  - Closes #58; allows `uselabels[(varlist, [exclude])]` to optionally specify which variables to use labels for (default is all variables). The user can also specify the option `exclude` to specify which variables not to do this for.
  - Closes #63: `greshape wide/gather` allows `prefix(...)` for custom output names.
  - Closes #69. `greshape wide/spread` now allows `labelformat()` for custom variable labels (only when a single variable is passed to `key()`/`j()`). The default is `#keyvalue# #stublabel#`. Available placeholders are `#stubname#`, `#stublabel#`, `#keyname#`, `#keylabel#`, `#keyvalue#`, and `#keyvaluelabel#`
- `gegen`:
  - `winsor`, `winsorize` call `gstats winsor`
  - `standardize`, `normalize`, `demean`, `demedian` call `gstats transform`
  - Fixes #67; adds `gegen x = rank(varname) [wgt], by(varlist) ties(type)` via `gstats transform (rank) [wgt], by() ties()`. Weights are optional.
  - `gegen x = moving_stat(y), window(lower upper)` calls `gstats transform`
  - `gegen x = range_stat(y), interval(lower[stat] upper[stat] varname)` calls `gstats transform`
- `gcollapse`, `gegen`, `gstats tab` new functions:
  - `geomean` for geometric mean.
  - `gini`, `gini dropneg`, `gini keepneg` for gini coefficient (optionally drop or keep negative values).
- `noinit` option for `gcollapse, merge`, `gegen`, `gstats` (selected), `gregress` (and co.) to prevent targets from being emptied out with `replace`. Prints warning!
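The `gstats transform` formulas listed in the features above (`demean` by group, `cumsum`, `shift`, and a moving statistic) can be sketched in plain Python. This is only an illustration of the stated definitions, not gtools code: the function names are hypothetical, `None` stands in for a Stata missing value, and clipping the moving window at the data boundaries is an assumption (the release notes defer to the docs for the exact edge behavior).

```python
from collections import defaultdict


def demean_by(x, group):
    """y_i = x_i - s_j / n_j, with s_j and n_j the sum and count of i's group j."""
    total = defaultdict(float)
    count = defaultdict(int)
    for xi, g in zip(x, group):
        total[g] += xi
        count[g] += 1
    return [xi - total[g] / count[g] for xi, g in zip(x, group)]


def cumsum(x):
    """f(x_i) = sum_{l = 1}^i x_l, in the current order."""
    out, running = [], 0.0
    for xi in x:
        running += xi
        out.append(running)
    return out


def shift(x, k):
    """Lead by k if k > 0 (y_i = x_{i + k}), lag if k < 0; None marks missing."""
    n = len(x)
    return [x[i + k] if 0 <= i + k < n else None for i in range(n)]


def moving_mean(x, lower, upper):
    """Mean over positions i + lower .. i + upper (window clipped: an assumption)."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i + lower), min(n, i + upper + 1)
        window = x[lo:hi] if hi > lo else []
        out.append(sum(window) / len(window) if window else None)
    return out


x = [1.0, 2.0, 3.0, 10.0, 20.0]
group = ["a", "a", "a", "b", "b"]
print(demean_by(x, group))               # deviations from each group's mean
print(cumsum([1.0, 2.0, 3.0]))           # running total
print(shift([1, 2, 3, 4], 1))            # lead by one
print(shift([1, 2, 3, 4], -2))           # lag by two
print(moving_mean([1.0, 2.0, 3.0, 4.0], -1, 1))
```

In this sketch a positive shift is a lead and a negative shift is a lag, matching the `shift 1` / `shift -2` convention described above.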
Beta
- Regression models are in beta and not recommended at the moment;
see docs for details.
Enhancements
- User must now specify global `GTOOLS_BETA` to use beta features.
- Typed (direct/non-hashed) radix sort in API internals
- Allows the user to specify the temporary directory for files via global GTOOLS_TEMPDIR
- `gunique, detail` now uses `gstats sum, detail`
- Modularized the code base so that aliases are assigned to internal functions instead of the copy/paste if/else branching statements.
- Categorize documentation into "Data manipulation", "Statistics", and "Regression models".
- Move plugin compilation to GitHub rather than Travis.
- `gtop` prints the number of levels in Other and Missing rows by default. (With missing it only does it if there's more than one type of missing value.)
- `greshape` tries to detect repeated stubs and suggests this possibility to the user when a stub matches multiple variables.
- Faster excludeself mean and sum without specified range in `gstats transform`.
Bug Fixes
- `gstats winsor` exits with error if replace and if/in are passed (the way it's set up it'd be a bit of a hassle to allow init/noinit).
- `gstats transform`, `gstats hdfe`, `gregress` (and co.) all now initialize their targets to be empty (missing values) with if/in and replace.
- `gtop` no longer incorrectly replaces the display value if the numerical variable has a value label and no missing values. If there was a single value this would result in an error: `gtop` would think there was always at least one missing value to replace.
- `gcollapse` no longer fails when trying to label the collapsed output if the source labels are blank (this can happen for example with data transformed to `.dta` from other formats or programs).
- `gcollapse` no longer gives incorrect missing variables list when part of that list is called with varlist notation (e.g. `x* y` and `x*` exist but `y` does not).
- `gunique` no longer ignores if/in with `gen` and `replace`
- Fixed `gegen nunique` with multiple inputs
- Fixed bug where the prefix in `gstats` was `stat_` instead of `stat|`
- In `gquantiles`, data was read incorrectly with `by()` and `weights` if `xtile` was not requested. In particular, the data was copied as if the target had only one column, but since weights need to be included, the target has two columns. This was fixed.
- Fixed bug where a by variable being used as a source but not a target got renamed to the target and was no longer available as a by variable. Now a new variable should be created and the by variable remains unchanged.
- Fixed memory leak where the C by variables were not cleared from memory if `st_into->output` was allocated because free code was upgraded from 6 or 7 to 9. Conditional logic in place said that by variables should not be cleared if free code was greater than 7, but that was only meant to skip free code 8 and free code 9 in some scripts, but not all. Code 8 logic was deprecated and now by variables are allocated with code 8, so they are always cleared if free code is 6 or higher.
- `by: gegen` now generates variables using the `by` prefix. Previously this would give incorrect answers if the expression inside egen assumed that it would be generated with `by`. For example `by var: gegen x = mean(max(y, y[1]))`
- Closes #64: Removes `head` command from `greshape` tests (done a few commits ago but someone noticed before the merge).
- Closes #68. `gegen` now allows `by:` prefix when calling a `gstats transform` function (this is only allowed because these calls already require single-variable input, so the `by:` prefix should not present an issue when calling the function).
- Closes #72: Warning for gegen expressions without by group
- Closes #74: gstats transform parses abbreviated targets
- Closes #75: gunique returns 0s in r() when there are no obs
- Closes #78: if now passed raw/in double-quotes throughout the pipeline
- Closes #79: Adds disclaimer to benchmarks.
- Closes #82: `cw` in `gcollapse` now working.
- Closes #85: Bug in `gegen` warning message causes errors in some fun calls.
- Closes #87: For OSX, make now compiles x86_64 and arm64 separately then combines via `lipo`.
- Various fixes to the docs.
gtools-1.5.1
Release update. New commands, major features, and various bug fixes. Remember to run `gtools, upgrade` to keep up to date between major updates.
New Commands
- `gstats winsor` is a fast, by-able `winsor2` alternative for Winsorizing and trimming data (accepts weights).
- `greshape long` and `greshape wide` are a fast alternative to reshape.
- `greshape spread` and `greshape gather` are analogous to the `spread` and `gather` commands from R's `tidyr`.
- `gstats sum` and `gstats tab` (alias `gstats summarize` and `gstats tabstat`) are a fast, by-able alternative to `sum, detail` and `tabstat`
Enhancements and Features
- `gstats sum` or `gstats tab` with option `matasave`; this stores the output and by levels in `GstatsOutput` (custom naming via `matasave(name)`), an object of class `GtoolsResults`.
- `gcollapse` and `gegen` now allow the stats:
  - `select#` and `select-#`, for the `#`th smallest or largest value, respectively.
  - `rawselect#` and `rawselect-#`, ibid but ignoring weights.
  - `cv`, coefficient of variation, `sd/mean`
  - `variance`
  - `range`, `max-min`
- `greshape` features
  - Preferred syntax is `by()` and `keys()` instead of `i()` and `j()`; the docs and most of the printouts reflect this.
  - `greshape` tries to save variable labels, notes, and characteristics when reshaping.
  - `greshape, uselabels` allows the user to save the source variable labels as levels instead of their names.
  - `greshape` supports @ syntax.
  - `greshape wide` additionally supports varlist syntax (but the same stub cannot have both `@` and a varlist).
  - `greshape long` does not support varlist syntax, but the user can pass regexes as stubs with the option `match(regex)`. See the documentation for details.
- `glevelsof` and `gtop` features
  - `glevelsof` and `gtop` both take option `matasave` (or `matasave(name)`) to save the variable levels in a mata object (default name is `GtoolsByLevels`).
  - With option `matasave[(name)]`, `r(levels)` is not returned; the levels are stored in `printed` as part of the mata return object (e.g. `GtoolsByLevels.printed`). The user can save only the raw levels by also adding the `silent` option.
  - With option `matasave[(name)]`, both `gtop, numfmt()` and `glevelsof, numfmt()` do the number formatting in mata, so `numfmt()` must pass a mata print format instead of a C print format (they are very similar, however).
  - With option `matasave[(name)]`, `gtop` does not return `r(toplevels)` either. The frequency table is stored in `toplevels` as part of the mata return object (e.g. `GtoolsByLevels.toplevels`).
  - `gtop, ntop(.)` prints all the levels from largest to smallest; `gtop, ntop(-.)` prints from smallest to largest; `gtop, alpha` prints the largest/smallest `ntop()` levels sorted in variable order (e.g. alphabetically or numerically, depending on the variable type).
  - `gtop` also stores `r(ntop)`, `r(nrows)`, and `r(alpha)` as return scalars; if `ntop(.)` or `ntop(-.)` are passed, `r(ntop)` will just be `r(J)`.
  - Both `gtop` and `glevelsof` should handle embedded characters better. Printing is still a problem but they get copied to the return values properly.
- `gstats` is a general-purpose wrapper for misc functions.
- `lgtools.mlib` added with some pre-compiled mata functions.
- Any function that allows results to be saved in mata allows the mata object to call `.desc()` to get more info on the object.
- Faster hash sort with integer bijection (two-pass radix sorts for smaller integers; undocumented option `_ctolerance()` allows the user to force the regular counting sort).
- Faster index copy when every observation is read (simply assign the index pointer to `st_info->index`)
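The new `gcollapse` and `gegen` statistics listed above (`select#`, `select-#`, `cv`, `range`) can be sketched in Python for the unweighted case. The function names are hypothetical, weights are ignored (closer in spirit to `rawselect`), and using the sample standard deviation for `cv` is an assumption rather than something these notes state.

```python
from statistics import mean, stdev


def select(values, k):
    """k > 0: k-th smallest value; k < 0: |k|-th largest value (unweighted)."""
    if k == 0 or abs(k) > len(values):
        raise ValueError("k out of range")
    ordered = sorted(values)
    # Negative indices count from the end of the sorted list.
    return ordered[k - 1] if k > 0 else ordered[k]


def cv(values):
    """Coefficient of variation, sd / mean (sample sd assumed)."""
    return stdev(values) / mean(values)


def value_range(values):
    """Range, max - min."""
    return max(values) - min(values)


data = [5, 1, 4, 2]
print(select(data, 1), select(data, -1))  # smallest and largest
print(select(data, 2), select(data, -2))  # second smallest and second largest
print(value_range(data))
print(cv(data))
```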
Bug Fixes
- Stata 14.0 no longer tries to load SPI version 3 (loads version 2).
- SpookyHash code compiled directly as part of the plugin. Might fix #35 (deleted all ancillary files and code related to `spookyhash.dll`).
- `gtop`, `glevelsof`, and `gcontract` parse wildcards before adding any temporary variables, ensuring the latter don't get included in internal function calls.
- Removed locale as a dependency; comma printing done manually. This fixes a bug where in certain systems, locale would get reset and cause some internal Stata numbers to interpret decimals via comma, that is, `95.0` would become `95,0` and cause problems down the line.
- Minor bug fix in `gtop`; inverted levels were not correctly sorted with weights. The levels themselves were OK, however.
- `gcollapse` no longer crashes when `rawstat` does not match any entries.
gtools-1.1.2
Release update. Various bug fixes and minor improvements. Remember to run `gtools, upgrade` to keep up to date between major updates.
Enhancements and Features
- Improved variable parsing in general (including '-' handling).
- `gcollapse (nmissing)` counts the number of missing values (weights allowed).
- `gcollapse (nansum)` and `gcollapse (rawnansum)` preserve missing value (NaN) information: If all entries are missing, the output is also missing (instead of 0). This is a more flexible version of the previous implementation, `gcollapse, missing`.
- `gcollapse, merge` and `gegen` now accept the undocumented option `_subtract` to subtract the result from the source variable. This is an option meant for advanced users, so no additional checks have been added. If you use it then you know what you're doing.
- Added option `sumcheck` to create sum targets from integer sources as the smallest type that is reasonable given the total sum.
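The difference between a regular sum and `nansum` described above, along with `nmissing`, can be sketched in Python. The function names are hypothetical and `None` stands in for a Stata missing value; weights are left out for brevity.

```python
def plain_sum(values):
    """Sum ignoring missing entries; an all-missing input gives 0."""
    return sum(v for v in values if v is not None)


def nansum(values):
    """Like plain_sum, but an all-missing input yields missing (None)."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def nmissing(values):
    """Number of missing entries."""
    return sum(1 for v in values if v is None)


all_missing = [None, None]
mixed = [1, None, 2]
print(plain_sum(all_missing), nansum(all_missing))  # 0 versus missing
print(plain_sum(mixed), nansum(mixed))              # identical when any value is present
print(nmissing(mixed))
```

The two sums only differ when every entry is missing, which is the case the release note calls out.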
Bug fixes
- `gisid` no longer gives wrong results when the data is partially ordered. Partially (weakly) sorted data in prior versions was incorrectly counted as totally sorted.
- `gcollapse (rawsum)` gives 0 if all entries are missing.
- `gcollapse` and `gegen` correctly parse types with weights for counts and sums. This includes `gcollapse, sumcheck`
- Recast upgrade (bug fix from 1.0.5) now done per-variable.
- `gcollapse` no longer gives wrong results when `count` is the first of multiple stats requested for a given source variable. Previous versions wrongly recast the source as long.
- Gtools exits with error if `_N > 2^31-1` and points the user to the pertinent bug report.
gtools-1.0.1
First official release. Various bug fixes and minor improvements. gtools can now upgrade itself via `gtools, upgrade` and run test scripts via `gtools, test`.
gtools-rc3
Feature freeze! Third release candidate.
- Added partial `strL` variable support (not binary data; see options `compress` and `forcestrl`)
- Added `gduplicates` as a `duplicates` replacement.
- Added option `mlast` to `hashsort` to recover the default behavior of `gsort`.
- `fasterxtile` and `gquantiles` now accept weights (including `by()`)
- `gtop` (and `gtoplevelsof`) now accept weights.
- `gduplicates` now accepts weights.
- `glevelsof` now accepts option `nolocal` to skip saving the levels to a local variable; the levels can be stored in a variable list via the option `gen()`.
Once tests are passing from this tag I will submit version 1.0 to SSC.
gtools-rc2
Second release candidate. Added skew (Skewness) and kurt (Kurtosis) to gcollapse and gegen; added rawsum and rawstat() to selectively apply weights to gcollapse targets. Added basic debugging info to the code base and improved the comments in-code. Once I improve the test coverage I should be ready to submit to SSC.
gtools-rc1
First release candidate for submission to SSC. Now that weights have been added, no major new functionality will appear (though some minor features might be added, and bug fixes will, of course, be incorporated).