Commit cc066df

pdeffebach, nalimilan, and bkamins authored
Another attempt at an astable flag (#298)
* initial attempt
* finally working
* start adding tests
* more tests
* more tests
* add docstring
* tests pass
* add ByRow in docstring
* add type annotation
* better docs
* more docs fixes
* update index.md
* Apply suggestions from code review
  Co-authored-by: Milan Bouchet-Valat <[email protected]>
* clean named tuple creation
* add example with string
* grouping tests
* Update src/macros.jl
  Co-authored-by: Bogumił Kamiński <[email protected]>
* changes
* fix some errors
* add macro check
* add errors for bad flag combo
* better grouping tests
* Update src/parsing_astable.jl
  Co-authored-by: Milan Bouchet-Valat <[email protected]>
* add snipper to transform, select, combine, by
* add mutating tests
* get rid of debugging printin
* Apply suggestions from code review
  Co-authored-by: Milan Bouchet-Valat <[email protected]>

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Bogumił Kamiński <[email protected]>
1 parent 6ba85a7

11 files changed (+539 -43 lines)

Project.toml (+2 -1)

````diff
@@ -6,14 +6,15 @@ version = "0.9.1"
 Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
+OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
 Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
 
 [compat]
+Chain = "0.4"
 DataFrames = "1"
 MacroTools = "0.5"
 Reexport = "0.2, 1"
 julia = "1"
-Chain = "0.4"
 
 [extras]
 CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
````

docs/src/index.md (+30 -2)

````diff
@@ -22,6 +22,7 @@ In addition, DataFramesMeta provides
 convenient syntax.
 * `@byrow` for applying functions to each row of a data frame (only supported inside other macros).
 * `@passmissing` for propagating missing values inside row-wise DataFramesMeta.jl transformations.
+* `@astable` to create multiple columns within a single transformation.
 * `@chain`, from [Chain.jl](https://github.com/jkrumbiegel/Chain.jl) for piping the above macros together, similar to [magrittr](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)'s
 `%>%` in R.
 
@@ -396,11 +397,38 @@ julia> @rtransform df @passmissing x = parse(Int, :x_str)
    3 │ missing  missing
 ```
 
+## Creating multiple columns at once with `@astable`
+
+Often new variables may depend on the same intermediate calculations. `@astable` makes it easy to create multiple
+new variables in the same operation, yet have them share
+information.
+
+In a single block, all assignments of the form `:y = f(:x)`
+or `$y = f(:x)` at the top level generate new columns. In the second form, `y`
+must be a string or `Symbol`.
+
+```
+julia> df = DataFrame(a = [1, 2, 3], b = [400, 500, 600]);
+
+julia> @transform df @astable begin
+           ex = extrema(:b)
+           :b_first = :b .- first(ex)
+           :b_last = :b .- last(ex)
+       end
+3×4 DataFrame
+ Row │ a      b      b_first  b_last
+     │ Int64  Int64  Int64    Int64
+─────┼───────────────────────────────
+   1 │     1    400        0    -200
+   2 │     2    500      100    -100
+   3 │     3    600      200       0
+```
+
+
 ## [Working with column names programmatically with `$`](@id dollar)
 
 DataFramesMeta provides the special syntax `$` for referring to
-columns in a data frame via a `Symbol`, string, or column position as either
-a literal or a variable.
+columns in a data frame via a `Symbol`, string, or column position as either a literal or a variable.
 
 ```julia
 df = DataFrame(A = 1:3, B = [2, 1, 2])
````
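The new `index.md` section describes the `$y = f(:x)` form of column assignment but only demonstrates the `:y = ...` form. What follows is a minimal sketch of the `$` form, assuming DataFramesMeta.jl at this commit or later. The variable `new_name` is hypothetical; it holds the new column's name as a string.

```julia
using DataFramesMeta  # also re-exports DataFrames

df = DataFrame(a = [1, 2, 3], b = [400, 500, 600])

# Hypothetical variable holding a column name; a String or a Symbol both work
new_name = "b_first"

df2 = @transform df @astable begin
    ex = extrema(:b)               # intermediate result shared by both new columns
    $new_name = :b .- first(ex)    # `$` form: column named by the value of `new_name`
    :b_last = :b .- last(ex)       # literal form: column named `b_last`
end
```

With this data, `df2` should contain the same `b_first` and `b_last` columns as the documentation example. The only difference is that one column's name comes from a variable.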

src/DataFramesMeta.jl (+4 -1)

````diff
@@ -4,6 +4,8 @@ using Reexport
 
 using MacroTools
 
+using OrderedCollections: OrderedCollections
+
 @reexport using DataFrames
 
 @reexport using Chain
@@ -16,12 +18,13 @@ export @with,
        @transform, @select, @transform!, @select!,
        @rtransform, @rselect, @rtransform!, @rselect!,
        @eachrow, @eachrow!,
-       @byrow, @passmissing,
+       @byrow, @passmissing, @astable,
        @based_on, @where # deprecated
 
 const DOLLAR = raw"$"
 
 include("parsing.jl")
+include("parsing_astable.jl")
 include("macros.jl")
 include("linqmacro.jl")
 include("eachrow.jl")
````

src/macros.jl (+164 -24)

````diff
@@ -282,11 +282,10 @@ macro byrow(args...)
     throw(ArgumentError("@byrow is deprecated outside of DataFramesMeta macros."))
 end
 
-
 """
-    passmissing(args...)
+    @passmissing(args...)
 
-Propograte missing values inside DataFramesMeta.jl macros.
+Propagate missing values inside DataFramesMeta.jl macros.
 
 
 `@passmissing` is not a "real" Julia macro but rather serves as a "flag"
````
````diff
@@ -350,6 +349,156 @@ macro passmissing(args...)
     throw(ArgumentError("@passmissing only works inside DataFramesMeta macros."))
 end
 
+const astable_docstring_snippet = """
+    Transformations can also use the macro-flag [`@astable`](@ref) for creating multiple
+    new columns at once and letting transformations share the same namespace.
+    See `? @astable` for more details.
+    """
+
+"""
+    @astable(args...)
+
+Return a `NamedTuple` from a single transformation inside the DataFramesMeta.jl
+macros, `@select`, `@transform`, and their mutating and row-wise equivalents.
+
+`@astable` acts on a single block. It works through all top-level expressions
+and collects all expressions of the form `:y = ...` or `$(DOLLAR)y = ...`, i.e. assignments to a
+`Symbol` or an escaped column identifier, which is a syntax error outside of
+DataFramesMeta.jl macros. At the end of the expression, all assignments are collected
+into a `NamedTuple` to be used with the `AsTable` destination in the DataFrames.jl
+transformation mini-language.
+
+Concretely, the expressions
+
+```
+df = DataFrame(a = 1)
+
+@rtransform df @astable begin
+    :x = 1
+    y = 50
+    :z = :x + y + :a
+end
+```
+
+become the pair
+
+```
+function f(a)
+    x_t = 1
+    y = 50
+    z_t = x_t + y + a
+
+    (; x = x_t, z = z_t)
+end
+
+transform(df, [:a] => ByRow(f) => AsTable)
+```
+
+`@astable` has two major advantages at the cost of increasing complexity.
+First, `@astable` makes it easy to create multiple columns from a single
+transformation, which share a scope. For example, `@astable` allows
+for the following (where `:x` and `:x2` exist in the data frame already).
+
+```
+@transform df @astable begin
+    m = mean(:x)
+    :x_demeaned = :x .- m
+    :x2_demeaned = :x2 .- m
+end
+```
+
+The expressions creating `:x_demeaned` and `:x2_demeaned` share the variable `m`,
+which therefore only needs to be calculated once.
+
+Second, `@astable` is useful when performing intermediate calculations
+and storing their results in new columns. For example, the following fails.
+
+```
+@rtransform df begin
+    :new_col_1 = :x + :y
+    :new_col_2 = :new_col_1 + :z
+end
+```
+
+This is because DataFrames.jl does not guarantee sequential evaluation of
+transformations. `@astable` solves this problem:
+
+    @rtransform df @astable begin
+        :new_col_1 = :x + :y
+        :new_col_2 = :new_col_1 + :z
+    end
+
+Column assignment in `@astable` follows similar rules as
+column assignment in other DataFramesMeta.jl macros. The left-hand side
+of a column assignment can be either a `Symbol` or any
+expression which evaluates to a `Symbol` or `AbstractString`. For example,
+`:y = ...` and `$(DOLLAR)y = ...` are both valid ways of assigning a new column.
+However, unlike other DataFramesMeta.jl macros, multi-column assignments via
+`AsTable` are disallowed. The following will fail.
+
+```
+@transform df @astable begin
+    $(DOLLAR)AsTable = :x
+end
+```
+
+References to existing columns also follow the same
+rules as other DataFramesMeta.jl macros.
+
+### Examples
+
+```
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);
+
+julia> d = @rtransform df @astable begin
+           :x = 1
+           y = 5
+           :z = :x + y
+       end
+3×4 DataFrame
+ Row │ a      b      x      z
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      4      1      6
+   2 │     2      5      1      6
+   3 │     3      6      1      6
+
+julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]);
+
+julia> @by df :a @astable begin
+           ex = extrema(:b)
+           :min_b = first(ex)
+           :max_b = last(ex)
+       end
+2×3 DataFrame
+ Row │ a      min_b  max_b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2     70     80
+
+julia> new_col = "New Column";
+
+julia> @rtransform df @astable begin
+           f_a = first(:a)
+           $(DOLLAR)new_col = :a + :b + f_a
+           :y = :a * :b
+       end
+4×4 DataFrame
+ Row │ a      b      New Column  y
+     │ Int64  Int64  Int64       Int64
+─────┼─────────────────────────────────
+   1 │     1      5           7      5
+   2 │     1      6           8      6
+   3 │     2     70          74    140
+   4 │     2     80          84    160
+```
+
+"""
+macro astable(args...)
+    throw(ArgumentError("@astable only works inside DataFramesMeta macros."))
+end
+
 ##############################################################################
 ##
 ## @with
````
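The `@astable` docstring spells out the row-wise lowering for `@rtransform`. The shared-`m` example for `@transform` corresponds, by analogy, to a whole-column function that returns a `NamedTuple` of vectors. The sketch below writes that pair by hand using only DataFrames.jl and the Statistics standard library. The name `demean_both` is hypothetical, and this illustrates the idea rather than reproducing the macro's literal expansion.

```julia
using DataFrames
using Statistics: mean

df = DataFrame(x = [1.0, 2.0, 3.0], x2 = [10.0, 20.0, 30.0])

# Hypothetical hand-written counterpart of the docstring's block
#   m = mean(:x); :x_demeaned = :x .- m; :x2_demeaned = :x2 .- m
# `m` is computed once and shared by both outputs
function demean_both(x, x2)
    m = mean(x)
    return (x_demeaned = x .- m, x2_demeaned = x2 .- m)
end

# A NamedTuple of equal-length vectors sent to `AsTable` becomes one column per field
df2 = transform(df, [:x, :x2] => demean_both => AsTable)
```

Because both fields come from one function call, the shared intermediate result cannot drift between the two columns. That is the point the docstring's first advantage makes.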
````diff
@@ -1097,6 +1246,8 @@ transformations by row, `@transform` allows `@byrow` at the
 beginning of a block of transformations (i.e. `@byrow begin... end`).
 All transformations in the block will operate by row.
 
+$astable_docstring_snippet
+
 ### Examples
 
 ```jldoctest
@@ -1233,6 +1384,8 @@ transform!ations by row, `@transform!` allows `@byrow` at the
 beginning of a block of transform!ations (i.e. `@byrow begin... end`).
 All transform!ations in the block will operate by row.
 
+$astable_docstring_snippet
+
 ### Examples
 
 ```jldoctest
@@ -1345,6 +1498,8 @@ transformations by row, `@select` allows `@byrow` at the
 beginning of a block of selectations (i.e. `@byrow begin... end`).
 All transformations in the block will operate by row.
 
+$astable_docstring_snippet
+
 ### Examples
 
 ```jldoctest
@@ -1465,6 +1620,8 @@ transformations by row, `@select!` allows `@byrow` at the
 beginning of a block of select!ations (i.e. `@byrow begin... end`).
 All transformations in the block will operate by row.
 
+$astable_docstring_snippet
+
 ### Examples
 
 ```jldoctest
@@ -1546,17 +1703,6 @@ function combine_helper(x, args...; deprecation_warning = false)
 
     exprs, outer_flags = create_args_vector(args...)
 
-    fe = first(exprs)
-    if length(exprs) == 1 &&
-        get_column_expr(fe) === nothing &&
-        !(fe.head == :(=) || fe.head == :kw)
-
-        @warn "Returning a Table object from @by and @combine now requires `$(DOLLAR)AsTable` on the LHS."
-
-        lhs = Expr(:$, :AsTable)
-        exprs = ((:($lhs = $fe)),)
-    end
-
     t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)
 
     quote
````
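The branch removed from `combine_helper` (and the matching one in `by_helper`) used to wrap a lone, unassigned expression in `$AsTable = ...` and emit a warning. With that fallback gone, a multi-column result from `@by` or `@combine` has to be requested explicitly. Below is a minimal sketch of two ways to do this, assuming DataFramesMeta.jl at this commit or later. The data and column names are made up for illustration.

```julia
using DataFramesMeta

df = DataFrame(g = [1, 1, 2], x = [1, 2, 3])

# Explicit multi-column destination, the form the removed warning pointed to
by1 = @by df :g $AsTable = (x_min = minimum(:x), x_max = maximum(:x))

# The same result with the `@astable` flag introduced in this commit
by2 = @by df :g @astable begin
    :x_min = minimum(:x)
    :x_max = maximum(:x)
end
```

Both forms hand a `NamedTuple` per group to the `AsTable` destination. `@astable` additionally lets the assignments share local variables, as its docstring describes.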
````diff
@@ -1592,6 +1738,8 @@ and
 @combine(df, :mx = mean(:x), :sx = std(:x))
 ```
 
+$astable_docstring_snippet
+
 ### Examples
 
 ```julia
@@ -1666,16 +1814,6 @@ end
 function by_helper(x, what, args...)
     # Only allow one argument when returning a Table object
     exprs, outer_flags = create_args_vector(args...)
-    fe = first(exprs)
-    if length(exprs) == 1 &&
-        get_column_expr(fe) === nothing &&
-        !(fe.head == :(=) || fe.head == :kw)
-
-        @warn "Returning a Table object from @by and @combine now requires `\$AsTable` on the LHS."
-
-        lhs = Expr(:$, :AsTable)
-        exprs = ((:($lhs = $fe)),)
-    end
 
     t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)
 
@@ -1718,6 +1856,8 @@ and
 @by(df, :g, mx = mean(:x), sx = std(:x))
 ```
 
+$astable_docstring_snippet
+
 ### Examples
 
 ```julia
````
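The `@astable` docstring states that, unlike a plain block of transformations, later assignments inside `@astable` can read columns created earlier in the same block. Here is a small grouped sketch of that pattern for `@by`, whose docstring gains the `@astable` snippet in this commit. It relies on the behavior the docstring describes, and the data and column names are illustrative only.

```julia
using DataFramesMeta

df = DataFrame(g = [1, 1, 2], x = [1, 2, 3], y = [10, 20, 60])

res = @by df :g @astable begin
    :sx = sum(:x)
    :sy = sum(:y)
    :ratio = :sy / :sx   # reads the two values assigned just above in this block
end
```

Without `@astable`, writing the three assignments as separate transformations would not be guaranteed to work, because DataFrames.jl does not promise sequential evaluation.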
