
- Revision: the standalone count command is replaced with len, so make sure to replace (count) and col "count" with len and col "len" respectively.
- The unary count <col> command is unaffected.
dfsql --input your.csv --output a-new.csv
# ...or
dfsql -i your.csv -o a-new.csv
exit/quit: exit the REPL loop.
undo: undo the previous successful operation.
reset: reset all the changes and go back to the original data frame.
schema: show column names and types of the data frame.
save: save the current data frame to a file.
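The semantics of undo and reset described above can be pictured with a small history stack. This is only an illustrative Python sketch, not how dfsql is implemented: the `Session` class, its method names, and the list-of-dicts "data frame" are all hypothetical.

```python
class Session:
    """Hypothetical REPL session: the original frame plus a history of results."""

    def __init__(self, original):
        self.original = original
        self.history = []  # results of successful operations, oldest first

    @property
    def current(self):
        # The current frame is the latest result, or the original if none exist
        return self.history[-1] if self.history else self.original

    def apply(self, operation):
        # If the operation raises, nothing is recorded, so only
        # successful operations can later be undone
        result = operation(self.current)
        self.history.append(result)

    def undo(self):
        # Drop the most recent successful result, if any
        if self.history:
            self.history.pop()

    def reset(self):
        # Forget every change and fall back to the original frame
        self.history.clear()


session = Session([{"id": 1}, {"id": 2}, {"id": 3}])
session.apply(lambda rows: rows[:2])    # stands in for a "limit"
session.apply(lambda rows: rows[::-1])  # stands in for a "reverse"
session.undo()
after_undo = session.current            # the "limit" result is current again
session.reset()
after_reset = session.current           # back to the original three rows
```

Keeping whole results rather than inverse operations makes undo trivial at the cost of memory, which is one plausible trade-off for an interactive tool.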
select
select last_name first_name
- Select columns "last_name" and "first_name" and collect them into a data frame.
- Group by
group (<col> | <var>)* agg <expr>*
group first_name agg (count)
- Group the data frame by column "first_name" and then aggregate each group with the count of the members.
filter
filter first_name = "John"
limit
reverse
sort
sort ((asc | desc | ()) <col>)*
use
- Switch to the data frame called other.
- join
(left | right | inner | full) join <var> on <col> <col>?
- Left join the data frame called other on the current data frame's column id and the other data frame's column ID.
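To make the group and join statements above concrete, here is a plain Python sketch of what "group by a column and count the members" and "left join on differently named key columns" compute. It is not dfsql syntax and not dfsql's implementation. The sample rows, the helper names `group_count` and `left_join`, and the output column name `members` are assumptions made for illustration.

```python
from collections import Counter

people = [
    {"id": 1, "first_name": "John", "last_name": "Smith"},
    {"id": 2, "first_name": "John", "last_name": "Doe"},
    {"id": 3, "first_name": "Jane", "last_name": "Roe"},
]
other = [
    {"ID": 1, "city": "Oslo"},
    {"ID": 3, "city": "Lima"},
]


def group_count(rows, key):
    # One output row per distinct key value, with the size of its group
    counts = Counter(row[key] for row in rows)
    return [{key: value, "members": n} for value, n in counts.items()]


def left_join(left, right, left_on, right_on):
    # Index the right-hand rows by their key column
    index = {}
    for row in right:
        index.setdefault(row[right_on], []).append(row)
    right_columns = [c for c in (right[0] if right else {}) if c != right_on]

    joined = []
    for row in left:
        matches = index.get(row[left_on])
        if matches:
            for match in matches:
                joined.append({**row, **{c: match[c] for c in right_columns}})
        else:
            # A left join keeps unmatched left rows and fills nulls
            joined.append({**row, **{c: None for c in right_columns}})
    return joined


grouped = group_count(people, "first_name")
joined = left_join(people, other, "id", "ID")
```

Here `grouped` has one row for "John" with two members and one for "Jane" with one, and `joined` keeps all three people, with `city` set to `None` for the person whose id has no match in `other`.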
col: reference to a column.
col : (<str> | <var>) -> <expr>
exclude: remove columns from the data frame.
exclude : <expr>* -> <expr>
select exclude last_name first_name
- literal: literal values like 42, "John", 1.0, and null.
- binary operations
- Calculate the product of columns "a" and "b" and collect the result.
- unary operations
- Sum all values in column "a" and collect the scalar result.
alias: assign a name to a column.
alias : (<col> | <var>) <expr> -> <expr>
select alias product a * b
- Assign the name "product" to the product and collect the new column.
- conditional
<conditional> : if <expr> then <expr> (if <expr> then <expr>)* otherwise <expr> -> <expr>
select if class = 0 then "A" if class = 1 then "B" else null
cast: cast a column to either type str, int, or float.
cast : <type> <expr> -> <expr>
- Cast the column "id" to type str and collect the result.
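The conditional and cast expressions above are evaluated per row. The following Python sketch shows the intended results for the two examples (a chained conditional on "class" and a cast of "id" to str). It illustrates the semantics only and is not dfsql code. The sample rows and the helper names `classify` and `cast_str` are hypothetical, and leaving nulls untouched when casting is an assumption.

```python
def classify(class_value):
    # Mirrors: if class = 0 then "A" if class = 1 then "B", with null as the fallback
    if class_value == 0:
        return "A"
    if class_value == 1:
        return "B"
    return None


def cast_str(value):
    # Assumption: null values stay null instead of becoming the text "None"
    return None if value is None else str(value)


rows = [
    {"id": 7, "class": 0},
    {"id": 8, "class": 1},
    {"id": 9, "class": 2},
]

grades = [classify(row["class"]) for row in rows]
ids_as_str = [cast_str(row["id"]) for row in rows]
```

The branches are tested in order and the first match wins, so a row with class 2 falls through to the final fallback value.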