Grouping operations

Grouping operations

Three approaches are possible for grouping.

Groupby

@groupby(d, by, x)

Group data and apply some summary function to it. Symbols in expression x are replaced by the respective column in d. In this context, _ refers to the whole table d. To use actual symbols, escape them with ^, as in ^(:a).

The second argument is optional (defaults to Keys()) and specifies on which column(s) to group. The key column(s) can be accessed with _.key. Use {} syntax for automatically named NamedTuples. Use cols(c) to refer to column c where c is a variable that evaluates to a symbol. c must be available in the scope where the macro is called.

Examples

julia> t = table([1,2,1,2], [4,5,6,7], [0.1, 0.2, 0.3,0.4], names = [:x, :y, :z]);

julia> @groupby t :x {maximum(:y - :z)}
Table with 2 rows, 2 columns:
x  maximum(y - z)
─────────────────
1  5.7
2  6.6

julia> @groupby t :x {m = maximum(:y - :z)/_.key.x}
Table with 2 rows, 2 columns:
x  m
──────
1  5.7
2  3.3

When the summary function returns an iterable, use flatten=true to flatten the result:

julia> @groupby(t, :x, flatten = true, select = {:y+1})
Table with 4 rows, 2 columns:
x  y + 1
────────
1  5
1  7
2  6
2  8
source

Column-wise macros with grouping argument

Column-wise macros accept an optional grouping argument:

iris = loadtable(Pkg.dir("JuliaDBMeta", "test", "tables", "iris.csv"))
@where_vec iris :Species :SepalLength .> mean(:SepalLength)

Use flatten=true to flatten the result:

@where_vec iris :Species flatten=true :SepalLength .> mean(:SepalLength)

Pipeline with grouping argument

@apply also accepts an optional grouping argument:

@apply iris :Species flatten = true begin
   @map {:SepalWidth, Ratio = :SepalLength / :SepalWidth}
   sort(_, :SepalWidth, rev = true)
   _[1:3]
end