Pipeline macros

All macros have a curried version, so they can easily be chained with |>. For example:

julia> using JuliaDB, JuliaDBMeta

julia> t = table([1,2,1,2], [4,5,6,7], [0.1, 0.2, 0.3, 0.4], names = [:x, :y, :z]);

julia> t |> @where(:x >= 2) |> @transform({:x+:y})
Table with 2 rows, 4 columns:
x  y  z    x + y
────────────────
2  5  0.2  7
2  7  0.4  9
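
The currying idea can be sketched in plain Julia, without JuliaDB. This is not how JuliaDBMeta is implemented: the sketch assumes a table represented as a Vector of NamedTuples, and the helpers where_rows and transform_rows are hypothetical. Each helper takes its argument first and returns a one-argument function of the data, which is what lets |> chain the steps.

```julia
# Hypothetical curried helpers (not JuliaDBMeta's API): each call returns a
# function that still waits for the data, so the steps compose with |>.
where_rows(pred) = rows -> filter(pred, rows)
transform_rows(f) = rows -> [merge(r, f(r)) for r in rows]

# A stand-in for the table t above: one NamedTuple per row.
rows = [(x = 1, y = 4, z = 0.1), (x = 2, y = 5, z = 0.2),
        (x = 1, y = 6, z = 0.3), (x = 2, y = 7, z = 0.4)]

# Keep rows with x >= 2, then add a column s = x + y.
result = rows |> where_rows(r -> r.x >= 2) |> transform_rows(r -> (s = r.x + r.y,))

for r in result
    println(r)
end
```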

To avoid the parentheses and use the _ currying syntax, use the @apply macro instead:

@apply(args...)

Chain a series of operations. Non-macro operations from JuliaDB are supported via the _ currying syntax. An optional second argument is used for grouping:

julia> t = table([1,2,1,2], [4,5,6,7], [0.1, 0.2, 0.3, 0.4], names = [:x, :y, :z]);

julia> @apply t begin
          @where :x >= 2
          @transform {:x+:y}
          sort(_, :z)
       end
Table with 2 rows, 4 columns:
x  y  z    x + y
────────────────
2  5  0.2  7
2  7  0.4  9
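
Conceptually, the begin ... end block behaves like a chain of one-argument functions, and a call such as sort(_, :z) stands for a function of the data that fills _ with it. The plain-Julia sketch below mimics that, assuming the table is a Vector of NamedTuples; run_pipeline is a hypothetical helper, not part of JuliaDBMeta, and Base's sort with a by keyword stands in for JuliaDB's sort(t, :z).

```julia
# Hypothetical helper (not JuliaDBMeta's API): pass `data` through each step in
# order, like the lines of an @apply begin ... end block.
run_pipeline(data, steps...) = foldl((acc, step) -> step(acc), steps; init = data)

rows = [(x = 1, y = 4, z = 0.1), (x = 2, y = 5, z = 0.2),
        (x = 1, y = 6, z = 0.3), (x = 2, y = 7, z = 0.4)]

result = run_pipeline(rows,
    rs -> filter(r -> r.x >= 2, rs),                 # roughly `@where :x >= 2`
    rs -> [merge(r, (s = r.x + r.y,)) for r in rs],  # roughly `@transform {:x+:y}`
    rs -> sort(rs; by = r -> r.z))                   # roughly `sort(_, :z)`: `_` is `rs`
```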

julia> @apply t :x flatten=true begin
          @transform {w = :y + 1}
          sort(_, :w)
       end
Table with 4 rows, 4 columns:
x  y  z    w
────────────
1  4  0.1  5
1  6  0.3  7
2  5  0.2  6
2  7  0.4  8
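
In the grouped example, the pipeline runs once per value of x, and with flatten=true the per-group results come back as one flat table, so w is sorted within each group rather than globally. A plain-Julia sketch of that behaviour, assuming a Vector of NamedTuples and groups visited in sorted key order; apply_grouped and per_group are hypothetical names, not JuliaDBMeta functions.

```julia
# Hypothetical sketch (not JuliaDBMeta's implementation): run `steps` once per
# group of rows sharing the same `key`, then concatenate the results (flattening).
function apply_grouped(steps, rows, key)
    groups = Dict{Any, Vector{eltype(rows)}}()
    for r in rows
        push!(get!(() -> eltype(rows)[], groups, r[key]), r)
    end
    # Visit groups in sorted key order and stack the per-group results.
    return reduce(vcat, [steps(groups[k]) for k in sort!(collect(keys(groups)))])
end

rows = [(x = 1, y = 4, z = 0.1), (x = 2, y = 5, z = 0.2),
        (x = 1, y = 6, z = 0.3), (x = 2, y = 7, z = 0.4)]

# Per-group pipeline: add w = y + 1, then sort the group by w.
per_group(rs) = sort([merge(r, (w = r.y + 1,)) for r in rs]; by = r -> r.w)

result = apply_grouped(per_group, rows, :x)
```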

Use @applychunked to apply your pipeline independently on different processors:

@applychunked(args...)

Split the table into chunks, apply the processing pipeline separately to each chunk and return the result as a distributed table.

julia> t = table([1,2,1,2], [4,5,6,7], [0.1, 0.2, 0.3, 0.4], names = [:x, :y, :z], chunks = 2);

julia> @applychunked t begin
          @where :x >= 2
          @transform {:x+:y}
          sort(_, :z)
       end
Distributed Table with 2 rows in 2 chunks:
x  y  z    x + y
────────────────
2  5  0.2  7
2  7  0.4  9
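
The chunked behaviour can also be sketched in plain Julia. This is not how JuliaDB distributes work: the sketch assumes a Vector of NamedTuples split into chunks of two rows with Iterators.partition, runs each chunk in its own task (Julia threads standing in for the worker processes a distributed table would use), and concatenates the results in chunk order. chunk_steps is a hypothetical name.

```julia
# Hypothetical sketch: threads stand in for JuliaDB's worker processes.
rows = [(x = 1, y = 4, z = 0.1), (x = 2, y = 5, z = 0.2),
        (x = 1, y = 6, z = 0.3), (x = 2, y = 7, z = 0.4)]

# The pipeline applied to a single chunk: filter, add s = x + y, sort by z.
function chunk_steps(chunk)
    kept = filter(r -> r.x >= 2, chunk)
    withsum = [merge(r, (s = r.x + r.y,)) for r in kept]
    return sort(withsum; by = r -> r.z)
end

# Split into chunks of 2 rows and process each chunk in its own task.
tasks = [Threads.@spawn chunk_steps(c) for c in Iterators.partition(rows, 2)]

# Gather the per-chunk results, keeping chunk order.
result = reduce(vcat, fetch.(tasks))
```

Because sort runs inside each chunk, this sketch orders rows only within a chunk; with this data and this split, each chunk keeps a single row, so the combined result has the same rows as the distributed table above.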