Data Manipulation

Joins

Base.joinFunction.

Join two DataTables

join(dt1::AbstractDataTable,
     dt2::AbstractDataTable;
     on::Union{Symbol, Vector{Symbol}} = Symbol[],
     kind::Symbol = :inner)

Arguments

  • dt1, dt2 : the two AbstractDataTables to be joined

Keyword Arguments

  • on : a Symbol or Vector{Symbol}, the column(s) used as keys when joining; required argument except for kind = :cross

  • kind : the type of join, options include:

    • :inner : only include rows with keys that match in both dt1 and dt2, the default

    • :outer : include all rows from dt1 and dt2

    • :left : include all rows from dt1

    • :right : include all rows from dt2

    • :semi : return rows of dt1 that match with the keys in dt2

    • :anti : return rows of dt1 that do not match with the keys in dt2

    • :cross : a full Cartesian product of the key combinations; every row of dt1 is matched with every row of dt2

Null values are filled in where needed to complete joins.

Result

  • ::DataTable : the joined DataTable

Examples

name = DataTable(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"])
job = DataTable(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"])

join(name, job, on = :ID)
join(name, job, on = :ID, kind = :outer)
join(name, job, on = :ID, kind = :left)
join(name, job, on = :ID, kind = :right)
join(name, job, on = :ID, kind = :semi)
join(name, job, on = :ID, kind = :anti)
join(name, job, kind = :cross)
source

Reshaping

DataTables.meltFunction.

Stacks a DataTable; convert from a wide to long format; see stack.

source
DataTables.stackFunction.

Stacks a DataTable; convert from a wide to long format

stack(dt::AbstractDataTable, [measure_vars], [id_vars];
      variable_name::Symbol=:variable, value_name::Symbol=:value)
melt(dt::AbstractDataTable, [id_vars], [measure_vars];
     variable_name::Symbol=:variable, value_name::Symbol=:value)

Arguments

  • dt : the AbstractDataTable to be stacked

  • measure_vars : the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; for melt, defaults to all variables that are not id_vars. If neither measure_vars or id_vars are given, measure_vars defaults to all floating point columns.

  • id_vars : the identifier columns that are repeated during stacking, a normal column indexing type; for stack defaults to all variables that are not measure_vars

  • variable_name : the name of the new stacked column that shall hold the names of each of measure_vars

  • value_name : the name of the new stacked column containing the values from each of measure_vars

Result

  • ::DataTable : the long-format datatable with column :value holding the values of the stacked columns (measure_vars), with column :variable a Vector of Symbols with the measure_vars name, and with columns for each of the id_vars.

See also stackdt and meltdt for stacking methods that return a view into the original DataTable. See unstack for converting from long to wide format.

Examples

d1 = DataTable(a = repeat([1:3;], inner = [4]),
               b = repeat([1:4;], inner = [3]),
               c = randn(12),
               d = randn(12),
               e = map(string, 'a':'l'))

d1s = stack(d1, [:c, :d])
d1s2 = stack(d1, [:c, :d], [:a])
d1m = melt(d1, [:a, :b, :e])
d1s_name = melt(d1, [:a, :b, :e], variable_name=:somemeasure)
source
DataTables.unstackFunction.

Unstacks a DataTable; convert from a long to wide format

unstack(dt::AbstractDataTable, rowkey, colkey, value)
unstack(dt::AbstractDataTable, colkey, value)
unstack(dt::AbstractDataTable)

Arguments

  • dt : the AbstractDataTable to be unstacked

  • rowkey : the column with a unique key for each row, if not given, find a key by grouping on anything not a colkey or value

  • colkey : the column holding the column names in wide format, defaults to :variable

  • value : the value column, defaults to :value

Result

  • ::DataTable : the wide-format datatable

Examples

wide = DataTable(id = 1:12,
                 a  = repeat([1:3;], inner = [4]),
                 b  = repeat([1:4;], inner = [3]),
                 c  = randn(12),
                 d  = randn(12))

long = stack(wide)
wide0 = unstack(long)
wide1 = unstack(long, :variable, :value)
wide2 = unstack(long, :id, :variable, :value)

Note that there are some differences between the widened results above.

source
DataTables.stackdtFunction.

A stacked view of a DataTable (long format)

Like stack and melt, but a view is returned rather than data copies.

stackdt(dt::AbstractDataTable, [measure_vars], [id_vars];
        variable_name::Symbol=:variable, value_name::Symbol=:value)
meltdt(dt::AbstractDataTable, [id_vars], [measure_vars];
       variable_name::Symbol=:variable, value_name::Symbol=:value)

Arguments

  • dt : the wide AbstractDataTable

  • measure_vars : the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; for melt, defaults to all variables that are not id_vars

  • id_vars : the identifier columns that are repeated during stacking, a normal column indexing type; for stack defaults to all variables that are not measure_vars

Result

  • ::DataTable : the long-format datatable with column :value holding the values of the stacked columns (measure_vars), with column :variable a Vector of Symbols with the measure_vars name, and with columns for each of the id_vars.

The result is a view because the columns are special AbstractVectors that return indexed views into the original DataTable.

Examples

d1 = DataTable(a = repeat([1:3;], inner = [4]),
               b = repeat([1:4;], inner = [3]),
               c = randn(12),
               d = randn(12),
               e = map(string, 'a':'l'))

d1s = stackdt(d1, [:c, :d])
d1s2 = stackdt(d1, [:c, :d], [:a])
d1m = meltdt(d1, [:a, :b, :e])
source
DataTables.meltdtFunction.

A stacked view of a DataTable (long format); see stackdt

source