Data Manipulation
Joins
Base.join
— Function.Join two DataTables
join(dt1::AbstractDataTable,
dt2::AbstractDataTable;
on::Union{Symbol, Vector{Symbol}} = Symbol[],
kind::Symbol = :inner)
Arguments
dt1
,dt2
: the two AbstractDataTables to be joined
Keyword Arguments
on
: a Symbol or Vector{Symbol}, the column(s) used as keys when joining; required argument except forkind = :cross
kind
: the type of join, options include::inner
: only include rows with keys that match in bothdt1
anddt2
, the default:outer
: include all rows fromdt1
anddt2
:left
: include all rows fromdt1
:right
: include all rows fromdt2
:semi
: return rows ofdt1
that match with the keys indt2
:anti
: return rows ofdt1
that do not match with the keys indt2
:cross
: a full Cartesian product of the key combinations; every row ofdt1
is matched with every row ofdt2
For the three join operations that may introduce missing values (:outer
, :left
, and :right
), all columns of the returned data table will be nullable.
Result
::DataTable
: the joined DataTable
Examples
name = DataTable(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"])
job = DataTable(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"])
join(name, job, on = :ID)
join(name, job, on = :ID, kind = :outer)
join(name, job, on = :ID, kind = :left)
join(name, job, on = :ID, kind = :right)
join(name, job, on = :ID, kind = :semi)
join(name, job, on = :ID, kind = :anti)
join(name, job, kind = :cross)
Reshaping
DataTables.melt
— Function.Stacks a DataTable; convert from a wide to long format; see stack
.
DataTables.stack
— Function.Stacks a DataTable; convert from a wide to long format
stack(dt::AbstractDataTable, [measure_vars], [id_vars];
variable_name::Symbol=:variable, value_name::Symbol=:value)
melt(dt::AbstractDataTable, [id_vars], [measure_vars];
variable_name::Symbol=:variable, value_name::Symbol=:value)
Arguments
dt
: the AbstractDataTable to be stackedmeasure_vars
: the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; formelt
, defaults to all variables that are notid_vars
. If neithermeasure_vars
orid_vars
are given,measure_vars
defaults to all floating point columns.id_vars
: the identifier columns that are repeated during stacking, a normal column indexing type; forstack
defaults to all variables that are notmeasure_vars
variable_name
: the name of the new stacked column that shall hold the names of each ofmeasure_vars
value_name
: the name of the new stacked column containing the values from each ofmeasure_vars
Result
::DataTable
: the long-format datatable with column:value
holding the values of the stacked columns (measure_vars
), with column:variable
a Vector of Symbols with themeasure_vars
name, and with columns for each of theid_vars
.
See also stackdt
and meltdt
for stacking methods that return a view into the original DataTable. See unstack
for converting from long to wide format.
Examples
d1 = DataTable(a = repeat([1:3;], inner = [4]),
b = repeat([1:4;], inner = [3]),
c = randn(12),
d = randn(12),
e = map(string, 'a':'l'))
d1s = stack(d1, [:c, :d])
d1s2 = stack(d1, [:c, :d], [:a])
d1m = melt(d1, [:a, :b, :e])
d1s_name = melt(d1, [:a, :b, :e], variable_name=:somemeasure)
DataTables.unstack
— Function.Unstacks a DataTable; convert from a long to wide format
unstack(dt::AbstractDataTable, rowkey, colkey, value)
unstack(dt::AbstractDataTable, colkey, value)
unstack(dt::AbstractDataTable)
Arguments
dt
: the AbstractDataTable to be unstackedrowkey
: the column with a unique key for each row, if not given, find a key by grouping on anything not acolkey
orvalue
colkey
: the column holding the column names in wide format, defaults to:variable
value
: the value column, defaults to:value
Result
::DataTable
: the wide-format datatable
Examples
wide = DataTable(id = 1:12,
a = repeat([1:3;], inner = [4]),
b = repeat([1:4;], inner = [3]),
c = randn(12),
d = randn(12))
long = stack(wide)
wide0 = unstack(long)
wide1 = unstack(long, :variable, :value)
wide2 = unstack(long, :id, :variable, :value)
Note that there are some differences between the widened results above.
DataTables.stackdt
— Function.A stacked view of a DataTable (long format)
Like stack
and melt
, but a view is returned rather than data copies.
stackdt(dt::AbstractDataTable, [measure_vars], [id_vars];
variable_name::Symbol=:variable, value_name::Symbol=:value)
meltdt(dt::AbstractDataTable, [id_vars], [measure_vars];
variable_name::Symbol=:variable, value_name::Symbol=:value)
Arguments
dt
: the wide AbstractDataTablemeasure_vars
: the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; formelt
, defaults to all variables that are notid_vars
id_vars
: the identifier columns that are repeated during stacking, a normal column indexing type; forstack
defaults to all variables that are notmeasure_vars
Result
::DataTable
: the long-format datatable with column:value
holding the values of the stacked columns (measure_vars
), with column:variable
a Vector of Symbols with themeasure_vars
name, and with columns for each of theid_vars
.
The result is a view because the columns are special AbstractVectors that return indexed views into the original DataTable.
Examples
d1 = DataTable(a = repeat([1:3;], inner = [4]),
b = repeat([1:4;], inner = [3]),
c = randn(12),
d = randn(12),
e = map(string, 'a':'l'))
d1s = stackdt(d1, [:c, :d])
d1s2 = stackdt(d1, [:c, :d], [:a])
d1m = meltdt(d1, [:a, :b, :e])
DataTables.meltdt
— Function.A stacked view of a DataTable (long format); see stackdt