Utilities

DataTables.eltypes - Function.

Return element types of columns

eltypes(dt::AbstractDataTable)

Arguments

  • dt : the AbstractDataTable

Result

  • ::Vector{Type} : the element type of each column

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
eltypes(dt)

DataTables.head - Function.

Show the first or last part of an AbstractDataTable

head(dt::AbstractDataTable, r::Int = 6)
tail(dt::AbstractDataTable, r::Int = 6)

Arguments

  • dt : the AbstractDataTable

  • r : the number of rows to show

Result

  • ::AbstractDataTable : the first or last part of dt

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
head(dt)
tail(dt)

Indexes of complete cases (rows without null values)

completecases(dt::AbstractDataTable)

Arguments

  • dt : the AbstractDataTable

Result

  • ::Vector{Bool} : a mask over the rows of dt, true for each complete case and false otherwise

See also dropnull and dropnull!.

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt[[1,4,5], :x] = Nullable()
dt[[9,10], :y] = Nullable()
completecases(dt)
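The following is a minimal plain-Julia sketch of the boolean mask that the Result entry describes. It does not use the package: `completecases_sketch` is a hypothetical helper, the table is modeled as a vector of columns, and `nothing` stands in for a null value purely for illustration.

```julia
# Plain-Julia model of a table: a vector of equally long columns.
# Assumption: `nothing` plays the role of a null value here.
columns = Any[
    [1, 2, 3, 4],
    [0.5, nothing, 1.5, 2.0],
    ["a", "b", nothing, "c"],
]

# A row is a complete case when no column holds `nothing` at that row.
function completecases_sketch(columns)
    nrows = length(columns[1])
    return Bool[all(col -> col[i] !== nothing, columns) for i in 1:nrows]
end

mask = completecases_sketch(columns)   # [true, false, false, true]
```

Indexing each column with the mask keeps only the complete rows, which is the effect dropnull is documented to have on a table.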

StatsBase.describe - Function.

Summarize the columns of an AbstractDataTable

describe(dt::AbstractDataTable)
describe(io, dt::AbstractDataTable)

Arguments

  • dt : the AbstractDataTable

  • io : optional output descriptor

Result

  • nothing

Details

If the column's base type derives from Number, compute the minimum, first quartile, median, mean, third quartile, and maximum. Nulls are filtered out and reported separately.

For boolean columns, report trues, falses, and nulls.

For other types, show column characteristics and number of nulls.
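As an illustration of the numeric case, here is a small sketch of the six statistics plus a null count. It assumes Julia 1.x with the Statistics standard library, uses `nothing` as a stand-in for null, and `numeric_summary_sketch` is a hypothetical name; it is not how describe is implemented.

```julia
using Statistics  # mean and quantile (standard library in Julia 1.x)

# Summarize a numeric column; `nothing` entries are counted, not averaged.
function numeric_summary_sketch(col)
    values = Float64[v for v in col if v !== nothing]
    nulls = length(col) - length(values)
    q = quantile(values, [0.0, 0.25, 0.5, 0.75, 1.0])
    return (min = q[1], q1 = q[2], median = q[3], mean = mean(values),
            q3 = q[4], max = q[5], nulls = nulls)
end

s = numeric_summary_sketch([4, nothing, 1, 3, 2, nothing, 5])
# s.min == 1.0, s.q1 == 2.0, s.median == 3.0, s.mean == 3.0,
# s.q3 == 4.0, s.max == 5.0, s.nulls == 2
```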

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
describe(dt)

describe(a)

Pretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.

Remove rows with null values.

dropnull(dt::AbstractDataTable)

Arguments

  • dt : the AbstractDataTable

Result

  • ::AbstractDataTable : the updated copy

See also completecases and dropnull!.

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt[[1,4,5], :x] = Nullable()
dt[[9,10], :y] = Nullable()
dropnull(dt)

dropnull(X::AbstractVector)

Return a vector containing only the non-null entries of X, unwrapping Nullable entries. A copy is always returned, even when X does not contain any null values.

Remove rows with null values in-place.

dropnull!(dt::AbstractDataTable)

Arguments

  • dt : the AbstractDataTable

Result

  • ::AbstractDataTable : the updated version

See also dropnull and completecases.

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt[[1,4,5], :x] = Nullable()
dt[[9,10], :y] = Nullable()
dropnull!(dt)

dropnull!(X::AbstractVector)

Remove null entries of X in-place and return a Vector view of the unwrapped Nullable entries. If no nulls are present, this is a no-op and X is returned.
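The contrast between the copying form and the in-place form can be sketched with Base functions only. This is a rough analogy, not the package's code: `nothing` stands in for null, and Base's `filter!` plays the role of the in-place removal.

```julia
# Assumption: `nothing` models a null entry.
X = Any[1, nothing, 3, nothing, 5]

# Copying form: builds a new vector and leaves X untouched.
copied = [v for v in X if v !== nothing]

# In-place form: filter! removes entries from Y itself and returns Y.
Y = Any[1, nothing, 3]
result = filter!(v -> v !== nothing, Y)

length(X)      # still 5, because the copy did not modify X
result === Y   # true, the same vector object was modified and returned
```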

dropnull!(X::NullableVector)

Remove null entries of X in-place and return a Vector view of the unwrapped Nullable entries.

Base.dump - Function.

Show the structure of an AbstractDataTable, in a tree-like format

dump(dt::AbstractDataTable, n::Int = 5)
dump(io::IO, dt::AbstractDataTable, n::Int = 5)

Arguments

  • dt : the AbstractDataTable

  • n : the number of levels to show

  • io : optional output descriptor

Result

  • nothing

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dump(dt)

DataTables.names! - Function.

Set column names

names!(dt::AbstractDataTable, vals; allow_duplicates=false)

Arguments

  • dt : the AbstractDataTable

  • vals : column names, normally a Vector{Symbol} the same length as the number of columns in dt

  • allow_duplicates : if false (the default), an error will be raised if duplicate names are found; if true, duplicate names will be suffixed with _i (i starting at 1 for the first duplicate).

Result

  • ::AbstractDataTable : the updated result

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
names!(dt, [:a, :b, :c])
names!(dt, [:a, :b, :a])  # throws ArgumentError
names!(dt, [:a, :b, :a], allow_duplicates=true)  # renames second :a to :a_1
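The suffixing rule described for allow_duplicates=true can be modeled in a few lines. `dedup_names_sketch` is a hypothetical helper written for this illustration; how the package resolves a suffixed name that itself collides with an existing column is an assumption here.

```julia
# Suffix repeated names with _1, _2, ... in order of appearance.
function dedup_names_sketch(colnames::Vector{Symbol})
    seen = Set{Symbol}()
    result = Symbol[]
    for name in colnames
        candidate = name
        k = 1
        # Keep trying name_k until the candidate is unused.
        while candidate in seen
            candidate = Symbol(string(name, "_", k))
            k += 1
        end
        push!(seen, candidate)
        push!(result, candidate)
    end
    return result
end

dedup_names_sketch([:a, :b, :a])   # [:a, :b, :a_1]
```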

DataTables.nonunique - Function.

Indexes of duplicate rows (a row that is a duplicate of a prior row)

nonunique(dt::AbstractDataTable)
nonunique(dt::AbstractDataTable, cols)

Arguments

  • dt : the AbstractDataTable

  • cols : a column indicator (Symbol, Int, Vector{Symbol}, etc.) specifying the column(s) to compare

Result

  • ::Vector{Bool} : indicates whether the row is a duplicate of some prior row

See also unique and unique!.

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt = vcat(dt, dt)
nonunique(dt)
nonunique(dt, 1)
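To make "duplicate of a prior row" concrete, the sketch below flags repeated rows in a plain vector of tuples. It is a model of the documented result only: `nonunique_sketch` is a hypothetical name and rows are represented as tuples rather than table rows.

```julia
# Mark each row that is equal to some earlier row.
function nonunique_sketch(rows)
    seen = Set{eltype(rows)}()
    flags = falses(length(rows))
    for (k, row) in enumerate(rows)
        if row in seen
            flags[k] = true      # seen before, so this row is a duplicate
        else
            push!(seen, row)     # first occurrence stays unflagged
        end
    end
    return flags
end

rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
nonunique_sketch(rows)   # true at positions 3 and 5 only
```

The first occurrence of a row is never flagged, so negating the mask selects exactly the rows that unique would keep.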

DataTables.rename - Function.

Rename columns

rename!(dt::AbstractDataTable, from::Symbol, to::Symbol)
rename!(dt::AbstractDataTable, d::Associative)
rename!(f::Function, dt::AbstractDataTable)
rename(dt::AbstractDataTable, from::Symbol, to::Symbol)
rename(dt::AbstractDataTable, d::Associative)
rename(f::Function, dt::AbstractDataTable)

Arguments

  • dt : the AbstractDataTable

  • d : an Associative type that maps the original name to a new name

  • f : a function that has the old column name (a symbol) as input and new column name (a symbol) as output

Result

  • ::AbstractDataTable : the updated result

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
rename(x -> Symbol(uppercase(string(x))), dt)
rename(dt, Dict(:i=>:A, :x=>:X))
rename(dt, :y, :Y)
rename!(dt, Dict(:i=>:A, :x=>:X))
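The two mapping styles, a function and a dictionary, can be tried on a bare vector of column names without loading the package. This sketch assumes that names absent from the dictionary are left unchanged, which is what the examples above suggest.

```julia
colnames = [:i, :x, :y]

# Function style: every name passes through f.
by_function = map(n -> Symbol(uppercase(string(n))), colnames)   # [:I, :X, :Y]

# Dictionary style: only the listed names change (assumed behavior).
mapping = Dict(:i => :A, :x => :X)
by_dict = [get(mapping, n, n) for n in colnames]                  # [:A, :X, :y]
```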

DataTables.rename! - Function.

Rename columns

rename!(dt::AbstractDataTable, from::Symbol, to::Symbol)
rename!(dt::AbstractDataTable, d::Associative)
rename!(f::Function, dt::AbstractDataTable)
rename(dt::AbstractDataTable, from::Symbol, to::Symbol)
rename(dt::AbstractDataTable, d::Associative)
rename(f::Function, dt::AbstractDataTable)

Arguments

  • dt : the AbstractDataTable

  • d : an Associative type that maps the original name to a new name

  • f : a function that has the old column name (a symbol) as input and new column name (a symbol) as output

Result

  • ::AbstractDataTable : the updated result

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
rename(x -> Symbol(uppercase(string(x))), dt)
rename(dt, Dict(:i=>:A, :x=>:X))
rename(dt, :y, :Y)
rename!(dt, Dict(:i=>:A, :x=>:X))

DataTables.tail - Function.

Show the first or last part of an AbstractDataTable

head(dt::AbstractDataTable, r::Int = 6)
tail(dt::AbstractDataTable, r::Int = 6)

Arguments

  • dt : the AbstractDataTable

  • r : the number of rows to show

Result

  • ::AbstractDataTable : the first or last part of dt

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
head(dt)
tail(dt)

Base.unique - Function.

Delete duplicate rows

unique(dt::AbstractDataTable)
unique(dt::AbstractDataTable, cols)
unique!(dt::AbstractDataTable)
unique!(dt::AbstractDataTable, cols)

Arguments

  • dt : the AbstractDataTable

  • cols : column indicator (Symbol, Int, Vector{Symbol}, etc.) specifying the column(s) to compare.

Result

  • ::AbstractDataTable : the updated version of dt with unique rows.

When cols is specified, the returned DataTable contains complete rows, keeping for each distinct value of dt[cols] the first row in which it occurs.
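The "first instance wins" rule for cols can be sketched on a vector of tuples. `unique_by_sketch` is a hypothetical helper that compares a single column position; it models the documented behavior and is not the package's implementation.

```julia
# Keep whole rows, retaining the first row seen for each key in column `col`.
function unique_by_sketch(rows, col::Int)
    seen = Set{Any}()
    kept = eltype(rows)[]
    for row in rows
        key = row[col]
        if !(key in seen)
            push!(seen, key)
            push!(kept, row)   # the complete row is kept, not just the key
        end
    end
    return kept
end

rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
unique_by_sketch(rows, 1)   # [(1, "a"), (2, "b"), (3, "d")]
```

Rows (1, "c") and (2, "e") are dropped even though they differ from earlier rows in the second position, because only the first position is compared.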

See also nonunique.

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt = vcat(dt, dt)
unique(dt)   # doesn't modify dt
unique(dt, 1)
unique!(dt)  # modifies dt

unique(A::CategoricalArray)
unique(A::NullableCategoricalArray)

Return levels which appear in A, in the same order as levels (and not in their order of appearance). This function is significantly slower than levels since it needs to check whether levels are used or not.

DataTables.unique! - Function.

Delete duplicate rows

unique(dt::AbstractDataTable)
unique(dt::AbstractDataTable, cols)
unique!(dt::AbstractDataTable)
unique!(dt::AbstractDataTable, cols)

Arguments

  • dt : the AbstractDataTable

  • cols : column indicator (Symbol, Int, Vector{Symbol}, etc.) specifying the column(s) to compare.

Result

  • ::AbstractDataTable : the updated version of dt with unique rows.

When cols is specified, the returned DataTable contains complete rows, keeping for each distinct value of dt[cols] the first row in which it occurs.

See also nonunique.

Examples

dt = DataTable(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
dt = vcat(dt, dt)
unique(dt)   # doesn't modify dt
unique(dt, 1)
unique!(dt)  # modifies dt