Functions

Multithreading support

By default, selected operations in DataFrames.jl automatically use multiple threads when available. Multithreading is task-based and implemented using the @spawn macro from Julia Base. Functions that take user-defined functions and may run them in parallel accept a threads keyword argument, which allows disabling multithreading when the provided function requires serial execution or is not thread-safe.
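As a minimal sketch of the threads keyword, the example below passes a transformation that mutates a shared counter. That mutation is not thread-safe, so threads=false forces serial execution. It assumes DataFrames.jl 1.4 or later, where combine on a GroupedDataFrame accepts threads. The counter calls and the column names g, x and s are illustrative only.

```julia
using DataFrames

df = DataFrame(g = [1, 1, 2, 2], x = 1:4)
gd = groupby(df, :g)

# Illustrative shared state: incrementing a Ref from several tasks would race,
# so the transformation must run serially.
calls = Ref(0)

# threads=false disables multithreading for this call.
res = combine(gd, :x => (x -> (calls[] += 1; sum(x))) => :s; threads = false)
```

Here res.s holds the per-group sums 3 and 7. With the default threads=true, the same call could be evaluated in several tasks and the counter update would be a data race.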

The following operations currently use multithreading:

- DataFrame constructor with copycols=true, and recursively all functions that call this constructor, e.g. copy.
- getindex when multiple columns are selected.
- groupby (both when hashing is required and when the fast path using DataAPI.refpool is used).
- *join functions when composing the output data frame (but currently not when finding matching rows in the joined data frames).
- combine, select[!], and transform[!] on GroupedDataFrame when either of the following conditions is met:
  - multiple transformations are performed (each transformation is spawned in a separate task);
  - a transformation produces one row per group and the passed transformation is a custom function (i.e. not a standard reduction, which uses optimized single-threaded methods).
- dropmissing when the provided data frame has more than one column and view=false (subsetting of individual columns is spawned in separate tasks).
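The sketch below is a call to combine on a GroupedDataFrame that meets both conditions listed above. It performs two transformations, and the second one is a custom function returning one value per group. When Julia runs with more than one thread, such a call is eligible for multithreaded execution. The result is the same as in a single-threaded session. The data and column names are made up for illustration.

```julia
using DataFrames

df = DataFrame(g = repeat(1:3, inner = 4), x = 1:12, y = 12:-1:1)
gd = groupby(df, :g)

# Two transformations (each may be spawned in its own task); the second is a
# custom function producing one row per group.
res = combine(gd,
              :x => sum => :sx,
              :y => (y -> maximum(y) - minimum(y)) => :ry)
```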

In general, at least Julia 1.4 is required for multithreading to be used, and the Julia process must be started with more than one thread. Some operations turn on multithreading only if the processed data frame has enough rows (the exact threshold at which multithreading is enabled is considered undefined and might change in the future).
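A quick way to check whether a session can use multithreading at all is to inspect the Julia version and Threads.nthreads(). The thread count is fixed at startup, for example with julia --threads 4 (Julia 1.5 and later) or by setting the JULIA_NUM_THREADS environment variable before launching. This is a sketch of a session check only. It does not reveal whether a particular DataFrames.jl call actually used several threads.

```julia
using DataFrames

# The thread count is fixed at startup, e.g. `julia --threads 4`
# or JULIA_NUM_THREADS=4 set in the environment before launching Julia.
nt = Threads.nthreads()
version_ok = VERSION >= v"1.4"

if !version_ok || nt == 1
    @info "DataFrames.jl operations will not run multithreaded in this session"
else
    @info "DataFrames.jl may use up to $nt threads"
end
```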

Apart from the operations listed above, where multithreading is used automatically, all functions provided by DataFrames.jl that update a data frame are not thread-safe. This means that while they can be called from any thread, the caller is responsible for ensuring that a given DataFrame object is never modified by one thread while others are using it (either for reading or writing). Using the same DataFrame concurrently from different threads is safe as long as it is not modified.
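The sketch below contrasts the two cases described above. Several threads read an unmodified data frame without coordination. Writes to a shared data frame are serialized by the caller with a ReentrantLock. Using a lock is one possible approach chosen for illustration, not a mechanism provided by DataFrames.jl. The names sums, out and lk are hypothetical.

```julia
using DataFrames

df = DataFrame(x = 1:1000)

# Safe: df is only read, never modified, while the threads run.
# Each task writes to its own slot of `sums`.
sums = zeros(Int, 4)
Threads.@threads for i in 1:4
    sums[i] = sum(df.x)
end

# Modifying a shared DataFrame: the caller must serialize access.
out = DataFrame(i = Int[], s = Int[])
lk = ReentrantLock()
Threads.@threads for i in 1:4
    s = sum(df.x) * i              # reading df is still safe
    lock(lk) do
        push!(out, (i, s))         # only one task mutates `out` at a time
    end
end
```

Row order in out depends on task scheduling. Sort it if a deterministic order is needed.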

Index

Base.Iterators.only
Base.Iterators.partition
Base.allunique
Base.append!
Base.copy
Base.deleteat!
Base.eachcol
Base.eachrow
Base.empty
Base.empty!
Base.filter
Base.filter!
Base.first
Base.get
Base.hcat
Base.insert!
Base.invpermute!
Base.isapprox
Base.isempty
Base.issorted
Base.keepat!
Base.keys
Base.last
Base.length
Base.names
Base.ndims
Base.pairs
Base.parent
Base.permute!
Base.permutedims
Base.pop!
Base.popat!
Base.popfirst!
Base.prepend!
Base.propertynames
Base.push!
Base.pushfirst!
Base.reduce
Base.repeat
Base.resize!
Base.reverse
Base.reverse!
Base.show
Base.similar
Base.size
Base.sort
Base.sort!
Base.sortperm
Base.stack
Base.unique
Base.unique!
Base.values
Base.vcat
DataAPI.allcombinations
DataAPI.antijoin
DataAPI.colmetadata
DataAPI.colmetadata!
DataAPI.colmetadatakeys
DataAPI.crossjoin
DataAPI.deletecolmetadata!
DataAPI.deletemetadata!
DataAPI.describe
DataAPI.emptycolmetadata!
DataAPI.emptymetadata!
DataAPI.innerjoin
DataAPI.leftjoin
DataAPI.metadata
DataAPI.metadata!
DataAPI.metadatakeys
DataAPI.ncol
DataAPI.nrow
DataAPI.outerjoin
DataAPI.rightjoin
DataAPI.rownumber
DataAPI.semijoin
DataFrames.allowmissing!
DataFrames.combine
DataFrames.completecases
DataFrames.disallowmissing!
DataFrames.dropmissing
DataFrames.dropmissing!
DataFrames.fillcombinations
DataFrames.flatten
DataFrames.groupby
DataFrames.groupcols
DataFrames.groupindices
DataFrames.insertcols
DataFrames.insertcols!
DataFrames.leftjoin!
DataFrames.mapcols
DataFrames.mapcols!
DataFrames.nonunique
DataFrames.order
DataFrames.proprow
DataFrames.rename
DataFrames.rename!
DataFrames.repeat!
DataFrames.select
DataFrames.select!
DataFrames.subset
DataFrames.subset!
DataFrames.table_transformation
DataFrames.transform
DataFrames.transform!
DataFrames.unstack
DataFrames.valuecols
Missings.allowmissing
Missings.disallowmissing
Random.shuffle
Random.shuffle!

Constructing data frames

DataAPI.allcombinations — Function

allcombinations(DataFrame, pairs::Pair...)
allcombinations(DataFrame; kwargs...)

Create a DataFrame from all combinations of values in the passed arguments. The values of the first passed argument vary fastest.

Arguments associating a column name with values to expand can be specified either as Pairs passed as positional arguments, or as keyword arguments. Column names must be Symbols or strings and must be unique.
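A small sketch of the two calling styles follows. It assumes that pairs with Symbol names and keyword arguments produce the same result. It also checks that a repeated column name is rejected. Only the fact that an error is thrown is checked, not its exact type.

```julia
using DataFrames

# Positional pairs with Symbol names and keyword arguments describe the same expansion.
df1 = allcombinations(DataFrame, :a => 1:2, :b => 3:4)
df2 = allcombinations(DataFrame, a = 1:2, b = 3:4)

# Column names must be unique, so this call is expected to throw.
dup_rejected = try
    allcombinations(DataFrame, "a" => 1:2, "a" => 3:4)
    false
catch
    true
end
```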

A column value can be a vector, which is consumed as is, or an object of any other type except AbstractArray, which is treated as having length one for expansion. As a special rule, values stored in a Ref or a 0-dimensional AbstractArray are unwrapped and treated as having length one.
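To illustrate the length-one rules, the sketch below mixes a vector, a scalar string, and a vector wrapped in Ref. The string and the Ref-wrapped vector each count as a single value. Every row therefore stores the same string and the same unwrapped vector. The column names are illustrative.

```julia
using DataFrames

# a: consumed as is (length 2)
# s: a non-array value, treated as length one
# v: a vector wrapped in Ref, unwrapped and treated as length one
df = allcombinations(DataFrame, a = 1:2, s = "x", v = Ref([10, 20]))
```

Without the Ref, the vector [10, 20] would be expanded as a column of two values and the result would have four rows.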

See also: crossjoin, which computes the Cartesian product of the rows of the passed data frames.
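The sketch below compares allcombinations with crossjoin on single-column data frames. Both produce the same set of rows, though possibly in a different row order. For that reason the comparison is made after sorting.

```julia
using DataFrames

ac = allcombinations(DataFrame, a = 1:2, b = 'a':'c')
cj = crossjoin(DataFrame(a = 1:2), DataFrame(b = 'a':'c'))

# Compare as sets of rows: sort both by all columns before testing equality.
same_rows = sort(ac) == sort(cj)
```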

Examples

julia> allcombinations(DataFrame, a=1:2, b='a':'c')
6×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  a
   3 │     1  b
   4 │     2  b
   5 │     1  c
   6 │     2  c

julia> allcombinations(DataFrame, "a" => 1:2, "b" => 'a':'c', "c" => "const")
6×3 DataFrame
 Row │ a      b     c
     │ Int64  Char  String
─────┼─────────────────────
   1 │     1  a     const
   2 │     2  a     const
   3 │     1  b     const
   4 │     2  b     const
   5 │     1  c     const
   6 │     2  c     const