Indexing

General rules

The following rules describe how getindex, setindex!, view, and broadcasting are intended to work with DataFrame, SubDataFrame and DataFrameRow objects.

The rules for a valid type of index into a column are the following:

The rules for a valid type of index into a row are the following:

Additionally it is allowed to index into an AbstractDataFrame using a two-dimensional CartesianIndex.
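As a minimal sketch of the CartesianIndex rule above (the data frame and its column names are made up for illustration), a two-dimensional CartesianIndex addresses the same cell as the corresponding row and column pair:

```julia
using DataFrames

# Hypothetical example data; any two-column data frame would do.
df = DataFrame(a = [1, 2, 3], b = [10, 20, 30])

# Read the cell in row 2, column 2.
x = df[CartesianIndex(2, 2)]
y = df[2, 2]

# Write to the same cell through a CartesianIndex.
df[CartesianIndex(2, 2)] = 99
```

Here x and y hold the same value, and the assignment updates df.b in place.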

In the descriptions below df represents a DataFrame, sdf is a SubDataFrame and dfr is a DataFrameRow.

The : selector, used as a row index, always expands to axes(df, 1) or axes(sdf, 1).

df.col works like df[!, col] and sdf.col works like sdf[!, col] in all cases except that df.col .= v and sdf.col .= v perform in-place broadcasting if col is present in df/sdf and is a valid identifier.
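The two rules above can be checked with a small sketch; the data frame and column names are assumptions made for this example only:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

# Property access returns the stored column itself, like df[!, :a].
same_column = df.a === df[!, :a]

# A : row selector selects every row, like axes(df, 1).
same_rows = df[:, :a] == df[axes(df, 1), :a]

# The same equivalence holds for a SubDataFrame.
sdf = view(df, 2:3, :)
same_view = sdf.a == sdf[!, :a]
```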

getindex and view

The following list specifies the behavior of getindex and view operations depending on argument types.

Each description states explicitly whether the data is copied or reused without copying.
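To illustrate the difference between copying and reusing data, here is a short sketch with a made-up data frame. It relies on the row selector ! reusing the stored columns and the row selector : copying them:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

copied = df[:, [:a]]   # new DataFrame, column vectors are copied
reused = df[!, [:a]]   # new DataFrame, column vectors are shared with df

# Mutating the shared column is visible in df; mutating the copy is not.
reused.a[1] = 100
copied.a[2] = -1
```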

For performance reasons, accessing a single row and multiple columns of a DataFrame, a SubDataFrame or a DataFrameRow, via getindex or view, always returns a DataFrameRow (which is a view type).
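A brief sketch of this behavior, using an illustrative data frame: both getindex and view return a DataFrameRow, and writing through it changes the parent.

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4], c = [5, 6])

r1 = df[1, [:a, :b]]    # getindex: one row, several columns
r2 = view(df, 2, :)     # view: one row, all columns

# Because a DataFrameRow is a view, assignment modifies df.
r1.a = 100
r2.c = 600
```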

getindex on DataFrame:

view on DataFrame:

getindex on SubDataFrame:

view on SubDataFrame:

getindex on DataFrameRow:

view on DataFrameRow:

Note that views created with the column selector set to : change their column count if columns are added, removed, or renamed in the parent; if the column selector is anything other than :, the view points to the selected columns by their number at the moment the view was created.
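The note above can be sketched as follows, with hypothetical column names. One view is created with : as the column selector and one with an explicit list of columns, and then a column is added to the parent:

```julia
using DataFrames

df = DataFrame(a = 1:3, b = 4:6)

v_all = view(df, 1:2, :)          # column selector is :
v_sel = view(df, 1:2, [:a, :b])   # explicit column selector

# Add a column to the parent after both views exist.
df.c = [7, 8, 9]

n_all = ncol(v_all)   # follows the parent
n_sel = ncol(v_sel)   # fixed at creation time
```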

setindex!

The following list specifies the behavior of setindex! operations depending on argument types.

Each description states explicitly whether the assignment is performed in-place.
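As a sketch of what in-place means here (the example data is made up), assigning with a : row selector writes into the existing column vector, while assigning with ! replaces the stored column with the vector on the right-hand side:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
old_a = df.a

# In-place: the existing vector is kept and its contents are overwritten.
df[:, :a] = [10, 20, 30]
kept = df.a === old_a

# Replacement: the column now is the assigned vector itself (no copy).
v = [7, 8, 9]
df[!, :b] = v
replaced = df.b === v
```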

Note that if a setindex! operation throws an error, the target data frame may be partially changed, so it is unsafe to use it afterwards (the column length correctness will be preserved).

setindex! on DataFrame:

setindex! on SubDataFrame:

Note that sdf[!, col] = v, sdf[!, cols] = v and sdf.col = v are not allowed as sdf can be only modified in-place.
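A minimal sketch of in-place modification through a SubDataFrame, with illustrative data; only the in-place forms are shown, and the changes appear in the parent:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
sdf = view(df, 2:3, :)

# In-place assignment of a whole column of the view.
sdf[:, :a] = [20, 30]

# In-place assignment of a single cell (row 1 of sdf is row 2 of df).
sdf[1, :b] = 50
```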

setindex! on DataFrameRow:

Broadcasting

The following broadcasting rules apply to AbstractDataFrame objects:

Note that if a broadcasting assignment operation throws an error, the target data frame may be partially changed, so it is unsafe to use it afterwards (the column length correctness will be preserved).

Broadcasting DataFrameRow is currently not allowed (which is consistent with NamedTuple).

It is possible to assign a value to AbstractDataFrame and DataFrameRow objects using the .= operator. In such an operation AbstractDataFrame is considered as two-dimensional and DataFrameRow as single-dimensional.

Note

The rule above means that, similar to single-dimensional objects in Base (e.g. vectors), DataFrameRow is considered to be column-oriented.
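The dimensionality rule can be sketched like this (the data frame is an assumption for illustration): a scalar is broadcast over every cell of a data frame, and over every column of a single row.

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4])

dims_df = ndims(df)            # data frame: two-dimensional
dims_row = ndims(df[1, :])     # DataFrameRow: single-dimensional

df .= 0          # fills all cells
df[2, :] .= 9    # fills all columns of row 2 only
snapshot = Matrix(df)
```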

Additional rules:

Note that sdf[!, col] .= v and sdf[!, cols] .= v syntaxes are not allowed as sdf can be only modified in-place.
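The following sketch, on made-up data, contrasts in-place broadcasting with column replacement on a DataFrame and then shows the in-place form on a SubDataFrame:

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
old_a = df.a

# In-place: values are written into the existing column vector.
df[:, :a] .= 0

# Replacement: a freshly allocated column is stored, so its
# element type can change (here from Int to String).
df[!, :b] .= "x"

# A SubDataFrame is modified in-place; the parent sees the change.
sdf = view(df, 2:3, :)
sdf[:, :a] .= 7
```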

If column indexing using Symbol or AbstractString names in cols is performed, the order of columns in the operation is specified by the order of names.
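A small sketch of the ordering rule, shown here with getindex and setindex! rather than broadcasting for simplicity (column names are hypothetical): values are matched to columns in the order the names are given, not in the order the columns are stored.

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4])

# Selecting by name returns columns in the requested order.
selected = names(df[:, [:b, :a]])

# The first value goes to :b and the second to :a.
df[1, [:b, :a]] = [10, 20]
```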

Indexing GroupedDataFrames

A GroupedDataFrame can behave as either an AbstractVector or an AbstractDict depending on the type of index used. Integers (or arrays of them) trigger vector-like indexing, while Tuples and NamedTuples trigger dictionary-like indexing. An intermediate between the two is the GroupKey type returned by keys(::GroupedDataFrame), which behaves similarly to a NamedTuple but has performance on par with integer indexing.

The elements of a GroupedDataFrame are SubDataFrames of its parent.
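The indexing styles described above can be sketched as follows; the grouping column g and value column x are assumptions made for this example:

```julia
using DataFrames

df = DataFrame(g = ["a", "b", "a"], x = [1, 2, 3])
gdf = groupby(df, :g)

by_int = gdf[1]                # vector-like indexing
by_tuple = gdf[("a",)]         # dictionary-like, Tuple of key values
by_nt = gdf[(g = "a",)]        # dictionary-like, NamedTuple

key = first(keys(gdf))         # a GroupKey
by_key = gdf[key]              # same group as gdf[1]
```

Each result is a SubDataFrame whose parent is df.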

Common API for types defined in DataFrames.jl

This table presents return value types of calling names, propertynames and keys on types exposed to the user by DataFrames.jl:

Type               names            propertynames    keys
AbstractDataFrame  Vector{String}   Vector{Symbol}   undefined
DataFrameRow       Vector{String}   Vector{Symbol}   Vector{Symbol}
DataFrameRows      Vector{String}   Vector{Symbol}   vector of Int
DataFrameColumns   Vector{String}   Vector{Symbol}   Vector{Symbol}
GroupedDataFrame   Vector{String}   tuple of fields  GroupKeys
GroupKeys          undefined        tuple of fields  vector of Int
GroupKey           Vector{String}   Vector{Symbol}   Vector{Symbol}
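A few rows of the table can be checked with a short sketch on an illustrative data frame; eachrow and eachcol are used here to obtain DataFrameRows and DataFrameColumns objects:

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4])
dfr = df[1, :]

df_names = names(df)                    # Vector{String}
df_props = propertynames(df)            # Vector{Symbol}
row_keys = keys(dfr)                    # Vector{Symbol}
col_keys = keys(eachcol(df))            # Vector{Symbol}
rows_keys = collect(keys(eachrow(df)))  # integer row positions
```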