Indexing

General rules

The following rules describe how getindex, setindex!, view, and broadcasting are intended to work with DataFrame, SubDataFrame, and DataFrameRow objects.

The rules for a valid type of index into a column are the following:

The rules for a valid type of index into a row are the following:

Additionally, an AbstractDataFrame can be indexed with a two-dimensional CartesianIndex.
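A minimal sketch of the CartesianIndex rule, assuming a small illustrative data frame (the column names a and b and their values are made up for this example). The first component of the index selects the row and the second selects the column:

```julia
using DataFrames

# Hypothetical example data; not taken from the DataFrames.jl documentation.
df = DataFrame(a = [1, 2, 3], b = [10, 20, 30])

# CartesianIndex(row, column): row 2 of the second column (:b).
x = df[CartesianIndex(2, 2)]

# The same element accessed with a regular (row, col) pair.
y = df[2, :b]
```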

In the descriptions below, df represents a DataFrame, sdf a SubDataFrame, and dfr a DataFrameRow.

When used as a row selector, : always expands to axes(df, 1) or axes(sdf, 1).

df.col works like df[!, col] and sdf.col works like sdf[!, col] in all cases, with one exception: df.col .= v and sdf.col .= v perform in-place broadcasting if col is present in df/sdf and is a valid identifier.
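The read-access part of this rule can be sketched as follows. The data frame and its column name are illustrative assumptions, and the sketch covers only getindex-style access, not the df.col .= v exception:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2, 3])

# Both forms return the column vector stored in df, not a copy.
col_by_property = df.a
col_by_index = df[!, :a]
same_vector = col_by_property === col_by_index

# For a SubDataFrame both forms give the selected rows of the parent column.
sdf = view(df, 1:2, :)
sdf_by_property = sdf.a
sdf_by_index = sdf[!, :a]
```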

getindex and view

The following list specifies the behavior of getindex and view operations depending on argument types.

In particular, each description states explicitly whether the data is copied or reused without copying.
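The difference between copying and reusing data can be checked directly. The sketch below assumes the DataFrames.jl behavior that df[!, col] returns the stored column, df[:, col] returns a copy, and view(df, :, col) returns a view; the example data is made up:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2, 3])

stored = df[!, :a]        # the vector held by df (no copy)
copied = df[:, :a]        # a freshly allocated copy
viewed = view(df, :, :a)  # a view into the stored vector

copied[1] = 100  # changes only the copy
viewed[2] = 200  # writes through to df
```

After these assignments df.a holds 1, 200, 3, while copied holds 100, 2, 3.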

For performance reasons, accessing, via getindex or view, a single row and multiple cols of a DataFrame, a SubDataFrame or a DataFrameRow always returns a DataFrameRow (which is a view type).
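A short sketch of this rule, using made-up data: selecting one row and several columns yields a DataFrameRow, and because it is a view, writing to it changes the parent.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2, 3], b = [10, 20, 30], c = [100, 200, 300])

# One row, multiple columns: the result is a DataFrameRow, not a copy.
dfr = df[2, [:a, :b]]
is_row_view = dfr isa DataFrameRow

# Writing through the DataFrameRow modifies row 2 of df.
dfr.a = -1

# Selecting multiple columns from a DataFrameRow again gives a DataFrameRow.
narrower = dfr[[:b]]
```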

getindex on DataFrame:

view on DataFrame:

getindex on SubDataFrame:

view on SubDataFrame:

getindex on DataFrameRow:

view on DataFrameRow:

Note that views created with the column selector set to : change their column count if columns are added, removed, or renamed in the parent. If the column selector is anything other than :, the view points to the selected columns by their number at the moment the view was created.
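The effect of the column selector on a view can be sketched as follows. The data and the variable names all_cols and fixed_cols are illustrative assumptions:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

all_cols = view(df, 1:2, :)            # column selector is :
fixed_cols = view(df, 1:2, [:a, :b])   # explicit column selector

# Add a column to the parent after both views were created.
df.c = [7, 8, 9]

n_all = ncol(all_cols)      # follows the parent
n_fixed = ncol(fixed_cols)  # stays at the columns chosen at creation
```

Here the view created with : sees the new column c, while the view created with an explicit selector keeps its two columns.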

setindex!

The following list specifies the target behavior of setindex! operations depending on argument types.

The current release of DataFrames.jl is in a transition period in which an old, undocumented behavior of setindex! is still supported but throws deprecation warnings.

The behavior described below will be fully implemented in the next major release of DataFrames.jl.

In particular, each description states explicitly whether the assignment is in-place.

setindex! on DataFrame:

Note that only df[!, col] = v and df.col = v can be used to add a new column to a DataFrame. In particular, because df[:, col] = v is an in-place operation, it does not add a column to a DataFrame if col is missing; an error is thrown if such an operation is attempted.
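A minimal sketch of the in-place and column-adding forms, on made-up data. It only exercises df[:, col] = v on an existing column and does not attempt it on a missing one:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2, 3])
old_a = df.a

# In-place assignment: the existing vector is updated, not replaced.
df[:, :a] = [10, 20, 30]
still_same_vector = df.a === old_a

# Adding new columns with the two forms named above.
df[!, :b] = [4, 5, 6]
df.c = [7, 8, 9]

column_names = propertynames(df)
```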

setindex! on SubDataFrame:

Note that sdf[!, col] = v, sdf[!, cols] = v and sdf.col = v are not allowed, because sdf can only be modified in-place.
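In-place modification through a SubDataFrame can be sketched as below, assuming an illustrative parent data frame. The assignments write into the selected rows of the parent:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2, 3, 4])
sdf = view(df, 2:3, :)

# In-place assignment to a whole column of the view (rows 2:3 of df).
sdf[:, :a] = [20, 30]

# In-place assignment to a single cell: row 1 of sdf is row 2 of df.
sdf[1, :a] = -2
```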

setindex! on DataFrameRow:

Broadcasting

The following broadcasting rules apply to AbstractDataFrame objects:

Broadcasting DataFrameRow is currently not allowed (which is consistent with NamedTuple).
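As a hedged illustration of broadcasting over an AbstractDataFrame, the sketch below broadcasts a scalar over a small made-up data frame. It assumes that the result is a new DataFrame with the same column names and that the source is left untouched:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2], b = [3, 4])

# Broadcasting a scalar over every cell produces a new data frame.
shifted = df .+ 1

result_is_dataframe = shifted isa DataFrame
```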

It is possible to assign a value to AbstractDataFrame and DataFrameRow objects using the .= operator. In such an operation, an AbstractDataFrame is treated as two-dimensional and a DataFrameRow as one-dimensional.

Note

The rule above means that, similar to single-dimensional objects in Base (e.g. vectors), DataFrameRow is considered to be column-oriented.
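The dimensionality rule for .= can be sketched as follows, using made-up data. A DataFrameRow accepts a vector with one value per column, while a scalar assigned to a data frame fills every cell:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [1, 2], b = [3, 4])

# A DataFrameRow is one-dimensional: a vector supplies one value per column.
dfr = df[1, :]
dfr .= [10, 20]

# A data frame is two-dimensional: a scalar is broadcast to all cells.
df2 = DataFrame(a = [1, 2], b = [3, 4])
df2 .= 0
```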

Additional rules:

Note that the sdf[!, col] .= v and sdf[!, cols] .= v syntaxes are not allowed, because sdf can only be modified in-place.

If columns are indexed using Symbol names in cols, the order of columns in the operation follows the order of the names.
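A sketch of the ordering rule, on made-up data. The selector lists b before a, so the first column of the broadcast value is assumed to go to b and the second to a:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = [0, 0], b = [0, 0])

# The 1x2 matrix [1 2] is matched against the columns in the order given:
# 1 goes to :b and 2 goes to :a.
df[:, [:b, :a]] .= [1 2]

# The same ordering applies when selecting columns with getindex.
selected_names = propertynames(df[:, [:b, :a]])
```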