Sorting

Sorting is a fundamental component of data analysis. Basic sorting is trivial: calling sort! sorts the rows of a DataFrame in place, using all columns:

julia> using DataFrames, CSV

julia> path = joinpath(pkgdir(DataFrames), "docs", "src", "assets", "iris.csv");

julia> iris = CSV.read(path, DataFrame)
150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     String15
─────┼──────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  Iris-setosa
   2 │         4.9         3.0          1.4         0.2  Iris-setosa
   3 │         4.7         3.2          1.3         0.2  Iris-setosa
   4 │         4.6         3.1          1.5         0.2  Iris-setosa
   5 │         5.0         3.6          1.4         0.2  Iris-setosa
   6 │         5.4         3.9          1.7         0.4  Iris-setosa
   7 │         4.6         3.4          1.4         0.3  Iris-setosa
   8 │         5.0         3.4          1.5         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 144 │         6.8         3.2          5.9         2.3  Iris-virginica
 145 │         6.7         3.3          5.7         2.5  Iris-virginica
 146 │         6.7         3.0          5.2         2.3  Iris-virginica
 147 │         6.3         2.5          5.0         1.9  Iris-virginica
 148 │         6.5         3.0          5.2         2.0  Iris-virginica
 149 │         6.2         3.4          5.4         2.3  Iris-virginica
 150 │         5.9         3.0          5.1         1.8  Iris-virginica
                                                        135 rows omitted

julia> sort!(iris)
150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     String15
─────┼──────────────────────────────────────────────────────────────────
   1 │         4.3         3.0          1.1         0.1  Iris-setosa
   2 │         4.4         2.9          1.4         0.2  Iris-setosa
   3 │         4.4         3.0          1.3         0.2  Iris-setosa
   4 │         4.4         3.2          1.3         0.2  Iris-setosa
   5 │         4.5         2.3          1.3         0.3  Iris-setosa
   6 │         4.6         3.1          1.5         0.2  Iris-setosa
   7 │         4.6         3.2          1.4         0.2  Iris-setosa
   8 │         4.6         3.4          1.4         0.3  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 144 │         7.4         2.8          6.1         1.9  Iris-virginica
 145 │         7.6         3.0          6.6         2.1  Iris-virginica
 146 │         7.7         2.6          6.9         2.3  Iris-virginica
 147 │         7.7         2.8          6.7         2.0  Iris-virginica
 148 │         7.7         3.0          6.1         2.3  Iris-virginica
 149 │         7.7         3.8          6.7         2.2  Iris-virginica
 150 │         7.9         3.8          6.4         2.0  Iris-virginica
                                                        135 rows omitted

Observe that all columns are taken into account lexicographically when sorting the DataFrame.

You can also call the sort function to create a new DataFrame with freshly allocated sorted vectors.
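
As a minimal sketch of the difference between sort and sort!, the snippet below uses a small made-up table; the names df, a, b, and sorted are illustrative and not part of the iris example:

```julia
using DataFrames

# Hypothetical toy data, not taken from the iris dataset
df = DataFrame(a = [2, 1, 2, 1], b = [0.5, 0.7, 0.1, 0.3])

# sort returns a new DataFrame and leaves df untouched
sorted = sort(df)

df.a        # [2, 1, 2, 1]
sorted.a    # [1, 1, 2, 2]
sorted.b    # [0.3, 0.7, 0.1, 0.5]  (ties in a are broken by b)
```

Calling sort!(df) instead would reorder the rows of df itself.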

When sorting a DataFrame, you may want to sort different columns with different options. The examples below show most of the available options:

julia> sort!(iris, rev = true)
150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     String15
─────┼──────────────────────────────────────────────────────────────────
   1 │         7.9         3.8          6.4         2.0  Iris-virginica
   2 │         7.7         3.8          6.7         2.2  Iris-virginica
   3 │         7.7         3.0          6.1         2.3  Iris-virginica
   4 │         7.7         2.8          6.7         2.0  Iris-virginica
   5 │         7.7         2.6          6.9         2.3  Iris-virginica
   6 │         7.6         3.0          6.6         2.1  Iris-virginica
   7 │         7.4         2.8          6.1         1.9  Iris-virginica
   8 │         7.3         2.9          6.3         1.8  Iris-virginica
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 144 │         4.6         3.2          1.4         0.2  Iris-setosa
 145 │         4.6         3.1          1.5         0.2  Iris-setosa
 146 │         4.5         2.3          1.3         0.3  Iris-setosa
 147 │         4.4         3.2          1.3         0.2  Iris-setosa
 148 │         4.4         3.0          1.3         0.2  Iris-setosa
 149 │         4.4         2.9          1.4         0.2  Iris-setosa
 150 │         4.3         3.0          1.1         0.1  Iris-setosa
                                                        135 rows omitted

julia> sort!(iris, [:Species, :SepalWidth])
150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     String15
─────┼──────────────────────────────────────────────────────────────────
   1 │         4.5         2.3          1.3         0.3  Iris-setosa
   2 │         4.4         2.9          1.4         0.2  Iris-setosa
   3 │         5.0         3.0          1.6         0.2  Iris-setosa
   4 │         4.9         3.0          1.4         0.2  Iris-setosa
   5 │         4.8         3.0          1.4         0.3  Iris-setosa
   6 │         4.8         3.0          1.4         0.1  Iris-setosa
   7 │         4.4         3.0          1.3         0.2  Iris-setosa
   8 │         4.3         3.0          1.1         0.1  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 144 │         6.7         3.3          5.7         2.1  Iris-virginica
 145 │         6.3         3.3          6.0         2.5  Iris-virginica
 146 │         6.3         3.4          5.6         2.4  Iris-virginica
 147 │         6.2         3.4          5.4         2.3  Iris-virginica
 148 │         7.2         3.6          6.1         2.5  Iris-virginica
 149 │         7.9         3.8          6.4         2.0  Iris-virginica
 150 │         7.7         3.8          6.7         2.2  Iris-virginica
                                                        135 rows omitted

julia> sort!(iris, [order(:Species, by=length), order(:SepalLength, rev=true)])
150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     String15
─────┼───────────────────────────────────────────────────────────────────
   1 │         5.8         4.0          1.2         0.2  Iris-setosa
   2 │         5.7         3.8          1.7         0.3  Iris-setosa
   3 │         5.7         4.4          1.5         0.4  Iris-setosa
   4 │         5.5         3.5          1.3         0.2  Iris-setosa
   5 │         5.5         4.2          1.4         0.2  Iris-setosa
   6 │         5.4         3.4          1.7         0.2  Iris-setosa
   7 │         5.4         3.4          1.5         0.4  Iris-setosa
   8 │         5.4         3.7          1.5         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮              ⋮
 144 │         5.5         2.6          4.4         1.2  Iris-versicolor
 145 │         5.4         3.0          4.5         1.5  Iris-versicolor
 146 │         5.2         2.7          3.9         1.4  Iris-versicolor
 147 │         5.1         2.5          3.0         1.1  Iris-versicolor
 148 │         5.0         2.0          3.5         1.0  Iris-versicolor
 149 │         5.0         2.3          3.3         1.0  Iris-versicolor
 150 │         4.9         2.4          3.3         1.0  Iris-versicolor
                                                         135 rows omitted

Keywords used above include rev (to sort in reverse) and by (to apply a function to values before comparing them). Each keyword can be either a single value, a vector with values corresponding to individual columns, or a selector: :, Cols, All, Not, Between, or Regex.
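
The single-value and vector forms of these keywords can be illustrated on a small made-up table. This is only a sketch; the column names name and qty and their values are assumptions chosen so that the results are easy to check by hand:

```julia
using DataFrames

# Hypothetical toy data, not taken from the iris dataset
df = DataFrame(name = ["pear", "fig", "banana", "apple"], qty = [3, 3, 1, 2])

# Single value: rev applies to every sorting column
by_qty_desc = sort(df, :qty, rev = true)
by_qty_desc.qty      # [3, 3, 2, 1]

# Vector: one rev value per sorting column (qty descending, name ascending)
mixed = sort(df, [:qty, :name], rev = [true, false])
mixed.name           # ["fig", "pear", "apple", "banana"]

# by: compare names by their length instead of alphabetically
by_len = sort(df, :name, by = length)
by_len.name          # ["fig", "pear", "apple", "banana"]
```

The last two results coincide only because of how the toy data was chosen; an alphabetical sort on name would instead start with "apple".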

As an alternative to using a vector of values, you can use order to specify an ordering for a particular column within a set of columns.

The following two examples show two ways to sort the iris dataset with the same result: :Species will be ordered in reverse order, and within groups, rows will be sorted by increasing :PetalLength:

julia> sort!(iris, [:Species, :PetalLength], rev=[true, false])
150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     String15
─────┼──────────────────────────────────────────────────────────────────
   1 │         4.9         2.5          4.5         1.7  Iris-virginica
   2 │         6.2         2.8          4.8         1.8  Iris-virginica
   3 │         6.0         3.0          4.8         1.8  Iris-virginica
   4 │         6.3         2.7          4.9         1.8  Iris-virginica
   5 │         6.1         3.0          4.9         1.8  Iris-virginica
   6 │         5.6         2.8          4.9         2.0  Iris-virginica
   7 │         6.3         2.5          5.0         1.9  Iris-virginica
   8 │         6.0         2.2          5.0         1.5  Iris-virginica
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 144 │         4.7         3.2          1.6         0.2  Iris-setosa
 145 │         5.7         3.8          1.7         0.3  Iris-setosa
 146 │         5.4         3.4          1.7         0.2  Iris-setosa
 147 │         5.4         3.9          1.7         0.4  Iris-setosa
 148 │         5.1         3.3          1.7         0.5  Iris-setosa
 149 │         5.1         3.8          1.9         0.4  Iris-setosa
 150 │         4.8         3.4          1.9         0.2  Iris-setosa
                                                        135 rows omitted

julia> sort!(iris, [order(:Species, rev=true), :PetalLength])
150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     String15
─────┼──────────────────────────────────────────────────────────────────
   1 │         4.9         2.5          4.5         1.7  Iris-virginica
   2 │         6.2         2.8          4.8         1.8  Iris-virginica
   3 │         6.0         3.0          4.8         1.8  Iris-virginica
   4 │         6.3         2.7          4.9         1.8  Iris-virginica
   5 │         6.1         3.0          4.9         1.8  Iris-virginica
   6 │         5.6         2.8          4.9         2.0  Iris-virginica
   7 │         6.3         2.5          5.0         1.9  Iris-virginica
   8 │         6.0         2.2          5.0         1.5  Iris-virginica
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 144 │         4.7         3.2          1.6         0.2  Iris-setosa
 145 │         5.7         3.8          1.7         0.3  Iris-setosa
 146 │         5.4         3.4          1.7         0.2  Iris-setosa
 147 │         5.4         3.9          1.7         0.4  Iris-setosa
 148 │         5.1         3.3          1.7         0.5  Iris-setosa
 149 │         5.1         3.8          1.9         0.4  Iris-setosa
 150 │         4.8         3.4          1.9         0.2  Iris-setosa
                                                        135 rows omitted
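
The equivalence of the two forms can also be checked programmatically. The sketch below does so on a small made-up table (the columns g and x are hypothetical) using the non-mutating sort, so that both calls start from the same input:

```julia
using DataFrames

# Hypothetical toy data, not taken from the iris dataset
df = DataFrame(g = ["a", "b", "a", "b"], x = [4, 3, 2, 1])

# Per-column rev values passed as a vector
via_keyword = sort(df, [:g, :x], rev = [true, false])

# The same ordering expressed with order on the first column
via_order = sort(df, [order(:g, rev = true), :x])

via_keyword == via_order    # true
via_order.g                 # ["b", "b", "a", "a"]
via_order.x                 # [1, 3, 2, 4]
```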