How to Calculate Frequency in Julia

Overview

Use the freqtable() function from the FreqTables.jl package. It provides functionality similar to the freq() function in R.

Code

Arrays

julia> compartment = rand(['S','I','R'], 1000);

julia> freqtable(compartment)
3-element Named Vector{Int64}
Dim1  │
──────┼────
'I'   │ 316
'R'   │ 342
'S'   │ 342

Passing an array as shown above counts how many times each class occurs.
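To make concrete what the table above contains, here is a minimal Base-only sketch that tallies each distinct value in a single pass. The helper name `tally` is hypothetical and is not part of FreqTables.jl; it illustrates the counting itself, not how the package is implemented.

```julia
# Hypothetical helper (not part of FreqTables.jl): count how often each
# distinct value occurs, using only Base Julia.
function tally(xs)
    counts = Dict{eltype(xs),Int}()
    for x in xs
        # get(counts, x, 0) returns 0 the first time a value is seen
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

compartment = rand(['S', 'I', 'R'], 1000)
counts = tally(compartment)

# Print the classes in sorted order, matching the row order shown above
for k in sort(collect(keys(counts)))
    println(k, " => ", counts[k])
end
```

The counts always add up to the length of the input, which is a quick sanity check for any frequency table.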

DataFrames

freqtable() is particularly useful for DataFrames. Let’s load the ToothGrowth dataset that ships with RDatasets.jl, the same data used in the example of regression analysis with qualitative variables in R.

julia> ToothGrowth = dataset("datasets", "ToothGrowth")
60×3 DataFrame
 Row │ Len      Supp  Dose    
     │ Float64  Cat…  Float64
─────┼────────────────────────
   1 │     4.2  VC        0.5
   2 │    11.5  VC        0.5
   3 │     7.3  VC        0.5
   4 │     5.8  VC        0.5
  ⋮  │    ⋮      ⋮       ⋮
  58 │    27.3  OJ        2.0
  59 │    29.4  OJ        2.0
  60 │    23.0  OJ        2.0
               53 rows omitted

julia> freqtable(ToothGrowth, :Len)
43-element Named Vector{Int64}
Len  │
─────┼──
4.2  │ 1
5.2  │ 1
5.8  │ 1
6.4  │ 1
⋮      ⋮
29.5 │ 1
30.9 │ 1
32.5 │ 1
33.9 │ 1

julia> freqtable(ToothGrowth, :Supp)
2-element Named Vector{Int64}
Supp  │
──────┼───
"OJ"  │ 30
"VC"  │ 30

julia> freqtable(ToothGrowth, :Dose)
3-element Named Vector{Int64}
Dose  │
──────┼───
0.5   │ 20
1.0   │ 20
2.0   │ 20

ToothGrowth records tooth length (:Len) in guinea pigs given vitamin C at different doses (:Dose), delivered either as orange juice (OJ) or as ascorbic acid (VC) (:Supp). Calculating the frequencies of each column as shown summarizes the data neatly, and the :Len and :Dose tables show that the data doesn’t necessarily have to be categorical.

julia> freqtable(ToothGrowth, :Supp, :Dose)
2×3 Named Matrix{Int64}
Supp ╲ Dose │ 0.5  1.0  2.0
────────────┼──────────────
"OJ"        │  10   10   10
"VC"        │  10   10   10

Of course, this type of table is most effective for categorical data. Calculating the frequencies for :Supp and :Dose together automatically cross-tabulates the two columns into a 2D table.
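Counts in a two-way table are often easier to read as proportions. The sketch below assumes the `prop` function exported by FreqTables.jl and its `margins` keyword as described in the package README. The two vectors are made-up stand-ins that mimic the balanced :Supp by :Dose design, so RDatasets is not needed to run it.

```julia
using FreqTables

# Stand-in vectors reproducing the balanced 2 x 3 design (10 rows per cell);
# they are illustrative, not the actual ToothGrowth columns.
supp = repeat(["VC", "OJ"], inner = 30)
dose = repeat([0.5, 1.0, 2.0], inner = 10, outer = 2)

tbl = freqtable(supp, dose)      # 2×3 table of counts

overall = prop(tbl)              # each cell divided by the grand total
byrow = prop(tbl, margins = 1)   # proportions sum to 1 within each row
bycol = prop(tbl, margins = 2)   # proportions sum to 1 within each column
```

Because every cell holds the same count here, each row proportion is 1/3 and each column proportion is 1/2; with unbalanced data the three tables would differ in more informative ways.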

julia> freqtable(ToothGrowth, :Len, :Dose, :Supp)
43×3×2 Named Array{Int64, 3}

[:, :, Supp="OJ"] =
Len ╲ Dose │ 0.5  1.0  2.0
───────────┼──────────────
4.2        │   0    0    0
⋮              ⋮    ⋮    ⋮
33.9       │   0    0    0

[:, :, Supp="VC"] =
Len ╲ Dose │ 0.5  1.0  2.0
───────────┼──────────────
4.2        │   1    0    0
⋮              ⋮    ⋮    ⋮
33.9       │   0    0    1

With three or more columns, the result is an N-dimensional array, printed as a stack of 2D tables, one for each value of the remaining columns. At this point, it loses almost all meaning as a way of exploring or summarizing data.

Performance Comparison

julia> @time for t in 1:10^4
           freqtable(compartment)
       end
       @time for t in 1:10^4
           count(compartment .== 'S')
           count(compartment .== 'I')
           count(compartment .== 'R')
       end
  0.068229 seconds (340.00 k allocations: 27.466 MiB)
  0.059198 seconds (180.00 k allocations: 134.125 MiB, 36.71% gc time)

Apart from the table itself, counting frequencies is useful in its own right. Several tests were conducted comparing the speed of manual counting with that of freqtable(), which computes all the frequencies at once. Neither was consistently faster; the result varied with the amount of data and the number of classes. Overall, freqtable() tended to be slower, but not by a large margin, so it is reasonable to use it wherever it is convenient without worrying much about speed.
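Part of the variation in such timings comes from how the manual count is written: `compartment .== 'S'` builds a temporary array on every call, and the Julia manual's performance tips recommend timing code inside functions rather than at global scope. The following Base-only sketch wraps both variants in functions so they can be timed on equal footing. All function names are hypothetical, and no timings are shown because they depend on the machine.

```julia
# Hypothetical helpers for comparing two ways of counting by hand.

# Variant used in the text: broadcasting allocates a temporary array per class.
count_broadcast(v) = (count(v .== 'S'), count(v .== 'I'), count(v .== 'R'))

# Predicate form: count(f, v) scans v without building a temporary array.
count_predicate(v) = (count(==('S'), v), count(==('I'), v), count(==('R'), v))

# Run f(v) n times inside a function and return the accumulated total,
# so that the result of every call is actually used.
function run_many(f, v, n)
    total = 0
    for _ in 1:n
        total += sum(f(v))
    end
    return total
end

compartment = rand(['S', 'I', 'R'], 1000)

# Warm-up calls so that compilation time is not included in the measurement.
run_many(count_broadcast, compartment, 1)
run_many(count_predicate, compartment, 1)

@time run_many(count_broadcast, compartment, 10^4)
@time run_many(count_predicate, compartment, 10^4)
```

Both variants return the same three counts, so any difference reported by `@time` reflects only how the counting is carried out.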

Full Code

using FreqTables
using RDatasets

# Frequency table of a plain array
compartment = rand(['S','I','R'], 1000);
freqtable(compartment)

# Frequency tables of DataFrame columns
ToothGrowth = dataset("datasets", "ToothGrowth")

freqtable(ToothGrowth, :Len)
freqtable(ToothGrowth, :Supp)
freqtable(ToothGrowth, :Dose)

freqtable(ToothGrowth, :Supp, :Dose)
freqtable(ToothGrowth, :Len, :Dose, :Supp)

@time for t in 1:10^4
    freqtable(compartment)
end
@time for t in 1:10^4
    count(compartment .== 'S')
    count(compartment .== 'I')
    count(compartment .== 'R')
end

Environment

  • OS: Windows
  • julia: v1.6.3
  • FreqTables v0.4.5