How to Calculate Frequency in Julia
Overview
Use the freqtable() function from the FreqTables.jl package. It provides functionality similar to the freq() function in R.
Code
Arrays
julia> using FreqTables
julia> compartment = rand(['S','I','R'], 1000);
julia> freqtable(compartment)
3-element Named Vector{Int64}
Dim1 │
──────┼────
'I' │ 316
'R' │ 342
'S' │ 342
Passing an array, as shown above, counts the frequency of each class.
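To make clear what is being counted, here is a minimal sketch of the same tally using only Base Julia. The helper name count_values is hypothetical and is not part of FreqTables.jl; the package returns a named array rather than a Dict.

```julia
# Hypothetical helper (not from FreqTables.jl): tally each distinct value in a Dict.
function count_values(xs)
    counts = Dict{eltype(xs),Int}()
    for x in xs
        # Start from 0 the first time a value is seen, then increment.
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

compartment = ['S', 'I', 'R', 'S', 'S', 'I']
counts = count_values(compartment)

# Print the classes in sorted order, similar to how freqtable() orders them.
for k in sort(collect(keys(counts)))
    println(k, " => ", counts[k])
end
```

For this small input the loop prints I => 2, R => 1, and S => 3.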
DataFrames
freqtable() is particularly useful for DataFrames. Let’s load the built-in ToothGrowth dataset, as in the example of regression analysis with qualitative variables in R.
julia> using RDatasets
julia> ToothGrowth = dataset("datasets", "ToothGrowth")
60×3 DataFrame
Row │ Len Supp Dose
│ Float64 Cat… Float64
─────┼────────────────────────
1 │ 4.2 VC 0.5
2 │ 11.5 VC 0.5
3 │ 7.3 VC 0.5
4 │ 5.8 VC 0.5
⋮ │ ⋮ ⋮ ⋮
58 │ 27.3 OJ 2.0
59 │ 29.4 OJ 2.0
60 │ 23.0 OJ 2.0
53 rows omitted
julia> freqtable(ToothGrowth, :Len)
43-element Named Vector{Int64}
Len │
─────┼──
4.2 │ 1
5.2 │ 1
5.8 │ 1
6.4 │ 1
⋮ ⋮
29.5 │ 1
30.9 │ 1
32.5 │ 1
33.9 │ 1
julia> freqtable(ToothGrowth, :Supp)
2-element Named Vector{Int64}
Supp │
──────┼───
"OJ" │ 30
"VC" │ 30
julia> freqtable(ToothGrowth, :Dose)
3-element Named Vector{Int64}
Dose │
──────┼───
0.5 │ 20
1.0 │ 20
2.0 │ 20
ToothGrowth contains data on tooth length (:Len) in guinea pigs given different doses of vitamin C (:Dose), delivered either as orange juice (OJ) or as ascorbic acid (VC) (:Supp). Computing the frequencies of each column, as shown, summarizes the data neatly. The :Len example also shows that the data doesn’t have to be categorical, although with 43 distinct values the counts are mostly 1.
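For a continuous column such as :Len, one option is to bin the values before counting. The following is only a sketch under assumptions: the bin width of 10 and the sample values are chosen for illustration, and binning is not something the original example does.

```julia
using FreqTables

# A few tooth lengths taken from the rows displayed above.
len = [4.2, 11.5, 7.3, 5.8, 27.3, 29.4, 23.0]

# Assumed bin width of 10: map each value to the lower edge of its bin.
bins = floor.(Int, len ./ 10) .* 10   # 0, 10, 0, 0, 20, 20, 20

# Count how many values fall in each bin.
binned = freqtable(bins)
println(binned)
```

The three bins (0, 10, 20) receive 3, 1, and 3 observations, which is easier to read than a table of mostly unique values.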
julia> freqtable(ToothGrowth, :Supp, :Dose)
2×3 Named Matrix{Int64}
Supp ╲ Dose │ 0.5 1.0 2.0
────────────┼──────────────
"OJ" │ 10 10 10
"VC" │ 10 10 10
Of course, this type of table is most effective for categorical data. Passing :Supp and :Dose together produces a two-way (2D) table of counts for every combination of the two.
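Counts from a two-way table can also be turned into proportions with prop(), which FreqTables.jl exports. The sketch below uses small made-up vectors rather than ToothGrowth so that it stands alone; the variable names are illustrative.

```julia
using FreqTables

# Made-up data for illustration (not the ToothGrowth dataset).
supp = ["OJ", "OJ", "OJ", "VC", "VC", "VC"]
dose = [0.5, 1.0, 1.0, 0.5, 0.5, 2.0]

tbl = freqtable(supp, dose)        # 2×3 table of counts

overall = prop(tbl)                # each cell divided by the grand total
by_row  = prop(tbl, margins=1)     # each row sums to 1
by_col  = prop(tbl, margins=2)     # each column sums to 1

println(tbl)
println(by_row)
```

Row proportions (margins=1) answer "given this supplement, how are the doses distributed?", while column proportions (margins=2) answer the reverse question.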
julia> freqtable(ToothGrowth, :Len, :Dose, :Supp)
43×3×2 Named Array{Int64, 3}
[:, :, Supp="OJ"] =
Len ╲ Dose │ 0.5 1.0 2.0
───────────┼──────────────
4.2 │ 0 0 0
⋮ ⋮ ⋮ ⋮
33.9 │ 0 0 0
[:, :, Supp="VC"] =
Len ╲ Dose │ 0.5 1.0 2.0
───────────┼──────────────
4.2 │ 1 0 0
⋮ ⋮ ⋮ ⋮
33.9 │ 0 0 1
With three or more columns, the result is a multidimensional array that is simply printed as a series of 2D tables, one for each level of the remaining columns. At this point, it loses most of its value for exploring or summarizing data.
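When many columns are involved, a long-format count can be easier to read than stacked 2D slices. The following sketch uses groupby and combine from DataFrames.jl as an alternative to freqtable(); it is not the approach used above, and the small DataFrame is made up for illustration.

```julia
using DataFrames

# Made-up data for illustration (not the ToothGrowth dataset).
df = DataFrame(
    supp = ["OJ", "OJ", "VC", "VC", "VC"],
    dose = [0.5, 0.5, 0.5, 1.0, 1.0],
    len  = [4.2, 4.2, 5.8, 5.8, 7.3],
)

# One row per combination that actually occurs, with its count.
counts = combine(groupby(df, [:supp, :dose, :len]), nrow => :count)
println(counts)
```

Unlike the multidimensional table, this form lists only the combinations that occur, so the many zero cells disappear.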
Performance Comparison
julia> @time for t in 1:10^4
           freqtable(compartment)
       end
       @time for t in 1:10^4
           count(compartment .== 'S')
           count(compartment .== 'I')
           count(compartment .== 'R')
       end
0.068229 seconds (340.00 k allocations: 27.466 MiB)
0.059198 seconds (180.00 k allocations: 134.125 MiB, 36.71% gc time)
Setting the table display aside, counting frequencies is useful in its own right. Several tests were run comparing manual counting against computing all frequencies at once with freqtable(). Neither was consistently faster; the result varied with the amount of data and the number of classes. Overall, freqtable() tended to be slightly slower, but not by a large margin, so it is worth using for its convenience wherever it fits.
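One detail of the manual version is that compartment .== 'S' builds a temporary array before count() runs, which contributes to the allocations reported above. A minimal sketch of an alternative is to pass a predicate to count() instead. The helper count_classes is hypothetical, and no timing claim is made here since results depend on the data and the machine.

```julia
compartment = rand(['S', 'I', 'R'], 1000)

# Broadcasting builds a temporary BitVector first, then counts its true entries.
n_broadcast = count(compartment .== 'S')

# Passing a predicate counts in a single pass without the temporary array.
n_predicate = count(==('S'), compartment)

@assert n_broadcast == n_predicate

# Hypothetical helper: count all three classes with the predicate form.
count_classes(xs) = (S = count(==('S'), xs),
                     I = count(==('I'), xs),
                     R = count(==('R'), xs))

counts = count_classes(compartment)
println(counts)
```

Wrapping the counting in a function also avoids benchmarking against a non-constant global variable, which can distort @time measurements.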
Full Code
using FreqTables
compartment = rand(['S','I','R'], 1000);
freqtable(compartment)
using RDatasets
ToothGrowth = dataset("datasets", "ToothGrowth")
freqtable(ToothGrowth, :Len)
freqtable(ToothGrowth, :Supp)
freqtable(ToothGrowth, :Dose)
freqtable(ToothGrowth, :Supp, :Dose)
freqtable(ToothGrowth, :Len, :Dose, :Supp)
@time for t in 1:10^4
    freqtable(compartment)
end
@time for t in 1:10^4
    count(compartment .== 'S')
    count(compartment .== 'I')
    count(compartment .== 'R')
end
Environment
- OS: Windows
- julia: v1.6.3
- FreqTables v0.4.5