Julia's Categorical Array

Overview

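The CategoricalArrays.jl package provides categorical arrays for Julia. It serves much the same purpose as factor in R: storing data whose values come from a limited set of categories.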

Code

julia> using CategoricalArrays

julia> A = ["red", "blue", "red", "green"]
4-element Vector{String}:
 "red"
 "blue"
 "red"
 "green"

julia> B = categorical(A)
4-element CategoricalArray{String,1,UInt32}:
 "red"
 "blue"
 "red"
 "green"

julia> levels(B)
3-element Vector{String}:
 "blue"
 "green"
 "red"

categorical()

The categorical() function converts a regular array into a categorical array.
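To see why a categorical array can answer questions about its categories cheaply, it helps to picture one possible layout: each element is stored as a small integer code that points into a list of levels holding every category exactly once. The UInt32 in the type CategoricalArray{String,1,UInt32} shown above is the integer type CategoricalArrays.jl uses for such codes. The sketch below is a hypothetical toy model in plain Julia (ToyCategorical and toy_categorical are made-up names), not the package's actual implementation; it assumes sorted levels, matching the levels(B) output above.

```julia
# Hypothetical toy model, NOT the CategoricalArrays.jl implementation:
# every element is stored as an integer code into a list of levels.
struct ToyCategorical
    refs::Vector{UInt32}    # one code per element
    levels::Vector{String}  # each distinct category, stored once
end

function toy_categorical(A::Vector{String})
    levs = sort(unique(A))                                      # sorted, duplicate-free
    index = Dict(l => UInt32(i) for (i, l) in enumerate(levs))  # category => code
    refs = [index[a] for a in A]                                # encode every element
    return ToyCategorical(refs, levs)
end

# Decode element i back to its category
Base.getindex(c::ToyCategorical, i::Int) = c.levels[c.refs[i]]

A = ["red", "blue", "red", "green"]
C = toy_categorical(A)
C.refs     # codes 3, 1, 3, 2
C.levels   # ["blue", "green", "red"]
C[1]       # "red"
```

Because the levels are stored separately from the per-element codes, listing the categories never requires scanning the data.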

levels()

With the levels() function, one can view the categories. Each category is listed only once, and a category remains among the levels even after no element of the array belongs to it anymore.

julia> B[2] = "red"; B
4-element CategoricalArray{String,1,UInt32}:
 "red"
 "red"
 "red"
 "green"

julia> levels(B)
3-element Vector{String}:
 "blue"
 "green"
 "red"

Keeping the categories regardless of the array’s current contents is very useful in some situations. It is particularly helpful in data analysis, where subsets of a dataset are handled frequently and a subset may not contain every category of the full data. In such cases, knowing about categorical arrays can be a great help.
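A hypothetical toy model in plain Julia shows why levels persist: if each element is stored as an integer code into a separate list of levels, an assignment only overwrites one code and never touches that list. The names pool, refs and assign! are made up for illustration; this is a sketch of the idea, not how CategoricalArrays.jl is implemented internally.

```julia
# Hypothetical toy model (not CategoricalArrays.jl internals):
# integer codes plus a separate list of levels.
pool = ["blue", "green", "red"]
refs = UInt32[3, 1, 3, 2]            # encodes ["red", "blue", "red", "green"]

# Assigning a value only overwrites that element's code.
# Existing levels are never removed; in this toy, an unseen value
# is appended as a new level.
function assign!(refs::Vector{UInt32}, pool::Vector{String}, i::Int, value::String)
    k = findfirst(==(value), pool)
    if k === nothing
        push!(pool, value)
        k = length(pool)
    end
    refs[i] = k
    return refs
end

assign!(refs, pool, 2, "red")        # mirrors B[2] = "red"
decoded = pool[refs]                 # ["red", "red", "red", "green"]
pool                                 # still ["blue", "green", "red"]
unique(decoded)                      # ["red", "green"]: "blue" no longer appears
```

Recomputing the distinct values from the data loses "blue", while the stored list of levels keeps it, which matches the levels(B) output above.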

Optimization

Technically, calling unique() on a regular array yields much the same information as levels().

julia> @time for t in 1:10^6
           unique(A)
       end
  0.543157 seconds (6.00 M allocations: 579.834 MiB, 17.33% gc time)

julia> @time for t in 1:10^6
           levels(B)
       end
  0.013324 seconds

However, levels() is about 40 times faster here, and unlike unique(), it makes no allocations at all. Because a categorical array keeps its categories stored with the data and updates them as the array changes, levels() needs no separate computation and can return them immediately.
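The difference between recomputing and storing can be sketched in plain Julia. Here cache_levels, get_levels and recompute_levels are hypothetical helpers, not part of CategoricalArrays.jl: one computes the sorted distinct values once and hands back the same stored vector on every request, while the other rebuilds a new vector from the data on each call, as the unique() loop above does. Note that unique() returns values in order of first appearance, so it has to be sorted to match the sorted levels(B) output above.

```julia
# Hypothetical sketch (not CategoricalArrays.jl code): keep the distinct
# values computed once, versus recomputing them with unique() on demand.
struct CachedLevels
    levels::Vector{String}
end

# Scan the data once, when the cache is built
cache_levels(A::Vector{String}) = CachedLevels(sort(unique(A)))

# Return the stored vector: no scan of the data
get_levels(c::CachedLevels) = c.levels

# Scan A and build new vectors on every call
# (unique() keeps first-appearance order, hence the sort)
recompute_levels(A::Vector{String}) = sort(unique(A))

A = ["red", "blue", "red", "green"]
c = cache_levels(A)

get_levels(c) == recompute_levels(A)         # true: same contents
get_levels(c) === get_levels(c)              # true: the very same object each time
recompute_levels(A) === recompute_levels(A)  # false: a fresh vector each call
```

Handing back a stored vector is consistent with the zero allocations reported for levels(B) in the benchmark, although this sketch makes no claim about the package's internals.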

Full Code

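using CategoricalArrays

# Convert a regular vector to a categorical one and inspect its categories
A = ["red", "blue", "red", "green"]
B = categorical(A)
levels(B)

# Overwrite the only "blue"; the level "blue" is kept
B[2] = "red"; B
levels(B)

# Compare the cost of unique() on the plain vector with levels()
@time for t in 1:10^6
    unique(A)
end

@time for t in 1:10^6
    levels(B)
end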

Environment

  • OS: Windows
  • Julia: v1.6.3
  • CategoricalArrays: v0.10.2