logo

How to Calculate Distance using NearstNeibors.jl in julia 📂Julia

How to Calculate Distance using NearstNeibors.jl in julia

Overview

To calculate the distance between $n$ points, if there’s no need to create a matrix but just to compute the distance, using a k-d tree1, a data structure advantageous for multi-dimensional search, can enhance speed. All related algorithms are implemented in NearestNeighbors.jl, so refer to the official GitHub page.

Speed Comparison

Let’s compare with the technique optimized for calculating distance matrices using the pairwise() function.

using Distances
using StatsBase
using Random
using NearestNeighbors

Random.seed!(0);
ε = 0.01
N = 10^4
coordinate = rand(2, N);
state = sample(['S', 'I'], Weights([0.1, 0.9]), N);
S = coordinate[:, state .== 'S']
I = coordinate[:, state .== 'I']

@time sum(pairwise(Euclidean(),S,I) .< ε, dims = 1)
@time kdtree = KDTree(S); contact = length.(inrange(kdtree, I, ε, true))

The result of running the above code is as follows. The last two command lines perform the same task, but the speed difference is about 500 times. In fact, the time complexity when searching in a k-d tree is $\log n$, which is very efficient.

julia> @time sum(pairwise(Euclidean(),S,I) .< ε, dims = 1)
  0.098394 seconds (14 allocations: 69.639 MiB, 8.23% gc Time)
1×9004 Array{Int64,2}:
 0  0  1  0  0  0  0  …  1  0  0  0  0  0

julia> @time kdtree = KDTree(S); contact = length.(inrange(kdtree, I, ε, true))
  0.000213 seconds (22 allocations: 51.609 KiB)
9004-element Array{Int64,1}:
 0
 0
 1
 0
 ⋮
 0
 0
 0

Even setting aside the simple issue of speed, the approach using the k-d tree is more convenient from the perspective of using the results, as it returns a 1-dimensional array.

Environment

  • OS: Windows
  • julia: v1.6.2