logo

Programming with Map and Reduce 📂Programing

Programming with Map and Reduce

Definitions

Map

Let us denote a set of functions as $f : X \to Y$. The higher-order function defined as follows, $\text{map}$, is called a map. $$ \text{map} : F \times 2^{X} \to 2^{Y} \\ \text{map} \left( f , \left\{ x_{1} , \cdots , x_{n} \right\} \right) := \left\{ f \left( x_{1} \right) , \cdots, f \left( x_{n} \right) \right\} $$

Fold

Let us denote a set of functions as $g : X \to X$. The higher-order function defined as follows, $\text{fold}$, is called a fold.

$$ \text{fold} : G \times 2^{X} \to X \\ \text{fold} \left( g , A \right) := g \left( x , \text{fold} \left( g , A \setminus \left\{ x \right\} \right) \right) \qquad , x \in A $$

Explanation

The formulas mentioned in the definitions are there just to make it easier for those who do mathematics to understand, so don’t be too alarmed. Simply put, the map is a function that applies the given function to each element of the array it receives, and the fold is a function that keeps applying the given function by popping each element from the array.

If the formula for fold seems complicated, refer to Wikipedia’s explanation. It’s tough enough to make the formula seem simple in comparison.

In functional programming, a fold is one of a family of higher-order functions that analyze a recursive data structure and recombine it using the provided combining operations, performing the combining recursively and returning a return value. Folds are often expressed as functions applied in a combinatorial fashion to the top node of data structures, and they may hold some default values that can be used under certain conditions. They systematically use that function to combine hierarchical elements of the data structure.

Both map and fold are higher-order functions that take a function as an argument, and notably, fold is a recursive function. As the title of this post suggests, fold is commonly referred to as Reduce.

Hadoop.png

Embracing functional programming, map and reduce are considered some of the benefits gained, seen in areas like Hadoop or distributed computing. Though less efficient, imagine processing data on each of many slave nodes (Map) and then summarizing that data by gathering it on a master node (Reduce).

Examples

Map: Broadcasting

In the programming language Julia, map is used routinely, and the syntax that natively allows the use of map is called Broadcasting.

julia> A = [1,-2,3,-4]
4-element Vector{Int64}:
  1
 -2
  3
 -4

julia> f(x) = x^2
f (generic function with 1 method)

For instance, with an array A and a function f(), map() applies f to each element of the array as follows:

julia> map(f, A)
4-element Vector{Int64}:
  1
  4
  9
 16

Meanwhile, using the same principle for the same result, you can also indicate that you wish to apply f pointwise to the elements of A by putting a period . after f. This is broadcasting.

julia> f.(A)
4-element Vector{Int64}:
  1
  4
  9
 16

Fold: Reduce

No further explanation is needed. It’s all clear upon seeing it directly.

julia> reduce(+, A)
-2

$$ \text{reduce} (+,A) = -4 + (3 + ((-2) + 1)) = -2 $$

julia> reduce(max, A)
3

$$ \text{reduce} (\max,A) = \max(-4 , \max(3 , \max (-2 , 1))) = 3 $$