
What is a Discriminatory Function?

Definition

A function $\sigma : \mathbb{R} \to \mathbb{R}$ is called a Discriminatory Function if, for every $\mu \in M \left( I_{n} \right)$,
$$ \int_{I_{n}} \sigma \left( y^{T} x + \theta \right) d \mu (x) = 0 \quad \text{for all } y \in \mathbb{R}^{n} \text{ and } \theta \in \mathbb{R} \implies \mu = 0 $$


  • $I_{n} := [0,1]^{n}$ is the $n$-dimensional unit cube, the Cartesian product of $n$ copies of the closed unit interval $[0,1]$.
  • $M \left( I_{n} \right)$ is the set of finite signed regular Borel measures defined on $I_{n} := [0,1]^{n}$.
  • $y^{T}$ is the transpose of $y$, so that $y^{T} x$ equals the inner product $\left\langle x, y \right\rangle$ of $x$ and $y$.

Explanation of the Definition

To understand the concept of the discriminatory function through the formula, it may be good to start with the contrapositive:
$$ \mu \ne 0 \implies \int_{I_{n}} \sigma \left( y^{T} x + \theta \right) d \mu (x) \ne 0 \quad \text{for some } y \in \mathbb{R}^{n}, \theta \in \mathbb{R} $$
Simply put, unless the measure $\mu$ is the zero measure, there must be some choice of $y$ and $\theta$ for which the Lebesgue integral of whatever is put into $\sigma$ comes out nonzero somewhere in $I_{n}$. Recalling the concept of almost everywhere in measure theory, one can see that merely having function values not equal to $0$ is insufficient: the nonzero values must live on a set that $\mu$ actually measures.
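
As a toy illustration of this contrapositive, consider the signed measure $\mu = \delta_{a} - \delta_{b}$ with $a \ne b$: the integral collapses to $\sigma \left( y^{T} a + \theta \right) - \sigma \left( y^{T} b + \theta \right)$, and a discriminatory $\sigma$ must make this difference nonzero for some $y$ and $\theta$. The sketch below assumes the logistic function for $\sigma$; the measure and all variable names are choices made purely for illustration.

```python
import numpy as np

def sigmoid(t):
    """Logistic function, a standard candidate for a discriminatory sigma."""
    return 1.0 / (1.0 + np.exp(-t))

# Toy signed measure mu = delta_a - delta_b (difference of two point masses)
a = np.array([0.2, 0.7])  # a point of I_2
b = np.array([0.9, 0.1])  # another point of I_2, b != a

# For this mu the integral reduces to sigma(y^T a + theta) - sigma(y^T b + theta).
# Choose y along a - b and center the hyperplane between a and b:
y = a - b
theta = -y @ (a + b) / 2

# Nonzero output: this mu cannot "hide" from sigma for every choice of y, theta
print(sigmoid(y @ a + theta) - sigmoid(y @ b + theta))  # ~0.21
```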

In simpler terms, $\sigma$ plays the role of a function that can distinguish 'yes' from 'no' by satisfying the above property for all $y \in \mathbb{R}^{n}$ and $\theta \in \mathbb{R}$. Although the word 'discriminatory' may have a somewhat negative connotation in everyday usage, in the fields of science and engineering it is simply understood as 'distinguishing'.

Discriminatory functions appear in the Universal Approximation Theorem, which theoretically supports fields like artificial neural networks, and naturally they are related to the recently popular deep learning. In the formula, $x \in I_{n}$ can be seen as the input data, $y \in \mathbb{R}^{n}$ as the weights, and $\theta \in \mathbb{R}$ as the bias. Then $\sigma$ is none other than the activation function. Considering that the idea of the activation function is itself an imitation of a neuron's threshold, the 'discrimination' of whether the signal is transmitted or not corresponds precisely to the concept of the discriminatory function.
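
To make this correspondence concrete, here is a minimal Python sketch of $\sigma \left( y^{T} x + \theta \right)$ as the computation of a single neuron; the logistic activation and the random values are illustrative assumptions, not part of the definition.

```python
import numpy as np

def sigmoid(t):
    """Activation function; any discriminatory sigma would serve."""
    return 1.0 / (1.0 + np.exp(-t))

n = 3
x = np.random.rand(n)      # input data: a point of I_n = [0, 1]^n
y = np.random.randn(n)     # weights y in R^n
theta = np.random.randn()  # bias theta in R

# sigma(y^T x + theta): the output of one neuron
print(sigmoid(y @ x + theta))
```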

Theorem

If $\sigma$ is a continuous discriminatory function, then
$$ S := \left\{ G(x) = \sum_{k=1}^{N} \alpha_{k} \sigma \left( y_{k}^{T} x + \theta_{k} \right) : y_{k} \in \mathbb{R}^{n} \land \alpha_{k}, \theta_{k} \in \mathbb{R} \land N \in \mathbb{N} \right\} $$
is uniformly dense in $C \left( I_{n} \right)$. In other words, for every $f \in C \left( I_{n} \right)$ and $\varepsilon > 0$, there exists a $G \in S$ that satisfies the following:
$$ \left\| G - f \right\| < \varepsilon $$

Explanation of the Theorem

To put the above theorem in plain words, it means that any function $f$ we desire can be approximated arbitrarily well by a finite linear combination of the discriminatory function with appropriate weights and biases. In the context of machine learning, $f$ is the function we truly seek, i.e., the function that does the job we want. It could be a function that takes a picture as input and judges whether it shows a dog or a cat, turns a Korean string into English, or does something even more complex and intriguing.
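
The following Python sketch demonstrates the theorem numerically in the case $n = 1$. The random weights $y_{k}$, $\theta_{k}$ and the least-squares fit of the $\alpha_{k}$ are assumptions of this demonstration; the theorem itself is non-constructive and does not prescribe how to find $G$.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)

# Target f in C(I_1): any continuous function on [0, 1]
def f(x):
    return np.sin(2 * np.pi * x) + x**2

# G(x) = sum_k alpha_k * sigmoid(y_k * x + theta_k) with N = 50 terms
N = 50
y = rng.normal(scale=10.0, size=N)      # weights y_k
theta = rng.normal(scale=10.0, size=N)  # biases theta_k

xs = np.linspace(0.0, 1.0, 500)
Phi = sigmoid(np.outer(xs, y) + theta)  # Phi[i, k] = sigmoid(y_k * xs[i] + theta_k)

# Fit the coefficients alpha_k by least squares on a fine grid
alpha, *_ = np.linalg.lstsq(Phi, f(xs), rcond=None)

# Approximate the sup norm ||G - f|| on the grid
print("max |G - f| ~", np.max(np.abs(Phi @ alpha - f(xs))))
```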

The problem is that, as the definition of the discriminatory function makes clear, it is hard to know what a discriminatory function actually looks like in practice. Without knowledge of measure theory, even the explanations provided in this post may feel like grasping at straws. Fortunately, some well-known functions can be proven to be discriminatory, thereby mathematically guaranteeing that artificial neural networks work.
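
For example, the cited paper shows that every continuous sigmoidal function is discriminatory, where sigmoidal means
$$ \sigma(t) \to \begin{cases} 1 & \text{as } t \to + \infty \\ 0 & \text{as } t \to - \infty \end{cases} $$
The logistic function $\sigma(t) = 1 / \left( 1 + e^{-t} \right)$ used in the sketches above is the standard example.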

Proof 1

Corollary 2 of the Hahn-Banach Theorem: Let $X$ be a normed vector space and let $Y \subset X$ be a subspace. If $s \in X$ satisfies $d(s, Y) = \delta > 0$, then there exists an $x^{\ast} \in X^{\ast}$ that satisfies the following:
$$ \left\| x^{\ast} \right\| \le 1 $$
$$ x^{\ast}(s) = \delta \qquad \text{and} \qquad x^{\ast}(y) = 0 \quad \text{for all } y \in Y $$

It must be shown that $\overline{S} = C \left( I_{n} \right)$. Assume $\overline{S} \subsetneq C \left( I_{n} \right)$. Since the closure $\overline{S}$ is then a proper closed subspace of the whole space, by the Hahn-Banach Theorem there exists a bounded linear functional $L \in \left( C \left( I_{n} \right) \right)^{\ast}$ with $L \ne 0$ such that:
$$ L (S) = L \left( \overline{S} \right) = \left\{ 0 \right\} $$
According to the Riesz Representation Theorem, $L$ can be represented as:
$$ L (h) = \int_{I_{n}} h(x) \, d \mu (x) $$
for some $\mu \in M \left( I_{n} \right)$ and all $h \in C \left( I_{n} \right)$. In particular, $h(x) = \sigma \left( y^{T} x + \theta \right)$ belongs to $\overline{S}$ for all $y, \theta$, so by the property of $L$:
$$ L (h) = \int_{I_{n}} \sigma \left( y^{T} x + \theta \right) d \mu (x) = 0 $$
Since the premise is that $\sigma$ is discriminatory, $\mu = 0$, and then by the Riesz representation of $L$, $L = 0$ must hold, contradicting $L \ne 0$. Therefore, $\overline{S} = C \left( I_{n} \right)$ must be true.
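
In compact form, the chain of implications in the proof is:
$$ \overline{S} \subsetneq C \left( I_{n} \right) \implies \exists L \ne 0 : L \left( \overline{S} \right) = \{0\} \implies L(h) = \int_{I_{n}} h \, d\mu, \; \mu \ne 0 \implies \int_{I_{n}} \sigma \left( y^{T} x + \theta \right) d\mu(x) = 0 \;\; \forall y, \theta \implies \mu = 0 \implies L = 0 $$
which contradicts $L \ne 0$.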


  1. G. Cybenko. (1989). Approximation by Superpositions of a Sigmoidal Function: p. 4. ↩︎