
Latent Variable and Latent Space 📂Machine Learning


Definition

Suppose a dataset $X \subset \mathbb{R}^{n}$ is given. A function with the dataset as its domain is called an encoder.

$$
\begin{align*}
f : X &\to Z \\
\mathbf{x} &\mapsto \mathbf{z} = f(\mathbf{x})
\end{align*}
$$

The range of the encoder, $Z \subset \mathbb{R}^{m}$ ($m \le n$), is called the latent space, and the elements $\mathbf{z}$ of the latent space are referred to as latent variables or feature vectors.

For a function $g : Z \to Y$,

  • If $Y$ is the label space of the data $X$, $g$ is referred to as a classifier.
  • If $Y = X$, $g$ is called a decoder.
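The definitions above can be sketched concretely. The following is a minimal, hypothetical example (not from the source) that uses PCA as a linear encoder $f : X \to Z$ with $n = 8$ and $m = 3$, together with the corresponding linear decoder $g : Z \to \mathbb{R}^{n}$; the names `f`, `g`, and `W` are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: PCA as a linear encoder f: R^8 -> R^3
# and the corresponding linear decoder g: R^3 -> R^8.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))   # dataset X, 100 samples in R^8
m = 3                           # latent dimension m <= n

# Principal directions of the centered data define the encoder.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:m].T                    # 8 x 3 projection matrix

def f(x):
    """Encoder: map x in R^n to a latent variable z in R^m."""
    return (x - mean) @ W

def g(z):
    """Decoder: map a latent z back to R^n (approximate reconstruction)."""
    return z @ W.T + mean

z = f(X[0])       # latent variable (feature vector) of the first sample
x_hat = g(z)      # reconstruction in the original data space
print(z.shape, x_hat.shape)   # (3,) (8,)
```

Here the latent space is genuinely lower-dimensional ($m = 3 < 8 = n$), so composing $g \circ f$ only approximately recovers the input.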

Explanation

One of the roles of an encoder is to compress data, so the dimension of the latent space is usually smaller than that of the dataset. In the context of information theory it must be smaller, but in machine learning and deep learning, as long as performance is satisfactory, the dimension of the latent variables may equal or even exceed that of the data. Of course, in most cases it is better for it to be smaller or equal.

Unlike in information theory or cryptography, in machine learning the encoder is expected to play another crucial role: extracting features from the data. Accordingly, the process of feeding data into a neural network and obtaining its output is called feature extraction or embedding. The reason deep learning assigns meaning to the function values of encoders and emphasizes feature extraction is that artificial neural networks are black-box algorithms. Unlike traditional methods, where the principles or structure of the encoder are explicit, an encoder produced by an artificial neural network operates in a way that is not precisely understood. The output of an encoder is therefore interpreted as latent features extracted from the data.
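This notion of taking a network's intermediate output as the embedding can be sketched as follows. This is a hypothetical example with a randomly initialized hidden layer standing in for a trained encoder; the names `embed`, `W1`, and `b1` are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch: the hidden layer of a small MLP acts as the encoder,
# so its activation vector serves as the extracted feature (embedding).
rng = np.random.default_rng(1)

n, m = 8, 3                    # data dimension n, latent dimension m
W1 = rng.normal(size=(n, m))   # hidden (encoder) layer weights
b1 = np.zeros(m)               # hidden layer bias

def embed(x):
    """Feature extraction: hidden-layer activation as the latent vector z."""
    return np.tanh(x @ W1 + b1)

x = rng.normal(size=n)         # one data point in R^8
z = embed(x)                   # its embedding in R^3
print(z.shape)                 # (3,)
```

In practice the weights would come from training (e.g. as the first half of an autoencoder or the trunk of a classifier), but the mechanism is the same: the intermediate activation is read off as the feature vector.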

This meaning is ultimately assigned by humans, so mathematically, whether in the context of information theory or machine learning, the role of the encoder remains the same. If the exclusive purpose is compression, the nature of the compressed data, that is, the form of the encoder's output, is irrelevant. In machine learning, however, the output of the encoder must preserve the features of the data well, and thus the output should take a form that accurately represents those features. For instance, when encoding image data, the output of the encoder should capture the color, shape, and position of the objects in the image. A good example would be if the feature vectors for dumbbell, cup, and Dalmatian pictures looked like the following images. (With some imagination, you might faintly see a dumbbell, a cup, and a Dalmatian.) [1]

Alternatively, it is also acceptable if the location and contours of the objects of interest or importance are prominent.


  1. Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps." arXiv preprint arXiv:1312.6034 (2013).