
Autoencoder

Definition

For two natural numbers $m \ll n$, a function $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$ is called an encoder, and a function $g : \mathbb{R}^{m} \to \mathbb{R}^{n}$ is called a decoder. If $h = g \circ f$ satisfies $h(x) = x$, then $h$ is called an autoencoder.
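As a minimal sketch of the definition, the maps below are assumed linear purely for illustration, and the dimensions $n = 8$, $m = 2$ are hypothetical:

```python
import numpy as np

n, m = 8, 2  # hypothetical dimensions with m << n

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((m, n))  # encoder weights (illustrative)
W_dec = rng.standard_normal((n, m))  # decoder weights (illustrative)

def f(x):
    """Encoder: R^n -> R^m."""
    return W_enc @ x

def g(z):
    """Decoder: R^m -> R^n."""
    return W_dec @ z

def h(x):
    """h = g ∘ f : R^n -> R^n."""
    return g(f(x))

x = rng.standard_normal(n)
print(f(x).shape)  # (2,) -- the compressed code
print(h(x).shape)  # (8,) -- back in the input dimension
```

With random weights, $h(x) = x$ does not hold; the point is only the dimension bookkeeping, with the code living in the smaller space $\mathbb{R}^{m}$.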

Explanation

Since the encoder’s output dimension is smaller than its input dimension, the encoder can be viewed as performing data compression (or encryption). The decoder, conversely, restores the compressed data. If an $h = g \circ f$ satisfying $x = h(x)$ can be found, then $f$ successfully compresses $x$ into a smaller dimension, and $g$ successfully restores the compressed data to its original dimension. When autoencoders were first introduced, they were mainly used from the perspective of data compression, similar to Dimensionality Reduction, but more recently they have also been widely used in generative models.[1]

For an autoencoder to be useful from a data-compression perspective, $m$ must be sufficiently smaller than $n$. In that case it is impossible to find an $h$ that exactly satisfies $x = h(x)$, so the goal becomes finding an $h$ for which $x \approx h(x)$ holds as closely as possible. (In fact, even $m = n - 1$ is already mathematically insufficient for an $h$ satisfying $x = h(x)$ to exist.)
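The search for an $h$ with $x \approx h(x)$ is in practice the minimization of the reconstruction error $\| x - h(x) \|^{2}$. Below is a minimal sketch of that training loop by gradient descent, assuming a linear encoder and decoder and synthetic data; every size and variable name here is illustrative, not a standard implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 8, 2, 256                    # input dim, code dim, sample count (hypothetical)
X = rng.standard_normal((n, N)) * 0.1
X[:m] = rng.standard_normal((m, N))    # data that mostly varies along m directions

W_enc = rng.standard_normal((m, n)) * 0.1  # encoder weights
W_dec = rng.standard_normal((n, m)) * 0.1  # decoder weights
lr = 0.05                                  # gradient-descent step size

def loss():
    """Mean squared reconstruction error ||h(x) - x||^2 over the batch."""
    R = W_dec @ (W_enc @ X) - X
    return float(np.mean(R ** 2))

initial = loss()
for _ in range(500):
    Z = W_enc @ X                      # codes f(x)
    R = W_dec @ Z - X                  # residual h(x) - x
    grad_dec = 2 * R @ Z.T / N         # dL/dW_dec
    grad_enc = 2 * W_dec.T @ R @ X.T / N  # dL/dW_enc via the chain rule
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(initial, loss())  # reconstruction error shrinks as h(x) approaches x
```

Because the data here concentrates along $m$ directions, a code of dimension $m$ can capture most of the variance, so the error drops substantially; for generic data it can only be driven down to the unavoidable loss of compressing $n$ dimensions into $m$.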

U-Net is a structure that adds skip connections to an autoencoder for images.


  1. Ian Goodfellow, Deep Learning, pp. 557–558