
Autoencoder

Definition

For two natural numbers $m \ll n$, a function $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$ is called an encoder, and a function $g : \mathbb{R}^{m} \to \mathbb{R}^{n}$ is called a decoder. If $h = g \circ f$ satisfies $h(x) = x$, then $h$ is called an autoencoder.
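As a minimal sketch of the definition, the maps below are assumed linear purely for illustration, and the dimensions $n = 8$, $m = 2$ are hypothetical:

```python
import numpy as np

n, m = 8, 2  # hypothetical dimensions with m << n

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((m, n))  # encoder weights (illustrative)
W_dec = rng.standard_normal((n, m))  # decoder weights (illustrative)

def f(x):
    """Encoder: R^n -> R^m."""
    return W_enc @ x

def g(z):
    """Decoder: R^m -> R^n."""
    return W_dec @ z

def h(x):
    """h = g ∘ f : R^n -> R^n."""
    return g(f(x))

x = rng.standard_normal(n)
print(f(x).shape)  # (2,) -- the compressed code
print(h(x).shape)  # (8,) -- back in the input dimension
```

With random weights, $h(x) = x$ does not hold; the point is only the dimension bookkeeping, with the code living in the smaller space $\mathbb{R}^{m}$.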

Explanation

Since the encoder’s output dimension is smaller than its input dimension, the encoder can be viewed as performing data compression (or encryption). The decoder, conversely, restores the compressed data. If an $h = g \circ f$ satisfying $x = h(x)$ can be found, then $f$ successfully compresses $x$ into a smaller dimension, and $g$ successfully restores the compressed data to its original dimension. When autoencoders were first introduced, they were mainly used from the perspective of data compression, similar to Dimensionality Reduction, but more recently they have also been widely used in generative models.[1]

For an autoencoder to be useful from a data-compression perspective, $m$ must be sufficiently smaller than $n$. In that case it is impossible to find an $h$ that exactly satisfies $x = h(x)$, so the goal becomes finding an $h$ for which $x \approx h(x)$ holds as closely as possible. (In fact, even $m = n - 1$ is already mathematically insufficient for an $h$ satisfying $x = h(x)$ to exist.)
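The search for an $h$ with $x \approx h(x)$ is in practice the minimization of the reconstruction error $\| x - h(x) \|^{2}$. Below is a minimal sketch of that training loop by gradient descent, assuming a linear encoder and decoder and synthetic data; every size and variable name here is illustrative, not a standard implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 8, 2, 256                    # input dim, code dim, sample count (hypothetical)
X = rng.standard_normal((n, N)) * 0.1
X[:m] = rng.standard_normal((m, N))    # data that mostly varies along m directions

W_enc = rng.standard_normal((m, n)) * 0.1  # encoder weights
W_dec = rng.standard_normal((n, m)) * 0.1  # decoder weights
lr = 0.05                                  # gradient-descent step size

def loss():
    """Mean squared reconstruction error ||h(x) - x||^2 over the batch."""
    R = W_dec @ (W_enc @ X) - X
    return float(np.mean(R ** 2))

initial = loss()
for _ in range(500):
    Z = W_enc @ X                      # codes f(x)
    R = W_dec @ Z - X                  # residual h(x) - x
    grad_dec = 2 * R @ Z.T / N         # dL/dW_dec
    grad_enc = 2 * W_dec.T @ R @ X.T / N  # dL/dW_enc via the chain rule
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(initial, loss())  # reconstruction error shrinks as h(x) approaches x
```

Because the data here concentrates along $m$ directions, a code of dimension $m$ can capture most of the variance, so the error drops substantially; for generic data it can only be driven down to the unavoidable loss of compressing $n$ dimensions into $m$.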

U-Net is a structure that adds skip connections to an autoencoder for images.


  1. Ian Goodfellow, Deep Learning, pp. 557–558