What is One-Hot Encoding in Machine Learning? 📂Machine Learning

Definition

Given a set $X \subset \mathbb{R}^{n}$, suppose its subsets $X_{i}$ satisfy the following.

$$ X = X_{1} \cup \cdots \cup X_{N} \quad \text{and} \quad X_{i} \cap X_{j} = \varnothing \enspace (i \ne j) $$

Let $\beta = \left\{ e_{1}, \dots, e_{N} \right\}$ be the standard basis of $\mathbb{R}^{N}$. Then the following function $f$, or the act of mapping each $x \in X$ by it, is called one-hot encoding.

$$ \begin{align*} f : X &\to \beta \\ x &\mapsto e_{i} \text{ if } x \in X_{i} \end{align*} $$
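As a minimal sketch of the definition above, the following Python code builds the map f from a rule that says which subset a point belongs to. The names `standard_basis`, `make_one_hot`, and `class_of`, as well as the three-way partition of the real line, are assumptions made for illustration and are not part of the definition itself.

```python
from typing import Callable, List


def standard_basis(n: int) -> List[List[int]]:
    """Return the standard basis e_1, ..., e_n of R^n as a list of lists."""
    return [[1 if j == i else 0 for j in range(n)] for i in range(n)]


def make_one_hot(class_of: Callable[[float], int], n_classes: int):
    """Build f : X -> beta from a rule giving the 1-based index i with x in X_i."""
    basis = standard_basis(n_classes)

    def f(x: float) -> List[int]:
        i = class_of(x)
        if not 1 <= i <= n_classes:
            raise ValueError(f"class index {i} is outside 1..{n_classes}")
        return basis[i - 1]  # e_i (Python lists are 0-based)

    return f


# Hypothetical partition of the real line into N = 3 disjoint subsets:
# X_1 = negative numbers, X_2 = {0}, X_3 = positive numbers.
def class_of(x: float) -> int:
    if x < 0:
        return 1
    if x == 0:
        return 2
    return 3


f = make_one_hot(class_of, 3)
print(f(-2.5))  # [1, 0, 0]
print(f(0.0))   # [0, 1, 0]
print(f(7.0))   # [0, 0, 1]
```

Because the subsets are pairwise disjoint and cover X, `class_of` returns exactly one index for every point, which is what makes f well defined.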

Explanation

One-hot encoding is a commonly used way of labeling data in machine learning. It is called one-hot because each label has exactly one non-zero component. The mapping lets data labels be treated as qualitative variables rather than quantitative ones. Imagine assigning $[1]$ as the label of a picture of clothes and $[2]$ as the label of a picture of shoes. Nothing about the two pictures differs by a factor of $2$, yet the labels suggest exactly that. Moreover, if the predicted value is $[5]$, it is ambiguous whether it should be considered closer to $[1]$ or $[2]$, or whether it is simply a failed prediction. By using labels such as $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ instead, no unintended meaning is attached to the labels, and values are obtained only within the intended range. Here $N = \left| \beta \right|$ is the number of classes into which the data are classified.
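The point about unintended meaning can be checked numerically. The sketch below compares Euclidean distances between integer labels and between one-hot labels. The third class "bag" is a hypothetical addition (the text only mentions clothes and shoes), included so that unequal distances become visible, and the helper names `one_hot`, `distance`, and `decode` are likewise assumptions.

```python
import math
from itertools import combinations


def one_hot(i: int, n: int) -> list:
    """Return e_i in R^n for a 1-based class index i."""
    return [1.0 if j == i else 0.0 for j in range(1, n + 1)]


def distance(u: list, v: list) -> float:
    """Euclidean distance between two label vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


classes = ["clothes", "shoes", "bag"]  # "bag" is a hypothetical third class

int_labels = {name: [float(i)] for i, name in enumerate(classes, start=1)}
hot_labels = {name: one_hot(i, len(classes)) for i, name in enumerate(classes, start=1)}

for a, b in combinations(classes, 2):
    d_int = distance(int_labels[a], int_labels[b])
    d_hot = distance(hot_labels[a], hot_labels[b])
    print(f"{a} vs {b}: integer labels {d_int:.3f}, one-hot labels {d_hot:.3f}")


def decode(pred: list) -> int:
    """Read a prediction vector as the 1-based index of its largest component."""
    return max(range(len(pred)), key=lambda j: pred[j]) + 1


print(decode([0.2, 0.7, 0.1]))  # 2
```

With integer labels, clothes and bag end up twice as far apart as clothes and shoes, although no such ordering was intended. With one-hot labels every pair of classes is the same distance apart, and a prediction vector can always be read back as one of the N classes by taking its largest component.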

For instance, the MNIST data are one-hot encoded as follows.

$$ \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FdgvulP%2FbtrRTtjz8Ah%2FIKWA7Ckzkjitj5X6vwd11k%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FchjNBz%2FbtrRW0nrB59%2FwUVzGwFGvVIA9iemnOmkN1%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbRAv7N%2FbtrRWLjvZku%2FCLGtZLlkuC7fKZlSZlr2u1%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2F9YZyq%2FbtrRSPtGAii%2F2N3tRn9bhQhLbs0l0OKxT0%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcJpInQ%2FbtrRWZaZ4Bo%2FwE0wwSOxZZ7wrwKqCFQbA1%2Fimg.jpg}, \raisebox{0.5em}{$\enspace \cdots \enspace \mapsto e_{1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0\end{bmatrix}^{T}$} $$

$$ \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FrV2Hd%2FbtrRXuocv2o%2FEP2Tt3R7Vft3dPucw5iJz1%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FVQWMs%2FbtrRXfLytuV%2FxvEuEznI71CnPBD0fNEHmk%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbbvAq2%2FbtrRTtYkr1S%2FA45KGWUNxA2IT2mqeBVqWK%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Ftf6ng%2FbtrRXvm3jcc%2FzQouozMFozW7Eiq3Dsqqe0%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FT2gLG%2FbtrRYyJ8alW%2FIqmmahDUmM1yXhAXmg2MWK%2Fimg.jpg}, \raisebox{0.5em}{$\enspace \cdots \enspace \mapsto e_{2} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0\end{bmatrix}^{T}$} $$

$$ \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FppAxy%2FbtrRTtxgxbr%2F4cfRUjLAzD5TzsDopAkKt0%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FwwRei%2FbtrRVK6oKTc%2FISAO9LE6Qc4j5KglwxV0K0%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FboTCrt%2FbtrRX2EGFT7%2F4SkN8ZDSHTS57Nf2CpIiz1%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxxZzL%2FbtrRVLjYEDk%2F5eQyGDM6bNjq4KNrmPltb1%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FxxZzL%2FbtrRVLjYEDk%2F5eQyGDM6bNjq4KNrmPltb1%2Fimg.jpg}, \raisebox{0.5em}{$\enspace \cdots \enspace \mapsto e_{3} = \begin{bmatrix} 0 & 0 & 1 & \cdots & 0\end{bmatrix}^{T}$} $$

$$ \vdots $$

$$ \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbRSmNg%2FbtrRTtxgz8s%2FjpZ5TGHy9d6JKjTob92PA0%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbcVpNY%2FbtrRXDE9s9S%2Fka5hNQVMgXgn8kyPD5ZBG0%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fc7gcV8%2FbtrRX1lvvbZ%2FeSuCvSRoHs3scKOvfer3n1%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FNWuc9%2FbtrRX1MyDYL%2F4c0G8AJknZoDGe9zdwuBVk%2Fimg.jpg}, \includegraphics[height=2em]{https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FmG3XY%2FbtrRXGhClhU%2FsDgIVjw4Kq4KWl5PPcXyyK%2Fimg.jpg}, \raisebox{0.5em}{$\enspace \cdots \enspace \mapsto e_{10} = \begin{bmatrix} 0 & 0 & 0 & \cdots & 1\end{bmatrix}^{T}$} $$
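In code, the MNIST case amounts to turning a list of digit labels into rows of length 10. The sketch below assumes that digit d is sent to the (d+1)-th standard basis vector, so that 0 goes to the first and 9 to the tenth. That ordering is a guess based on the usual convention, and both the sample labels and the function names are made up for illustration.

```python
from typing import List


def one_hot_digit(d: int, n_classes: int = 10) -> List[int]:
    """Map a digit label d in 0..n_classes-1 to a one-hot list of length n_classes."""
    if not 0 <= d < n_classes:
        raise ValueError(f"label {d} is outside 0..{n_classes - 1}")
    v = [0] * n_classes
    v[d] = 1  # assumed convention: digit d -> (d+1)-th basis vector
    return v


def decode_digit(v: List[int]) -> int:
    """Recover the digit label from a one-hot list."""
    return v.index(1)


labels = [5, 0, 4, 1, 9]  # hypothetical labels, not taken from the dataset
encoded = [one_hot_digit(d) for d in labels]

for d, v in zip(labels, encoded):
    print(d, v)

# Every row has exactly one non-zero entry, and decoding undoes encoding.
assert all(sum(v) == 1 for v in encoded)
assert [decode_digit(v) for v in encoded] == labels
```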

See Also