Confusion Matrix, Sensitivity, and Specificity
Definition
Let’s assume we have a model that distinguishes between positive and negative in a classification problem. We define the number of positives correctly predicted as positive as true positives (TP), the number of negatives correctly predicted as negative as true negatives (TN), the number of positives incorrectly predicted as negative as false negatives (FN), and the number of negatives incorrectly predicted as positive as false positives (FP).
Confusion Matrix
In classification problems, the Confusion Matrix arranges these four counts in a table, as shown below, and is used as a metric to evaluate the model.

                Actual P    Actual N
Predicted P        TP          FP
Predicted N        FN          TN
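As a minimal sketch, assuming two hypothetical label lists y_true and y_pred (made up for illustration, with 1 for positive and 0 for negative), the four counts can be tallied directly in Python:

```python
# Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Tally the four cells of the confusion matrix.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")  # TP=3  TN=3  FP=1  FN=1
```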
Accuracy
In the table above, P represents positive and N represents negative. TP is a case predicted positive that is actually positive, and TN is a case predicted negative that is actually negative. It is reasonable to assess a model with relatively many TP and TN as a good model, and that is exactly what Accuracy measures:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

On the other hand, where there is ‘correct’, there must also be ‘incorrect’. The Error Rate is defined as follows:

Error Rate = (FP + FN) / (TP + TN + FP + FN) = 1 − Accuracy
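Using the hypothetical counts from the sketch above, Accuracy and Error Rate follow directly from these formulas:

```python
# Hypothetical counts carried over from the previous sketch.
tp, tn, fp, fn = 3, 3, 1, 1

total = tp + tn + fp + fn
accuracy = (tp + tn) / total    # share of all predictions that are correct
error_rate = (fp + fn) / total  # share that are wrong; equals 1 - accuracy

print(accuracy, error_rate)  # 0.75 0.25
```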
Precision
Precision is the ratio of cases that are actually positive among those predicted as positive. Be careful not to confuse it with Accuracy.

Precision = TP / (TP + FP)
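With the same hypothetical counts, a small sketch of the Precision formula:

```python
tp, fp = 3, 1               # hypothetical counts
precision = tp / (tp + fp)  # of everything predicted positive, the share that truly is
print(precision)            # 0.75
```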
Sensitivity
Sensitivity is the ratio of cases predicted as positive among those that are actually positive; it is also known as Recall or the True Positive Rate (TPR).

Sensitivity = TP / (TP + FN)
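A corresponding sketch for Sensitivity, again with hypothetical counts:

```python
tp, fn = 3, 1                 # hypothetical counts
sensitivity = tp / (tp + fn)  # of all actual positives, the share the model caught
print(sensitivity)            # 0.75
```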
Specificity
Specificity is the ratio of cases predicted as negative among those that are actually negative, also known as the True Negative Rate.

Specificity = TN / (TN + FP)
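Likewise for Specificity, with hypothetical counts:

```python
tn, fp = 3, 1                 # hypothetical counts
specificity = tn / (tn + fp)  # of all actual negatives, the share correctly rejected
print(specificity)            # 0.75
```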
Here, the False Positive Rate (FPR) is 1 − Specificity, that is, FP / (FP + TN). A graph that plots the True Positive Rate against the False Positive Rate as the classification threshold varies is referred to as the ROC curve.
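As a rough sketch of how an ROC curve is traced out, assuming hypothetical prediction scores and labels (made up for illustration), one can sweep a threshold over the scores and record the (FPR, TPR) pair at each step:

```python
# Hypothetical predicted scores and ground-truth labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,    0,   1,   0,   0]

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs for a threshold placed at each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        preds = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        points.append((fp / neg, tp / pos))  # (False Positive Rate, True Positive Rate)
    return points

# Plotting TPR against FPR over these points gives the ROC curve.
print(roc_points(scores, labels))
```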