Explain the Difference Between TP, FP, TN, FN, with Examples
In classification tasks, the confusion matrix provides a summary of the model's performance by categorizing predictions into four components: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). These components are used to calculate evaluation metrics like precision, recall, and accuracy.
They come up in some form in almost all machine learning interviews, so it's important you have them down!
- True Positives (TP): cases where the model correctly predicts the positive class. For example, in a spam detection model, an email that is spam and is correctly classified as spam is a true positive.
- False Positives (FP): cases where the model incorrectly predicts the positive class. For example, in the spam detection model, an email that is not spam but is classified as spam is a false positive (also called a "Type I error").
- True Negatives (TN): cases where the model correctly predicts the negative class. For example, an email that is not spam and is correctly classified as not spam is a true negative.
- False Negatives (FN): cases where the model incorrectly predicts the negative class. For example, an email that is spam but is classified as not spam is a false negative (also called a "Type II error").
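To make the four outcomes concrete, here is a minimal Python sketch (the function name `count_outcomes` and the toy labels are purely illustrative) that tallies them from a list of true labels and predictions, treating 1 as spam (positive) and 0 as not spam (negative):

```python
def count_outcomes(y_true, y_pred):
    """Tally TP, FP, TN, FN for binary labels where 1 = positive (spam)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1      # spam correctly flagged as spam
        elif actual == 0 and predicted == 1:
            fp += 1      # not spam, but flagged as spam (Type I error)
        elif actual == 0 and predicted == 0:
            tn += 1      # not spam, correctly left alone
        else:
            fn += 1      # spam that slipped through (Type II error)
    return tp, fp, tn, fn

# Tiny illustrative example: 1 = spam, 0 = not spam
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1]
print(count_outcomes(y_true, y_pred))  # (2, 1, 2, 1)
```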
Confusion Matrix Layout
For a binary classification task, the confusion matrix is structured as shown below. As you can see, it is made up of the components we just mentioned above:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positives (TP) | False Negatives (FN) |
| Actual Negative | False Positives (FP) | True Negatives (TN) |
For example, imagine a dataset with 100 emails where 40 emails are spam (positive class), and 60 emails are not spam (negative class).
If a model produces the following results:
- 35 spam emails are correctly identified (TP = 35).
- 5 spam emails are misclassified as not spam (FN = 5).
- 50 non-spam emails are correctly identified (TN = 50).
- 10 non-spam emails are misclassified as spam (FP = 10).
The confusion matrix would look like this:
| | Predicted Spam | Predicted Not Spam |
|---|---|---|
| Actual Spam | 35 (TP) | 5 (FN) |
| Actual Not Spam | 10 (FP) | 50 (TN) |
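If scikit-learn is available, you can reproduce this worked example in a few lines (the 0/1 label encoding below is an assumption for illustration). Note that `sklearn.metrics.confusion_matrix` orders labels in sorted order (0, then 1), so flattening the 2×2 matrix with `.ravel()` yields the counts in the order TN, FP, FN, TP:

```python
from sklearn.metrics import confusion_matrix

# Rebuild the 100-email example: 1 = spam (positive), 0 = not spam (negative)
y_true = [1] * 40 + [0] * 60                       # 40 actual spam, 60 actual not spam
y_pred = [1] * 35 + [0] * 5 + [0] * 50 + [1] * 10  # 35 spam caught, 5 missed, 50 correct rejections, 10 false alarms

# ravel() flattens the 2x2 matrix into (TN, FP, FN, TP)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, tn, fn)  # 35 10 50 5
```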
Related Metrics
A number of key metrics are constructed using these components. It's expected you're familiar with all of them:
- Accuracy: measures the overall correctness of the model. Accuracy = (TP + TN) / (TP + TN + FP + FN).
- Precision: indicates the proportion of positive predictions that are actually correct. Precision = TP / (TP + FP).
- Recall (Sensitivity): measures the ability to identify all positive cases. Recall = TP / (TP + FN).
- F1 Score: balances precision and recall into a single metric. F1 = 2 × (Precision × Recall) / (Precision + Recall).
- Specificity: measures the ability to identify all negative cases. Specificity = TN / (TN + FP).
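For reference, here's a short pure-Python sketch that computes each of these metrics from the confusion-matrix counts in the spam example above (variable names are just illustrative):

```python
# Counts from the spam example above
tp, fp, tn, fn = 35, 10, 50, 5

accuracy    = (tp + tn) / (tp + tn + fp + fn)                 # 0.85
precision   = tp / (tp + fp)                                  # 35/45 ≈ 0.778
recall      = tp / (tp + fn)                                  # 35/40 = 0.875
f1          = 2 * precision * recall / (precision + recall)   # ≈ 0.824
specificity = tn / (tn + fp)                                  # 50/60 ≈ 0.833

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} specificity={specificity:.3f}")
```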
When to Focus on Specific Metrics
- Precision is crucial when the cost of false positives is high (e.g., identifying fraud).
- Recall is important when the cost of false negatives is high (e.g., diagnosing diseases).
- F1 Score is useful for imbalanced datasets where a balance between precision and recall is needed.