Practice: confusion matrix terminology
Confusion matrix questions
Imagine a spam filter model where emails labeled 1 = spam, 0 = not spam.
If a spam email is incorrectly classified as not spam, what kind of error is this?
- A false positive
- A true positive
- A false negative
- A true negative
Confusion matrix questions
In an intrusion detection system, 1 = intrusion, 0 = safe.
If the system misses an actual intrusion and classifies it as safe, this is a:
- A false positive
- A true positive
- A false negative
- A true negative
Confusion matrix questions
In a medical test for a disease, 1 = diseased, 0 = healthy.
If a healthy patient is incorrectly diagnosed as diseased, that’s a:
- A false positive
- A true positive
- A false negative
- A true negative
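To make the terminology concrete, here is a minimal sketch (labels and predictions are made up for illustration) that pulls the four cell counts out of a scikit-learn confusion matrix for the spam example:

```python
# Minimal sketch: confusion matrix for the spam example (1 = spam, 0 = not spam).
# The labels and predictions below are made up for illustration.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # actual labels
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]   # model predictions

# With labels=[0, 1], rows are actual classes and columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
# A spam email predicted as not spam shows up in the FN count,
# and a non-spam email flagged as spam shows up in the FP count.
```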
Metrics other than accuracy
Now that we understand the different types of errors, we can explore metrics that better capture model performance when accuracy falls short, especially for imbalanced datasets.
We’ll start with three key ones:
- Precision
- Recall
- F1-score
Precision and recall
Let’s revisit our fraud detection scenario. The circle below represents all transactions that a toy fraud-detection model predicted as fraud.
*(Figure: the circle of transactions predicted as fraud by the toy model.)*
Intuition behind the two metrics
- Precision: Of all the transactions predicted as fraud, how many were actually fraud?
- High precision \(\rightarrow\) few false alarms (low false positives).
- Recall: Of all the actual fraud cases, how many did the model catch?
- High recall \(\rightarrow\) few missed frauds (low false negatives).
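Here is a small numeric sketch of the two definitions for the fraud scenario; the confusion-matrix counts are made up for illustration:

```python
# Precision and recall for the fraud example (1 = fraud, 0 = legitimate).
# The counts below are hypothetical, chosen only to illustrate the formulas.
tp, fp, fn, tn = 80, 20, 40, 860   # made-up confusion-matrix counts

precision = tp / (tp + fp)   # of all transactions predicted as fraud, fraction that were fraud
recall = tp / (tp + fn)      # of all actual frauds, fraction the model caught

print(f"precision = {precision:.2f}")   # 0.80 -> few false alarms
print(f"recall    = {recall:.2f}")      # 0.67 -> a third of the frauds were missed
```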
Trade-off between precision and recall
- Increasing recall often decreases precision, and vice versa.
- Example:
- Predict “fraud” for every transaction \(\rightarrow\) perfect recall, terrible precision.
- Predict “fraud” only when 100% sure \(\rightarrow\) high precision, low recall.
The right balance depends on the application and cost of errors.
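To see the trade-off numerically, here is a sketch that sweeps the decision threshold over some made-up fraud scores using scikit-learn’s precision_recall_curve:

```python
# Sketch of the precision-recall trade-off: sweep the decision threshold
# over made-up fraud scores (higher score = more fraud-like).
from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]                        # actual labels
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]   # model scores

precisions, recalls, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in zip(precisions, recalls, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# Lowering the threshold flags more transactions as fraud: recall goes up,
# but precision tends to go down (more false alarms), and vice versa.
```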
F1-score
- Sometimes, we want a single metric that balances precision and recall.
- The F1-score is the harmonic mean of the two:
\[
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]
- High F1 means both precision and recall are strong.
- Useful when we care about both false positives and false negatives.
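A quick worked check of the formula, plugging in the made-up precision and recall values from the fraud sketch above:

```python
# Numeric check of the F1 formula, reusing the made-up values from the
# fraud sketch (precision = 0.80, recall = 0.67).
precision, recall = 0.80, 0.67

f1 = 2 * (precision * recall) / (precision + recall)
print(f"F1 = {f1:.2f}")   # ~0.73, between the two but closer to the lower value

# The harmonic mean punishes imbalance: if recall drops to 0.10 while
# precision stays at 0.80, F1 falls to about 0.18, not the 0.45 an
# arithmetic mean would suggest.
```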
Summary
| Metric | Measures | High value means |
| --- | --- | --- |
| Accuracy | Overall correctness | Model gets most predictions right |
| Precision | Quality of positive predictions | Few false alarms |
| Recall | Quantity of true positives caught | Few missed positives |
| F1-score | Balance of precision & recall | Both precision and recall are high |
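In practice, scikit-learn’s classification_report prints all of these metrics at once; here is a minimal sketch with made-up labels and predictions:

```python
# Minimal sketch: classification_report shows precision, recall, and F1
# per class, plus overall accuracy. Labels and predictions are made up.
from sklearn.metrics import classification_report

y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

print(classification_report(y_true, y_pred, target_names=["not fraud", "fraud"]))
```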
iClicker Exercise 9.1
Select all of the following statements which are TRUE.
- In medical diagnosis, false positives are more damaging than false negatives (assume “positive” means the person has a disease, “negative” means they don’t).
- In spam classification, false positives are more damaging than false negatives (assume “positive” means the email is spam, “negative” means it’s not).
- If method A gets a higher accuracy than method B, that means its precision is also higher.
- If method A gets a higher accuracy than method B, that means its recall is also higher.
Counter examples
- Method A: higher accuracy but lower precision
- Method B: lower accuracy but higher precision
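A small numeric sketch of such a counter-example (the confusion-matrix counts are made up, but the arithmetic checks out):

```python
# Made-up counter-example: Method A has higher accuracy but lower precision
# than Method B, so higher accuracy does not imply higher precision.

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    return tp / (tp + fp)

# 100 examples, 10 of them actually positive
a = dict(tp=9, fp=4, fn=1, tn=86)   # Method A: catches most positives, some false alarms
b = dict(tp=3, fp=0, fn=7, tn=90)   # Method B: very conservative, no false alarms

print("A:", accuracy(**a), precision(a["tp"], a["fp"]))   # 0.95, ~0.69
print("B:", accuracy(**b), precision(b["tp"], b["fp"]))   # 0.93, 1.00
```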
Takeaway
- Accuracy summarizes overall correctness but hides class-specific behaviour.
- You can have high accuracy but poor precision or recall, especially in imbalanced datasets.
- Always check multiple metrics before deciding which model is better.