Scenario | Data Imbalance | Main Concern | Best Metric(s) / Curve |
---|---|---|---|
Email Spam Detection | 10% spam | Avoid false positives | |
Disease Screening | 1 in 10,000 | Avoid false negatives | |
Credit Card Fraud | 0.1% fraud | Focus on rare positive class | |
Customer Churn | 20% churn | Balance FP & FN | |
Sentiment Analysis | 50/50 balanced | Overall correctness | |
Face Recognition | Balanced pairs | Trade-off FP vs FN | |

Metric / Plot | When to Use | Why |
---|---|---|
Precision, Recall, F1 | When you care about specific error types (FP vs FN) or a fixed threshold. | Focus on particular tradeoffs. |
PR Curve & AP Score | When the dataset is highly imbalanced (rare positives). | Ignores TNs; focuses on positives. |
ROC Curve & AUC | When classes are moderately imbalanced. | Measures ranking ability across thresholds. |
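
As a rough illustration of the first two rows of the table above, the sketch below computes precision, recall, and F1 at a fixed threshold, and then the average precision (AP) score, which summarizes the PR curve across thresholds. The toy labels, the scores, and the 0.5 threshold are made up for illustration and are not from the course material.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical toy data (an assumption): 2 positives among 10 examples.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.70, 0.65, 0.90])

# Threshold-dependent metrics: a threshold must be fixed first (0.5 is an assumption).
y_pred = (scores >= 0.5).astype(int)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Threshold-free summary of the PR curve; true negatives never enter the computation.
ap = average_precision_score(y_true, scores)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} AP={ap:.3f}")
```

Note that precision, recall, and F1 change if the threshold changes, whereas AP is computed from the scores directly.
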
AUC–ROC measures the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example.
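
The probabilistic reading above can be checked numerically. The sketch below counts, over all (positive, negative) pairs, how often the positive example gets the higher score (ties counted as half), and compares that fraction with `roc_auc_score`. The helper name `pairwise_auc` and the toy data are hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # diff[i, j] = score of positive i minus score of negative j
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Hypothetical toy labels and scores (an assumption, not course data).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc_pairs = pairwise_auc(y_true, scores)
auc_sklearn = roc_auc_score(y_true, scores)
print(auc_pairs, auc_sklearn)
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so both computations should agree on 8/9.
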
- `class_weight="balanced"` (preferred method for this course)
- The `alpha` hyperparameter of `Ridge` controls model complexity.
  - Higher `alpha`: Simpler model, smaller coefficients.
  - Lower `alpha`: Complex model, larger coefficients.
- `TransformedTargetRegressor`
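
The effect of `alpha` on coefficient size can be sketched as follows. This is a minimal illustration on synthetic data from `make_regression`; the dataset, its size, and the particular `alpha` values are assumptions chosen for the example, not part of the course material.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data (an assumption made for this illustration).
X, y = make_regression(n_samples=50, n_features=10, noise=10.0, random_state=0)

coef_norms = {}
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # The L2 norm of the coefficient vector summarizes how large the coefficients are.
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: ||coef|| = {coef_norms[alpha]:.3f}")
```

The printed norms should shrink as `alpha` grows, which matches the "higher `alpha`, smaller coefficients" note above.
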

Select all of the following statements which are TRUE.

- [...] `X`.
- The `alpha` hyperparameter of `Ridge` has a similar interpretation to the `C` hyperparameter of `LogisticRegression`; higher `alpha` means a more complex model.
- In `Ridge`, smaller `alpha` means bigger coefficients whereas bigger `alpha` means smaller coefficients.

Select all of the following statements which are TRUE.

- In `sklearn`, for regression problems, using `r2_score()` and `.score()` (with default values) will produce the same results.
- [...] `GridSearchCV` or `RandomizedSearchCV` for regression as well as classification problems.

Scenario | What matters most? | Best metric(s)? |
---|---|---|
Predicting house prices ranging from $60K–$800K. | A $30K error is huge for a $60K house but small for a $500K house. | |
Predicting exam scores (0–100). | You want an interpretable measure of average error in points. | |
Predicting energy consumption in a large industrial system. | Large errors are very costly and should be penalized heavily. | |
Predicting insurance claim amounts. | You want to compare how well different models explain the variation in claims. | |
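
To make the trade-offs in the scenarios above concrete, the sketch below computes several common regression metrics with `sklearn.metrics` on the same set of predictions. The house prices and predictions are invented for illustration, and the choice of these four metrics is an assumption about which ones are under discussion.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

# Hypothetical prices and predictions (an assumption, not course data).
# The first and last predictions are both off by $30K.
y_true = np.array([60_000, 100_000, 200_000, 500_000], dtype=float)
y_pred = np.array([90_000, 110_000, 190_000, 470_000], dtype=float)

mae = mean_absolute_error(y_true, y_pred)               # average error, in dollars
rmse = np.sqrt(mean_squared_error(y_true, y_pred))      # squaring penalizes large errors more
mape = mean_absolute_percentage_error(y_true, y_pred)   # relative error, returned as a fraction
r2 = r2_score(y_true, y_pred)                           # share of target variance explained

print(f"MAE={mae:.0f} RMSE={rmse:.0f} MAPE={mape:.4f} R2={r2:.4f}")
```

In this toy example the same $30K miss contributes 0.5 to the percentage error for the $60K house but only 0.06 for the $500K house, while MAE treats the two misses identically.
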