I’m planning to hold an in-person midterm review office hour. Which time works best for you?
| Scenario | Data Imbalance | Main Concern | Best Metric(s) / Curve |
|---|---|---|---|
| Email Spam Detection | 10% spam | Avoid false positives | |
| Disease Screening | 1 in 10,000 | Avoid false negatives | |
| Credit Card Fraud | 0.1% fraud | Focus on rare positive class | |
| Customer Churn | 20% churn | Balance FP & FN | |
| Sentiment Analysis | 50/50 balanced | Overall correctness | |
| Face Recognition | Balanced pairs | Trade-off FP vs FN | |

| Metric / Plot | When to Use | Why |
|---|---|---|
| Precision, Recall, F1 | When you care about specific error types (FP vs FN) or a fixed threshold. | Focus on particular tradeoffs. |
| PR Curve & AP Score | When the dataset is highly imbalanced (rare positives). | Ignores TNs; focuses on positives. |
| ROC Curve & AUC | When classes are moderately imbalanced. | Measures ranking ability across thresholds. |
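
To make the first two rows of the table concrete, here is a minimal sketch that computes precision, recall, and F1 at one fixed threshold, and then the threshold-free average precision (AP) score. The labels, scores, and the 0.5 threshold are made-up illustration values, not course data.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical toy data: 8 negatives, 2 positives (illustration only).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.65, 0.9])

# Threshold-dependent metrics: they change if the threshold changes.
threshold = 0.5  # assumed threshold for this sketch
y_pred = (scores >= threshold).astype(int)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Threshold-free summary of the PR curve; computed from scores, not from y_pred.
ap = average_precision_score(y_true, scores)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} AP={ap:.3f}")
```

None of these four numbers uses the count of true negatives, which is why they stay informative when negatives vastly outnumber positives.
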
AUC–ROC measures the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example.
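
The ranking interpretation can be checked numerically. The sketch below uses synthetic scores (an assumption for illustration) and compares `roc_auc_score` with the fraction of (positive, negative) pairs in which the positive example gets the higher score, counting ties as half.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic data: 200 negatives, 20 positives; positives tend to score higher.
y_true = np.array([0] * 200 + [1] * 20)
scores = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 20)])

auc = roc_auc_score(y_true, scores)

# Compare every positive score with every negative score.
pos = scores[y_true == 1]
neg = scores[y_true == 0]
diff = pos[:, None] - neg[None, :]
pairwise = (diff > 0).mean() + 0.5 * (diff == 0).mean()

print(f"roc_auc_score:     {auc:.4f}")
print(f"pairwise fraction: {pairwise:.4f}")
```

The two printed values should agree up to floating-point error.
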
- `class_weight="balanced"` (preferred method for this course)
- The `alpha` hyperparameter controls model complexity.
- Larger `alpha`: simpler model, smaller coefficients.
- Smaller `alpha`: more complex model, larger coefficients.
- `TransformedTargetRegressor`
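
The effect of `alpha` on coefficient size can be seen in a small experiment. This is a minimal sketch on a synthetic dataset from `make_regression`; the dataset and the particular `alpha` values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data (illustration only).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Fit Ridge with increasing regularization strength and record the
# overall size (L2 norm) of the learned coefficients.
norms = {}
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    norms[alpha] = np.linalg.norm(model.coef_)
    print(f"alpha={alpha:>8}: coefficient norm = {norms[alpha]:.2f}")
```

The coefficient norm shrinks as `alpha` grows, which is the sense in which a larger `alpha` gives a simpler model.
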
Select all of the following statements which are TRUE.
- [...] X.
- The `alpha` hyperparameter of `Ridge` has a similar interpretation to the `C` hyperparameter of `LogisticRegression`; higher `alpha` means a more complex model.
- In `Ridge`, smaller `alpha` means bigger coefficients whereas bigger `alpha` means smaller coefficients.

Select all of the following statements which are TRUE.

- In `sklearn` for regression problems, using `r2_score()` and `.score()` (with default values) will produce the same results.
- [...] `GridSearchCV` or `RandomizedSearchCV` for regression as well as classification problems.

| Scenario | What matters most? | Best metric(s)? |
|---|---|---|
| Predicting house prices ranging from $60K–$800K. | A $30K error is huge for a $60K house but small for a $500K house. | |
| Predicting exam scores (0–100). | You want an interpretable measure of average error in points. | |
| Predicting energy consumption in a large industrial system. | Large errors are very costly and should be penalized heavily. | |
| Predicting insurance claim amounts. | You want to compare how well different models explain the variation in claims. | |
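
When working through the scenarios above, it can help to see how common regression metrics react to the same set of errors. The sketch below uses made-up house prices where every prediction is off by exactly $30K. The numbers are hypothetical and only show how the metrics behave; they are not a recommendation for any particular scenario.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

# Hypothetical prices (illustration only); every prediction is $30K too high.
y_true = np.array([60_000, 150_000, 500_000, 800_000], dtype=float)
y_pred = y_true + 30_000

mae = mean_absolute_error(y_true, y_pred)           # average error, in dollars
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # squares errors before averaging
mape = mean_absolute_percentage_error(y_true, y_pred)  # a fraction, not a percent
r2 = r2_score(y_true, y_pred)                       # share of variance explained

# Per-example relative error: the same $30K is 50% of the cheapest house.
relative_errors = np.abs(y_pred - y_true) / y_true

print(f"MAE={mae:.0f} RMSE={rmse:.0f} MAPE={mape:.3f} R^2={r2:.3f}")
print("relative errors:", np.round(relative_errors, 3))
```

MAE and RMSE report the same dollar amount for every house here, while the relative errors differ sharply between the cheapest and the most expensive one.
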