Module 2: Introduction to ML fundamentals
Outline
- Key ML terminology: model, training, features, target
- Decision trees as an intuitive starting point
- Train/Validation/Test splits
- Generalization, overfitting/underfitting
- The golden rule: never train on your test data
- Common data challenges:
  - Missing data
  - Mixed data types (numeric, categorical, binary, text)
  - Different feature scales
  - Outliers
  - Small datasets
  - Proxy variables
- Overview of a typical ML pipeline
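
To make the split and the golden rule from the outline concrete, here is a minimal sketch using only the Python standard library. The helper names (`train_val_test_split`, `fit_scaler`, `apply_scaler`), the 60/20/20 proportions, and the toy feature are assumptions chosen for illustration, not part of the module's materials. The point it shows is that preprocessing statistics (here, the mean and standard deviation used to rescale a feature) are learned from the training portion only and then reused unchanged on the validation and test portions.

```python
import random
import statistics


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of rows and cut it into train, validation, and test lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_scaler(values):
    """Learn the mean and standard deviation from the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for a constant feature
    return mean, std


def apply_scaler(values, mean, std):
    """Standardize values with statistics that were learned elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical single numeric feature (illustrative data, not from the course).
feature = [float(x) for x in range(100)]

train, val, test = train_val_test_split(feature)

# Golden rule: fit on the training split only, then reuse those statistics.
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
val_scaled = apply_scaler(val, mean, std)
test_scaled = apply_scaler(test, mean, std)

print(len(train), len(val), len(test))
```

In a typical workflow the validation split guides choices such as model settings, while the test split is touched once at the end to estimate how well the model generalizes.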