Module 2: Introduction to ML fundamentals

Introduction to Machine Learning Fundamentals

Outline

  • Key ML terminology: model, training, features, target
  • Decision trees as an intuitive starting point
  • Train/validation/test splits
  • Generalization, overfitting/underfitting
  • The golden rule: never train on your test data
  • Common data challenges:
    • Missing data
    • Mixed data types (numeric, categorical, binary, text)
    • Different feature scales
    • Outliers
    • Small datasets
    • Proxy variables
  • Overview of a typical ML pipeline
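
Three of the outline items (train/validation/test splits, the golden rule, and different feature scales) can be previewed with a small sketch. The code below is illustrative only, not the course's own implementation: the helper names `train_val_test_split`, `fit_scaler`, and `apply_scaler`, the 60/20/20 proportions, and the toy `heights_cm` data are all assumptions made for this example. It uses only the Python standard library.

```python
import random
import statistics


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and cut them into three disjoint parts (hypothetical helper)."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = int(len(shuffled) * test_frac)  # int() truncates toward zero
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_scaler(values):
    """Compute the mean and standard deviation used for standardization."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, (std if std > 0 else 1.0)  # avoid dividing by zero


def apply_scaler(values, mean, std):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


# Toy data (assumed for illustration): one numeric feature.
heights_cm = [150.0 + i for i in range(50)]

train, val, test = train_val_test_split(heights_cm)

# Golden rule in practice: fit the scaler on the training part only,
# then reuse those same statistics for validation and test.
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
val_scaled = apply_scaler(val, mean, std)
test_scaled = apply_scaler(test, mean, std)
```

Fitting the scaler on `train` alone means that no information from the validation or test rows leaks into preprocessing. The same reasoning applies to every learned step in a pipeline, not only to the model itself.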