By the end of this lesson, you will be able to:

- Define key machine learning terminology: features, targets, predictions, training, error, classification vs. regression, supervised vs. unsupervised learning, hyperparameters vs. parameters, baselines, decision boundaries
- Build a simple machine learning model in scikit-learn, explaining the `fit`–`predict` workflow and evaluating performance with the `score` method
- Describe at a high level how decision trees are trained (fitting) and how they make predictions
- Implement and visualize decision trees in scikit-learn using `DecisionTreeClassifier` and `DecisionTreeRegressor`
iClicker join link: https://join.iclicker.com/FZMQ
Select all of the following that describe suitable problems for machine learning.
In the first part of this course, we’ll focus on supervised machine learning.
iClicker cloud join link: https://join.iclicker.com/FZMQ
Select all of the following statements that are examples of supervised machine learning.
iClicker cloud join link: https://join.iclicker.com/FZMQ
Select all of the following statements that are examples of regression problems.
In this course, we'll be using the popular scikit-learn framework.

Imagine you're in the fortunate situation where, after graduating, you have a few job offers and need to decide which one to choose. You want to pick the job that will likely make you the happiest. To help with your decision, you collect data from like-minded people.
Here are the first few rows of a toy dataset.
|   | supportive_colleagues | salary | free_coffee | boss_vegan | happy?  |
|---|---|---|---|---|---|
| 0 | 0 | 70000  | 0 | 1 | Unhappy |
| 1 | 1 | 60000  | 0 | 0 | Unhappy |
| 2 | 1 | 80000  | 1 | 0 | Happy   |
| 3 | 1 | 110000 | 0 | 1 | Happy   |
| 4 | 1 | 120000 | 1 | 0 | Happy   |
| 5 | 1 | 150000 | 1 | 1 | Happy   |
| 6 | 0 | 150000 | 1 | 0 | Unhappy |
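For reference, here's a minimal sketch constructing this toy dataset as a pandas DataFrame named `toy_happiness_df` (the name is chosen to match the code later in this lesson):

```python
import pandas as pd

# The toy job-offer dataset shown above
toy_happiness_df = pd.DataFrame({
    "supportive_colleagues": [0, 1, 1, 1, 1, 1, 0],
    "salary": [70000, 60000, 80000, 110000, 120000, 150000, 150000],
    "free_coffee": [0, 0, 1, 0, 1, 1, 1],
    "boss_vegan": [1, 0, 0, 1, 0, 1, 0],
    "happy?": ["Unhappy", "Unhappy", "Happy", "Happy", "Happy", "Happy", "Unhappy"],
})
```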
Of course these goals are related, and in many situations we need both.
The default scoring metric for classification in `sklearn` is accuracy:

\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of examples}} \]
```python
from sklearn.dummy import DummyClassifier

# Separate the features (X) from the target (y)
X = toy_happiness_df.drop(columns=["happy?"])
y = toy_happiness_df["happy?"]

model = DummyClassifier(strategy="most_frequent")  # Always predict the most frequent class
model.fit(X, y)  # "Train" the model on the feature set X and target variable y
toy_happiness_df["dummy_predictions"] = model.predict(X)  # Add predictions as a new column
toy_happiness_df
```
|   | supportive_colleagues | salary | free_coffee | boss_vegan | happy?  | dummy_predictions |
|---|---|---|---|---|---|---|
| 0 | 0 | 70000  | 0 | 1 | Unhappy | Happy |
| 1 | 1 | 60000  | 0 | 0 | Unhappy | Happy |
| 2 | 1 | 80000  | 1 | 0 | Happy   | Happy |
| 3 | 1 | 110000 | 0 | 1 | Happy   | Happy |
| 4 | 1 | 120000 | 1 | 0 | Happy   | Happy |
| 5 | 1 | 150000 | 1 | 1 | Happy   | Happy |
| 6 | 0 | 150000 | 1 | 0 | Unhappy | Happy |
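Since the dummy model predicts "Happy" for every row, it gets 4 of the 7 examples right. We can check this with the `score` method, which returns accuracy for classifiers. A minimal sketch, continuing with the `model`, `X`, and `y` defined above:

```python
# score returns accuracy for classifiers: 4 of 7 predictions match, so 4/7 ≈ 0.571
print("Baseline accuracy:", model.score(X, y))

# Equivalently, apply the accuracy formula by hand
manual = (toy_happiness_df["dummy_predictions"] == toy_happiness_df["happy?"]).mean()
print("Manual accuracy:", manual)
```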
Let's train a simple decision tree on our toy dataset using `sklearn`.
```python
from sklearn.tree import DecisionTreeClassifier, plot_tree

model = DecisionTreeClassifier(max_depth=2, random_state=1)  # Create a tree limited to depth 2
model.fit(X, y)  # Learn the splits from the data
plot_tree(model, filled=True, feature_names=X.columns,
          class_names=["Happy", "Unhappy"], impurity=False, fontsize=12);
```
Now let's use the trained `sklearn` model to make a prediction for a new job offer.
```python
test_example = [[1, 60000, 0, 1]]  # supportive colleagues, $60k salary, no free coffee, vegan boss
print("Model prediction: ", model.predict(test_example))
plot_tree(model, filled=True, feature_names=X.columns,
          class_names=["Happy", "Unhappy"], impurity=False, fontsize=9);
```
Model prediction: ['Unhappy']
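To see why the model made this prediction, you can trace the path the example takes through the tree. Here's a minimal sketch using `decision_path` (the exact node numbers depend on the tree learned above):

```python
# decision_path returns an indicator matrix: entry (i, j) is nonzero
# if example i passes through node j on its way to a leaf
node_indicator = model.decision_path(test_example)
print("Nodes visited:", node_indicator.indices)
```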
An important hyperparameter of decision trees is `max_depth`, which limits how deep the tree can go. Try comparing the trees learned with `max_depth=1` and `max_depth=2`: a deeper tree can ask more questions before settling on a prediction, as the sketch below illustrates.
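A minimal sketch comparing training accuracy at different depths (the depth values here are just for illustration):

```python
# Refit the tree with different max_depth values and compare training accuracy.
# Deeper trees can ask more questions, so training accuracy typically increases.
for depth in [1, 2, 3]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1)
    tree.fit(X, y)
    print(f"max_depth={depth}: training accuracy = {tree.score(X, y):.3f}")
```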
iClicker cloud join link: https://join.iclicker.com/FZMQ
Select all of the following statements that are TRUE.
To summarize, the basic steps of the `sklearn` workflow are: create a model object, `fit` it on the features `X` and target `y`, `predict` on new examples, and evaluate with `score`.
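A minimal end-to-end sketch of these steps on the toy data (the model choice is just for illustration):

```python
# The basic sklearn workflow: initialize, fit, predict, score
model = DecisionTreeClassifier(max_depth=2, random_state=1)  # 1. Create a model object
model.fit(X, y)                                              # 2. Fit the model on X and y
predictions = model.predict(X)                               # 3. Predict on (new) examples
print("Accuracy:", model.score(X, y))                        # 4. Evaluate with score
```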