By the end of this lesson, you will be able to:
Select all of the following statements that describe problems suitable for machine learning.
In the first part of this course, we’ll focus on supervised machine learning.
Select all of the following statements that are examples of supervised machine learning.
Select all of the following statements that are examples of regression problems.
In this course, we'll use the `scikit-learn` framework.

Imagine you're in the fortunate situation where, after graduating, you have a few job offers and need to decide which one to accept. You want to pick the job that will likely make you the happiest. To help with your decision, you collect data from like-minded people.
Here are the first few rows of a toy dataset.
|   | supportive_colleagues | salary | free_coffee | boss_vegan | happy? |
|---|---|---|---|---|---|
| 0 | 0 | 70000 | 0 | 1 | Unhappy |
| 1 | 1 | 60000 | 0 | 0 | Unhappy |
| 2 | 1 | 80000 | 1 | 0 | Happy |
| 3 | 1 | 110000 | 0 | 1 | Happy |
| 4 | 1 | 120000 | 1 | 1 | Happy |
| 5 | 1 | 150000 | 1 | 1 | Happy |
| 6 | 0 | 150000 | 1 | 0 | Unhappy |
Of course these goals are related, and in many situations we need both.
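To make the examples below runnable, here is a minimal sketch of how you might construct this toy dataset as a pandas DataFrame. The name `toy_happiness_df` comes from the code later in the lesson; splitting it into a feature matrix `X` and target `y` is an assumption about how that code expects the data:

```python
import pandas as pd

# Recreate the toy dataset from the table above
toy_happiness_df = pd.DataFrame(
    {
        "supportive_colleagues": [0, 1, 1, 1, 1, 1, 0],
        "salary": [70000, 60000, 80000, 110000, 120000, 150000, 150000],
        "free_coffee": [0, 0, 1, 0, 1, 1, 1],
        "boss_vegan": [1, 0, 0, 1, 0, 1, 0],
        "happy?": ["Unhappy", "Unhappy", "Happy", "Happy", "Happy", "Happy", "Unhappy"],
    }
)

X = toy_happiness_df.drop(columns=["happy?"])  # features: everything except the target
y = toy_happiness_df["happy?"]                 # target: the happiness label
```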
```python
from sklearn.dummy import DummyClassifier

model = DummyClassifier(strategy="most_frequent")  # Always predict the most frequent class
model.fit(X, y)  # "Fit" the baseline on the feature matrix X and target y
toy_happiness_df["dummy_predictions"] = model.predict(X)  # Add predictions as a new column
toy_happiness_df
```
|   | supportive_colleagues | salary | free_coffee | boss_vegan | happy? | dummy_predictions |
|---|---|---|---|---|---|---|
| 0 | 0 | 70000 | 0 | 1 | Unhappy | Happy |
| 1 | 1 | 60000 | 0 | 0 | Unhappy | Happy |
| 2 | 1 | 80000 | 1 | 0 | Happy | Happy |
| 3 | 1 | 110000 | 0 | 1 | Happy | Happy |
| 4 | 1 | 120000 | 1 | 1 | Happy | Happy |
| 5 | 1 | 150000 | 1 | 1 | Happy | Happy |
| 6 | 0 | 150000 | 1 | 0 | Unhappy | Happy |
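How good is this baseline? One way to check is the classifier's `score` method, which for classifiers returns accuracy. Since 4 of the 7 examples are labelled "Happy", always predicting "Happy" should be correct on 4/7 ≈ 0.57 of the examples; a quick sketch:

```python
# Fraction of training examples the baseline gets right
print("Dummy accuracy:", model.score(X, y))  # expect 4/7 ≈ 0.571
```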
Let's train a simple decision tree on our toy dataset using `sklearn`.
```python
from sklearn.tree import DecisionTreeClassifier  # the decision tree classifier
from sklearn.tree import plot_tree               # utility for visualizing fitted trees

model = DecisionTreeClassifier(max_depth=2, random_state=1)  # Create a model object
model.fit(X, y)  # Learn the tree's split rules from X and y
plot_tree(model, filled=True, feature_names=X.columns,
          class_names=["Happy", "Unhappy"], impurity=False, fontsize=12);
```
Now let's use the trained tree to make a prediction for a new job offer.
```python
# A new offer: supportive colleagues, a $60,000 salary, no free coffee, and a vegan boss
test_example = pd.DataFrame([[1, 60000, 0, 1]], columns=X.columns)
print("Model prediction: ", model.predict(test_example))
plot_tree(model, filled=True, feature_names=X.columns,
          class_names=["Happy", "Unhappy"], impurity=False, fontsize=9);
```

```
Model prediction:  ['Unhappy']
```
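The same `predict` call works on several examples at once. Here is a sketch with two made-up offers (the feature values are hypothetical, chosen only for illustration):

```python
# Two hypothetical job offers, one per row, in the same column order as X
more_offers = pd.DataFrame(
    [
        [0, 150000, 1, 0],  # unsupportive colleagues, high salary
        [1, 90000, 1, 1],   # supportive colleagues, mid-range salary
    ],
    columns=X.columns,
)
print(model.predict(more_offers))  # one predicted label per offer
```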
The decision tree has a hyperparameter called `max_depth`, which limits how deep the tree can go. With `max_depth=1` the tree makes only a single split, while `max_depth=2` allows one additional level of splits, as in the tree we trained above.
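To see the effect of this hyperparameter, here is a small sketch (reusing the `X` and `y` from earlier) that fits a tree at each depth and prints its accuracy on the training data:

```python
from sklearn.tree import DecisionTreeClassifier

for depth in [1, 2]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1)
    tree.fit(X, y)
    print(f"max_depth={depth}: training accuracy = {tree.score(X, y):.2f}")
```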
Select all of the following statements which are TRUE.
To summarize, we saw the basic steps of working with `sklearn`: create a model object, `fit` it on the training data, and `predict` on new examples.
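Putting those steps together, here is a compact sketch of the workflow from this lesson:

```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=2, random_state=1)  # 1. Create a model object
model.fit(X, y)                                              # 2. Fit it on the training data
predictions = model.predict(X)                               # 3. Predict on examples
```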