Select all of the following statements which are TRUE.

- Larger values of `gamma` tend to result in higher training score but probably lower validation score.
- If we change both `gamma` and `C`, we can’t be certain if the model becomes more complex or less complex.

Model | Parameters and hyperparameters | Strengths | Weaknesses |
---|---|---|---|
Decision Trees | |||
KNNs | |||
SVM RBF | |||
You’re trying to find a suitable date based on age and number of Facebook friends. Your own profile: age 30, 250 FB friends.
Person | Age | #FB Friends | Euclidean Distance Calculation | Distance |
---|---|---|---|---|
A | 25 | 400 | \(\sqrt{5^2 + 150^2}\) | 150.08 |
B | 27 | 300 | \(\sqrt{3^2 + 50^2}\) | 50.09 |
C | 30 | 500 | \(\sqrt{0^2 + 250^2}\) | 250.00 |
D | 60 | 250 | \(\sqrt{30^2 + 0^2}\) | 30.00 |
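As a quick sketch, the distances in the table can be reproduced with NumPy (the query profile of age 30 with 250 friends is inferred from the difference terms in the table):

```python
import numpy as np

# Query profile inferred from the table's difference terms: age 30, 250 friends
you = np.array([30, 250])

# Candidate profiles: [age, #FB friends]
candidates = {
    "A": np.array([25, 400]),
    "B": np.array([27, 300]),
    "C": np.array([30, 500]),
    "D": np.array([60, 250]),
}

# Euclidean distance from the query to each candidate, nearest first
distances = {name: np.linalg.norm(p - you) for name, p in candidates.items()}
for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(name, round(d, 2))  # D 30.0, B 50.09, A 150.08, C 250.0
```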
Based on the distances, the two nearest neighbors (2-NN) are D (30.00) and B (50.09).
What’s the problem here?
Take a guess: In your machine learning project, how much time will you typically spend on data preparation and transformation?
The question is adapted from here.
Select all of the following statements which are TRUE.

- `StandardScaler` ensures a fixed range (i.e., minimum and maximum values) for the features.
- `StandardScaler` calculates mean and standard deviation for each feature separately.
- After applying `SimpleImputer`, the transformed data has a different shape than the original data.

Select all of the following statements which are TRUE.
- For a `scikit-learn` pipeline object with an estimator as the last step, you can call `fit`, `predict`, and `score` on it.
- … `scikit-learn` pipeline.

**Imputation**: Fill in missing data using a chosen strategy.
Imputation is like filling in your average or median or most frequent grade for an assessment you missed.
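A minimal sketch of this idea with scikit-learn’s `SimpleImputer` (the grades are made-up values):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# One column of assessment grades with one missing entry (made-up values)
grades = np.array([[85.0], [90.0], [np.nan], [75.0]])

# strategy can be "mean", "median", or "most_frequent"
imputer = SimpleImputer(strategy="mean")
filled = imputer.fit_transform(grades)
print(filled.ravel())  # the missing grade becomes the mean of the others
```

Note that the output has the same shape as the input; only the missing entries change.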
**Scaling**: Ensure all features have a comparable range.
Scaling is like adjusting the number of everyone’s Facebook friends so that both the number of friends and their age are on a comparable scale. This way, one feature doesn’t dominate the other when making comparisons.
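A sketch using `StandardScaler` on the age/friends data from the dating example above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# [age, #FB friends] for candidates A-D from the table above
X = np.array([[25, 400], [27, 300], [30, 500], [60, 250]], dtype=float)

scaler = StandardScaler()           # per feature: subtract mean, divide by std
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0))        # ~[0, 0]: both features now on the same scale
print(X_scaled.std(axis=0))         # [1, 1]
```

After scaling, the friend count can no longer dominate the distance just because its raw numbers are larger.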
**One-hot encoding**: Convert categorical features into binary columns.
Turn “Apple, Banana, Orange” into binary columns:
Fruit | 🍎 | 🍌 | 🍊 |
---|---|---|---|
Apple 🍎 | 1 | 0 | 0 |
Banana 🍌 | 0 | 1 | 0 |
Orange 🍊 | 0 | 0 | 1 |
**Ordinal encoding**: Convert categories into integer values that have a meaningful order.
Turn “Poor, Average, Good” into 1, 2, 3:
Rating | Ordinal |
---|---|
Poor | 1 |
Average | 2 |
Good | 3 |
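With scikit-learn’s `OrdinalEncoder` this looks like the sketch below. Note that it assigns 0-based codes (0, 1, 2) rather than the 1, 2, 3 shown in the table, and the order must be passed explicitly or it defaults to alphabetical:

```python
from sklearn.preprocessing import OrdinalEncoder

ratings = [["Poor"], ["Average"], ["Good"]]

# Pass categories explicitly so Poor < Average < Good; without this the
# encoder would sort alphabetically (Average, Good, Poor)
enc = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])
codes = enc.fit_transform(ratings)
print(codes.ravel())  # [0. 1. 2.]
```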
**`sklearn` Transformers vs Estimators**

Transformers have `fit` and `transform` methods.

- `fit(X)`: Learns parameters from the data.
- `transform(X)`: Applies the learned transformation to the data.

Examples:

- Imputation (`SimpleImputer`): Fills missing values.
- Scaling (`StandardScaler`): Standardizes features.

Estimators have `fit` and `predict` methods.

- `fit(X, y)`: Learns from labeled data.
- `predict(X)`: Makes predictions on new data.

Examples: `DecisionTreeClassifier`, `SVC`, `KNeighborsClassifier`.
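A small sketch contrasting the two interfaces on toy data (the values are made up):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

# Toy data: one missing feature value, binary labels (made-up values)
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
y = np.array([0, 1, 0])

# Transformer interface: fit(X) learns the column means, transform(X) fills the nan
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)   # shape unchanged: (3, 2)

# Estimator interface: fit(X, y) learns from labels, predict(X) outputs labels
model = DecisionTreeClassifier(random_state=0)
model.fit(X_filled, y)
print(model.predict(X_filled))        # perfect on its own training data: [0 1 0]
```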
**`sklearn` Pipelines**

Chaining a `StandardScaler` with a `KNeighborsClassifier` model:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# make_pipeline chains the steps; scaling runs before KNN
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# fit() fits the scaler on X_train, transforms it, then fits KNN on the result
pipeline.fit(X_train, y_train)

# predict() scales X_test with the training-set statistics, then predicts
y_pred = pipeline.predict(X_test)
```